Worth Repeating: Rob Pike “Data dominates.” and Frederick Brooks “Representation is the Essence of Programming”

Rob Pike is a famous name in programming with a history going back to Bell Labs, co-author of two often quoted books, and today works at Google.

Back in February 1989 he wrote an essay, “Notes on Programming in C” which many consider contains insight to the “Unix Philosophy”.

One of the sections of the essay people focus on were six rules he listed on complexity. Rule number 5 is:

“Data dominates. If you’ve chosen the right data structures and organized things well, the algorithms will almost always be self­ evident. Data structures, not algorithms, are central to programming.”

I’ve seen people summarize that rule (why summarize three sentences?!) into “write stupid code that uses smart objects”, but I believe that misses the point.

To help us understand the context behind the rule, Pike cites Frederick Brooks’ “The Mythical Man-Month” p. 102. Here it is for your edification:

Representation is the Essence of Programming

Beyond craftmanship lies invention, and it is here that lean, spare, fast programs are born. Almost always these are the result of strategic breakthrough rather than tactical cleverness. Sometimes the strategic breakthrough will be a new algorithm, such as the Cooley-Tukey Fast Fourier Transform or the substitution of an n log n sort for an n2 set of comparisons.

Much more often, strategic breakthrough will come from redoing the representation of the data or tables. This is where the heart of your program lies. Show me your flowcharts and conceal your tables, and I shall be continued to be mystified. Show me your tables, and I won’t usually need your flowcharts; they’ll be obvious.

It is easy to multiply examples of the power of representations. I recall a young man undertaking to build an elaborate console interpreter for an IBM 650. He ended up packing it onto an incredibly small amount of space by building an interpreter for the interpreter, recognizing that human interactions are slow and infrequent, but space was dear. Digitek’s elegant little Fortran compiler uses a very dense, specialized representation for the compiler code itself, so that external storage is not needed. That time lost in decoding this representation is gained back tenfold by avoiding input-output. (The exercieses at the end of Chapter 6 in Brooks and Inversion, “Automatic Data Processing” include a collection of such examples, as do many of Knuth’s exercises.)

The programmer at wit’s end for lack of space can often do best by disentangling himself from his code, rearing back, and contemplating his data. Representation is the essence of programming

References:

Wikipedia: Unix_philosophy

Eric Steven Raymond: The Art of Unix Programming: Basics of the Unix Philosophy