Pipe dreams

I've been thinking about pipes recently, the kind Doug McIlroy invented with Ken and Dennis back at the dawn of Unix. Pipes demanded some interesting programming approaches: do one job only, and do it well; a text-level API for programs that are easy to explain, debug, save; have the kernel support lightweight processes, not the monsters of VMS and the Microsoft descendants, etc.

I've gone a long way with the pipe/script/sed/grep/awk programming style (we need a name for this.) Lumeta's code was about 60,000 lines of code, about half of that C and the rest scripts, last time I checked a few years ago. The company has "updated" much of this code, and I understand that it runs slower.

This would be a great topic for a short book, which I am considering. Sort of an updated K&P for scripting in a modern world. Short, fun, some stories, lots of samples, some new filters, etc. I may well do it.

But I have been thinking about pipes because the computing world is becoming multicore, concurrent programming is either trivial or very very hard, and pipes use multicores naturally, with no diminishment of comprehension, locks, mutexes, and so on. Of course, only certain classes of problems are easy to solve this way, but if Java or C++ is what you know for programming, these solutions will not be obvious to you.

Did Doug think about multiprocessors and pipes? I bet so, though they were rare back then.

Finally, I wonder how modern Unix-like systems handle pipes these days. Does the kernel sometimes take data right out of the sending processes's write buffer and straight into the receiver's read buffer? Does the data always go into a kernel buffer, as I believe it used to? Does it tend to remain in primary or secondary cache, and that's efficient enough? Does this all need a kernel rethink in the face of looming multi-cores?

Should awk be rethought and parallelized? My CPUs spend a lot of time in awk.

Most of this is postmature optimization for me. I use it like I got it, and let the CPU do the work, whether it is the single core at home, or the eight cores at work.