Magic Pipes (by alaric)
One of the neat things in Unix is that you can make 'shell pipelines' of commands, from a suite of tools that come with most Unix systems, by feeding the output of one command into the input of another.
For example, the du tool lists the space (in kilobytes) taken by each file or subdirectory of the current directory if you run it like du -sk *. My home directory yields:
608 SN9C325 Series WebCam Driver OSX 1[1].0.zip
1776 SONiX USB 2[1][1].0 PC Camera Driver OSX 1.1.zip
4 alaric-adventure
113120 chicken-scheme
152 dwm
39306 kitten-technologies.co.uk
13462 n2n
3472 netbsd.lzma
76 pdb
398596 perplexity-pkgsrc
22296 projects
2352 sonix201_for_MAC.rar
8 test
2 test.c
15892 test.ugarit
38 tmp
2 ugarit.conf
2 ugarit2.conf
But often I want to see them sorted by size. The sort tool can do that - if invoked as sort -n it will read lines in, find a number at the start of each line, and sort the lines by those numbers. So I can feed the output of du -sk * into sort -n with one command, du -sk * | sort -n, where the | tells the shell to pipe the output of the first command into the second. Lo, the output of the command is:
2 test.c
2 ugarit.conf
2 ugarit2.conf
4 alaric-adventure
8 test
38 tmp
76 pdb
152 dwm
608 SN9C325 Series WebCam Driver OSX 1[1].0.zip
1776 SONiX USB 2[1][1].0 PC Camera Driver OSX 1.1.zip
2352 sonix201_for_MAC.rar
3472 netbsd.lzma
13462 n2n
15892 test.ugarit
22296 projects
39306 kitten-technologies.co.uk
113120 chicken-scheme
398596 perplexity-pkgsrc
If I wanted it largest first, I could use sort -nr instead to reverse the sort.
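To make the behaviour of sort -n concrete, here is a rough Python model of it: pull the number off the front of each line and order the lines by it. The function names leading_number and sort_n are made up for this sketch, and the real sort differs in details (for example, how it breaks ties between equal numbers), so treat it as an illustration rather than a description of the actual tool.

```python
import re

def leading_number(line):
    """Return the number at the start of a line, or 0 if there is none."""
    m = re.match(r"\s*(-?\d+(?:\.\d+)?)", line)
    return float(m.group(1)) if m else 0.0

def sort_n(lines, reverse=False):
    """Order lines by their leading number, smallest first by default."""
    return sorted(lines, key=leading_number, reverse=reverse)

# A few lines taken from the du output above.
sample = ["152 dwm", "4 alaric-adventure", "113120 chicken-scheme", "2 test.c"]
for line in sort_n(sample):
    print(line)
```

Passing reverse=True plays the role of the -r flag.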
Now, the tools that come with Unix for making up such pipelines tend to be line-oriented, and have relatively crude support for operations within the line - sorting by a number midway through each line, as might be found in a complex table, is hard work, especially as there are a number of ways the table could be formatted (fixed width columns, CSV with any of a number of quoting conventions, etc).
And line-oriented tools are a pain when dealing with files written in formats that deserve the title "language", such as HTML, XML, or source code, as they tend to have complex structure that can be rearranged over lines to suit the aesthetic tastes of the programmer.
Indeed, such a case came up on IRC lately, with somebody wanting to quickly analyse the dependencies between Chicken eggs, which are directories with certain specially-named metadata files inside that list the other eggs they require. He cobbled something together with grep that found most of the dependencies, but doing it properly and handling dependency lists that went onto multiple lines and so on would need proper s-expression parsing.
But a thought came to me. Why not put together some shell commands to match the line-oriented grep, sort, sed, awk and friends, but treating input and output files as lists of s-expressions? They'd be trivial to write.
Here's an initial idea for a set:
filter accepts a Scheme expression on the command line, and uses eval to evaluate it on each s-expression read from stdin in turn (binding the read s-expression to INPUT), and outputs the s-expression if the expression evaluates to a true value. echo "a b c (foo)" | filter "(pair? INPUT)" would output (foo).

map accepts a Scheme expression on the command line, and likewise uses eval to evaluate it on each s-expression read from stdin in turn (binding the read s-expression to INPUT), and outputs the result of the evaluation. echo "(1 2) (3 4)" | map "(car INPUT)" would output 1 3.

fold accepts two Scheme expressions on the command line, referred to as kons and knil, and evaluates knil with eval to obtain the initial accumulator value. It then evaluates kons in turn for each s-expression read from stdin, with INPUT bound to the read s-expression and STATE bound to the current accumulator value, and makes the result of the evaluation be the new accumulator value. When standard input reports EOF, the final accumulator value is output. echo "1 2 3" | fold "(+ INPUT STATE)" "0" would output 6 (the sum of the input numbers).

sort accepts two Scheme expressions on the command line, but both are optional, with defaults. One we shall call extract, which defaults to INPUT; the other we shall call compare, which defaults to (< A B). It reads all the s-expressions from standard input into a list, then sorts it, using compare as the comparison function, with A bound to the result of evaluating extract with INPUT bound to the first s-expression being compared, and B bound to the result of evaluating extract with INPUT bound to the second s-expression being compared. The resulting list is then output. If we use -c to introduce the comparison function and -e to introduce the extraction function, then echo "(food \"cheese\") (death \"burning\") (cheese \"edam\")" | sort -e "(cadr INPUT)" -c "(string<? A B)" would output (death "burning") (food "cheese") (cheese "edam") - sorting by the second element of each list as a string.

flatten reads s-expressions from its input, and if they are proper lists, outputs each member of the list in turn, otherwise outputs the whole s-expression. Note that it does not recurse down into lists, it only flattens the top level. echo "1 (2) (3 4) (5 (6))" | flatten would output 1 2 3 4 5 (6).

group accepts a Scheme expression on the command line, which is the grouping function. It reads an s-expression from the input and evaluates the grouping function with INPUT bound to the input s-expression, and saves the result as the "current group value", and puts the input s-expression into a list called the "current group". It then iterates over the remaining s-expressions in the input, again evaluating the grouping function for each; if the result is the same as the current group value, the s-expression is added to the current group; if it is different, then the current group is output as a list s-expression, the new group value becomes the current group value, and the current s-expression becomes the only member of the new current group. At EOF, the final current group is output as well. So echo "1 2 foo bar 3 4" | group "(symbol? INPUT)" would output (1 2) (foo bar) (3 4).
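The intended semantics of these six tools can be modelled in a few lines of Python. This is only a sketch under some loud assumptions: s-expressions are represented as nested Python lists, a hypothetical Symbol class stands in for Scheme symbols, and Python callables stand in for the Scheme expressions that the real tools would pass to eval. The sx_ names are invented for the sketch.

```python
from functools import cmp_to_key
from itertools import groupby

class Symbol(str):
    """Stand-in for a Scheme symbol; a plain str models a Scheme string."""

def sx_filter(pred, items):
    # In Scheme only #f is false, so anything other than False counts as true.
    return (x for x in items if pred(x) is not False)

def sx_map(fn, items):
    return (fn(x) for x in items)

def sx_fold(kons, knil, items):
    state = knil
    for x in items:
        state = kons(x, state)   # kons sees INPUT and STATE
    return state

def sx_sort(items, extract=lambda x: x, less=lambda a, b: a < b):
    def compare(x, y):
        a, b = extract(x), extract(y)
        return -1 if less(a, b) else (1 if less(b, a) else 0)
    return sorted(items, key=cmp_to_key(compare))

def sx_flatten(items):
    for x in items:
        if isinstance(x, list):
            yield from x         # top level only, no recursion
        else:
            yield x

def sx_group(key, items):
    # groupby collects consecutive items with equal keys, last run included.
    for _, members in groupby(items, key=key):
        yield list(members)

foo, bar = Symbol("foo"), Symbol("bar")
is_pair = lambda x: isinstance(x, list) and len(x) > 0
rows = [[Symbol("food"), "cheese"], [Symbol("death"), "burning"],
        [Symbol("cheese"), "edam"]]

print(list(sx_filter(is_pair, [Symbol("a"), Symbol("b"), [foo]])))
print(list(sx_map(lambda x: x[0], [[1, 2], [3, 4]])))
print(sx_fold(lambda x, state: x + state, 0, [1, 2, 3]))
print(sx_sort(rows, extract=lambda x: x[1]))
print(list(sx_flatten([1, [2], [3, 4], [5, [6]]])))
print(list(sx_group(lambda x: isinstance(x, Symbol), [1, 2, foo, bar, 3, 4])))
```

The calls at the bottom mirror the echo examples in the tool descriptions. Everything except sort and fold is written as a generator, which matches the streaming nature of a pipeline; sort has to read all of its input before it can output anything.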
Clearly, for security, we'll need to just use plain R5RS read to input s-expressions - no special read syntax that might execute arbitrary code can be allowed, so a sandboxed readtable will be needed in implementations like Chicken that have a lot of special read syntax. But the full read can be used for parsing the Scheme expressions supplied as arguments, in order to take advantage of the nice syntactic sugar they provide (as it's arbitrary code we're reading anyway).
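As an illustration of what a restricted reader might look like, here is a small Python sketch that accepts only plain data: lists, integers, strings, #t/#f and symbols. It refuses every other token starting with # as well as the quote shorthands. It covers only a subset of the R5RS datum syntax (no vectors, characters, dotted pairs or non-integer numbers), and the names read_all, atom, tokenize and Symbol are made up for the sketch.

```python
import re

class Symbol(str):
    """Stand-in for a Scheme symbol; a plain str models a Scheme string."""

# One token: a comment, a parenthesis, a string literal, or a bare atom.
TOKEN = re.compile(r'''\s*(?:(;[^\n]*)|([()])|("(?:\\.|[^"\\])*")|([^\s()";]+))''')

def tokenize(text):
    pos = 0
    while True:
        m = TOKEN.match(text, pos)
        if not m:
            if text[pos:].strip():
                raise ValueError("cannot read input at offset %d" % pos)
            return
        pos = m.end()
        if m.group(1):
            continue                      # skip comments
        yield m.group(m.lastindex)

def atom(tok):
    if tok.startswith('"'):
        return re.sub(r'\\(.)', r'\1', tok[1:-1])   # undo \" and \\
    if tok in ("#t", "#f"):
        return tok == "#t"
    if tok[0] in "#'`,":
        raise ValueError("syntax not allowed in data: " + tok)
    if re.fullmatch(r"[+-]?\d+", tok):
        return int(tok)
    return Symbol(tok)

def read_all(text):
    """Parse every s-expression in text into nested Python lists."""
    stack, out = [], []
    for tok in tokenize(text):
        if tok == "(":
            stack.append([])
            continue
        if tok == ")":
            if not stack:
                raise ValueError("unexpected )")
            value = stack.pop()
        else:
            value = atom(tok)
        (stack[-1] if stack else out).append(value)
    if stack:
        raise ValueError("unclosed (")
    return out

print(read_all('a b c (foo) (food "cheese") (1 (2 3))'))
```

Because the reader only ever builds lists, numbers, strings, booleans and symbols, nothing in the input stream can cause code to run while it is being parsed, which is the property the paragraph above is asking for.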
As well as the arguments listed above, there should probably be arguments that all the tools accept, to set things like input and output character encodings for strings, to supply a Scheme expression to evaluate before anything else happens to define or load any utility functions, and to choose between print as the output function (with a newline after each s-expression) or a pretty-printer, etc. Also, anything that the user-supplied Scheme expressions write to standard-output-port should be sent to stderr, so that it appears on the console rather than ending up in the output stream.
Can anyone think of any important tools I've missed? map does the job of much of sed, awk, cut, etc. in line-based shell pipelines. (consider how the match pattern-matching macro inside a map compares to sed...) I'm wondering about an extract that uses some abbreviated syntax (something like an xpath?) to extract a subexpression out of every s-expression - there could be multiple matches, so it should output a list of matches for every input s-expression, meaning that inputs that have no matches are represented by an empty list in the output. If this information is not required, a quick extra flatten in your pipeline will discard it.
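The output shape of the proposed extract can be sketched in Python even though its path syntax is not designed yet. In this sketch a plain predicate stands in for that syntax (purely an assumption), s-expressions are nested Python lists with strings standing in for symbols, and the names extract, subexpressions and flatten are invented for the example.

```python
def subexpressions(x):
    """Yield x and every s-expression nested inside it, depth first."""
    yield x
    if isinstance(x, list):
        for child in x:
            yield from subexpressions(child)

def extract(match, items):
    """For each input s-expression, output the list of matching parts."""
    for x in items:
        yield [s for s in subexpressions(x) if match(s)]

def flatten(items):
    """Splice top-level lists into the stream; empty lists vanish."""
    for x in items:
        if isinstance(x, list):
            yield from x
        else:
            yield x

# The predicate is a placeholder for a path such as "every (tag ...) form".
is_tag = lambda s: isinstance(s, list) and len(s) > 0 and s[0] == "tag"

inputs = [
    ["item", ["tag", "x"], ["body", ["tag", "y"]]],
    ["item", ["body"]],                 # no matches here
    ["item", ["tag", "z"]],
]

per_input = list(extract(is_tag, inputs))
print(per_input)
print(list(flatten(per_input)))
```

The first result keeps one list per input, so the second input shows up as an empty list; running the result through flatten drops that empty list and leaves just the matches.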
Perhaps we need some tools to convert from non-sexpr notations and back again. One could be a parse tool that takes a regexp and a Scheme expression, matches each input line against the regexp, then outputs the result of evaluating the Scheme expression with $1 etc. bound to the groups within the regexp. Another tool would output nothing to stdout itself, but evaluate a command-line-supplied Scheme expression upon each line with standard-output-port actually sending to stdout, so people can use things like the fmt egg to format s-expressions into raw text output (or just display for simple needs).
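A Python sketch of the parse idea, applied to the du output from the start of the post, might look like this. A Python callable receives the regexp groups in place of a Scheme expression with $1 etc. bound, and lines that do not match are silently skipped, which is an assumption since the description above does not say what should happen to them. The names parse_lines and to_sexpr are made up.

```python
import re

def parse_lines(pattern, build, lines):
    """Match each line against pattern and output build(group1, group2, ...)."""
    rx = re.compile(pattern)
    for line in lines:
        m = rx.match(line)
        if m:                      # assumption: non-matching lines are dropped
            yield build(*m.groups())

def to_sexpr(x):
    """Write a nested list of ints and strings as s-expression text."""
    if isinstance(x, list):
        return "(" + " ".join(to_sexpr(e) for e in x) + ")"
    if isinstance(x, str):
        return '"' + x.replace("\\", "\\\\").replace('"', '\\"') + '"'
    return str(x)

du_lines = [
    "152 dwm",
    "608 SN9C325 Series WebCam Driver OSX 1[1].0.zip",
    "4 alaric-adventure",
]

for record in parse_lines(r"(\d+)\s+(.*)", lambda size, name: [int(size), name], du_lines):
    print(to_sexpr(record))
```

Each line becomes a two-element list of size and name, so file names containing spaces stop being a problem for later stages of the pipeline.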
Maybe the next step would be s-expression-outputting versions of ls and ps to make them easier to handle 🙂

By Ben, Thu 25th Jun 2009 @ 11:51 am
Check out Windows PowerShell. Instead of returning text, it returns objects, which can then be operated on by the next command in sequence.
Actually looks quite shiny.
By @ndy, Tue 30th Jun 2009 @ 7:54 am
sort -t, -k3
In the cheese example, what is cadr? | sort -e "(cadr INPUT)" -c "(string<? A B)" Should it be cdr?
By alaric, Tue 30th Jun 2009 @ 9:41 am
Ah, cadr is a shorthand function for (car (cdr ...))
If INPUT is (a b), then (car INPUT) is a, (cdr INPUT) is (b), and (cadr INPUT) is (car '(b)) which is just b.
By alaric, Thu 13th Aug 2009 @ 10:12 am
foof (author of the aforementioned fmt egg) made a good suggestion on IRC: As well as supporting the -e flag for a sort expression, for the common case of -e "(foo INPUT)", we can have a shorthand of -k foo, where foo is any function (eg, cadr).
By Michal, Thu 6th Sep 2012 @ 8:47 am
Hello I have found name of my webcam in your example. This is only one place where I can see something related to the camera and OSX. Do you know anything about driver for the camera for MAC OSX? Please give me a hint where I can look for driver.