"This is the Unix philosophy. Write programs that do one thing and do it well. Write programs to work together. Write programs that handle text streams because that is a universal interface." - Doug McIlroy, creator of Unix pipelines
The humble pipe character, |, is easy to overlook in long chains of Unix-style commands. But what the pipe enables – easy communication between independent programs – is essentially what made it possible (for better or for worse) for Unix to have the toolbox that it has.
Via Doug McIlroy:
Make each program do one thing well. To do a new job, build afresh rather than complicate old programs by adding new features… Expect the output of every program to become the input to another, as yet unknown, program.
Before getting into all the ins and outs of standard input and standard output, here's a quick review of the syntax.
The echo command, rigorously following the Unix philosophy of "do one thing, and do it well", takes the text given to it and outputs it to the screen:
user@host $ echo Hey World
Hey World
However, by piping echo's output into a new program, such as tr, I can filter and transform the text. And by redirecting echo's output, I can append to an existing file (using >>) or create/overwrite a file (using >).
How to pipe:
user@host $ echo Hey World | tr '[:upper:]' '[:lower:]'
hey world
user@host $ echo Hey World | tr '[:upper:]' '[:lower:]' | tr ' ' '9'
hey9world
How to redirect into a file:
user@host:~$ cat newfile.txt
cat: newfile.txt: No such file or directory
user@host:~$ echo "Here is some text" > newfile.txt
user@host:~$ cat newfile.txt
Here is some text
user@host:~$ echo "and some more text" >> newfile.txt
user@host:~$ cat newfile.txt
Here is some text
and some more text
user@host:~$ echo 'Oops just new text now' > newfile.txt
user@host:~$ cat newfile.txt
Oops just new text now
user@host:~$
And of course, there's nothing stopping us from piping and redirecting in the same command:
echo "Pipes and arrows" | wc > words_counted.txt
By default, curl downloads a webpage and sends it to standard output, which, by default, is your computer monitor:
curl www.example.com
As with any other kind of standard output, this can be redirected to a file:
curl www.example.com > myexample.html
Or piped directly into another utility. Here, grep is reading from standard input, into which curl's output is being piped:
curl www.example.com | grep 'Example'
We can of course take the standard output from grep and redirect it into a new file:
curl www.example.com | grep 'Example' > grep_example.txt
So what does not using standard input/output look like?
With the -o option, we can specify a filename for curl to save to.
curl www.example.com -o myexample.html
The grep program, when not reading from standard input, can take a filename as an argument; grep will open the file itself and process the data:
grep 'Example' myexample.html
These are simple examples, but I just wanted to get the syntax and functionality grounded. The rest of this guide covers a little more jargon and syntax. But if you can accept the idea of sending the output of one program directly into another, then you'll understand why thinking in the "Unix way" – small, single-purpose tools chained together – makes for a powerful and elegant system.
One of the most significant consequences of pipes in Unix is that Unix programs, whenever possible, are designed to read from standard input (stdin) and print to standard output (stdout).
These jargony terms refer to streams of data, standardized as plain text. When McIlroy refers to text streams as a "universal interface", he means that when programmers think in terms of text, they have to worry much less about how to get programs, or devices, to work together, making it much easier to build complex data software with tools as "basic" as cat, grep, and sort.
From the Linux Information Project:
The introduction of standardized streams represented a major breakthrough in the computer field when it was incorporated into the original UNIX operating system more than three and a half decades ago, because it eliminated the very complex and tedious requirement of having to adjust the output of each program according to the specific device or program to which it was being sent.
At this point, we ourselves may not have written anything that we consider "software" or "complex". But even so, knowing that plain text streams are the default (and often best) way for programs to talk to each other will be key to understanding how to create complex, useful software.
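For instance, here's a minimal pipeline (assuming a hypothetical words.txt, one word per line) that chains those "basic" tools together to find, alphabetize, and preview the words containing the letter a:
user@host:~$ cat words.txt | grep 'a' | sort | head -n 3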
So many of the programs and utilities in Unix-land read from stdin and print to stdout that it's helpful to define stdin and stdout through examples of programs that don't use them. In other words, these programs were not meant to pipe the results of their actions straight into another program, or onto your display.
The directory-creating mkdir program is an obvious example:
user@host:~$ mkdir one_dir two_dir three_dir
When mkdir executes, it simply creates three directories. There's nothing for it to output, except error messages. So it doesn't make sense for it to have something to send along to another program. Similarly, mkdir isn't intended to read a bunch of text output from another program and create directories (though there are certainly ways to do that, as sketched below).
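For instance, here's one way (a sketch, not the only way) to drive mkdir from a text stream, using xargs to turn whitespace-separated words on stdin into command-line arguments:
user@host:~$ echo 'one_dir two_dir three_dir' | xargs mkdir
It's xargs, not mkdir, that reads stdin here; mkdir still just sees three ordinary arguments.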
The program unzip is another such example. It'd be nice to curl down a zip file and pass it straight to unzip, which would bypass the need to save the zip file:
user@host:~$ curl http://example.com/some.zip | unzip
But that won't work: unzip can't read an archive from a pipe, because the zip format keeps its table of contents at the end of the file, which requires jumping around the file rather than reading a stream. By default, unzip does send a list of the files it unzipped to stdout:
user@host:~$ unzip wh-listings.zip
Archive: wh-listings.zip
inflating: 0.html
inflating: 1.html
inflating: 10.html
inflating: 100.html
However, with the -p option, unzip will send the contents of the files to standard out.
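For example, with the wh-listings.zip archive from above, -p lets us stream an archived file's contents straight into another program:
user@host:~$ unzip -p wh-listings.zip 0.html | head -n 5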
If you've been using curl, you know that it will download a file and dump it onto your screen. A similar tool, wget, will by default save the results of the download into a file:
user@host:~$ wget http://www.example.com
--2015-01-19 14:10:15-- http://www.example.com/
Resolving www.example.com (www.example.com)... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to www.example.com (www.example.com)|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1270 (1.2K) [text/html]
Saving to: 'index.html'
100%[===================>] 1,270 --.-K/s in 0s
2015-01-19 14:10:15 (118 MB/s) - 'index.html' saved [1270/1270]
# check out the saved file
user@host:~$ ls
index.html
Why does wget provide the convenience of saving to file by default? Because wget was designed not just for downloading single URLs, but for spidering/mirroring entire sites. You may recall the recent data-gathering exploits of a security geek named Edward Snowden; wget was his tool of choice.
Note: wget is a great tool, but there's a reason I haven't shown many examples of using it: few of our tasks require mirroring a whole site. And if you do mirror a site, you still need to scrape it for data anyway. The use of curl better reflects our focused intentions in collecting data. If you do use wget, be aware that you may accidentally copy much, much more than you intended.
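If you do experiment with mirroring, a sketch like this (flags to taste) uses the --level option to cap how many links deep the crawl goes, limiting the damage:
user@host:~$ wget --recursive --level=1 http://www.example.com/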
By default, the device that stdin reads from is your keyboard.
For example, the mail program, by default, will prompt the user to fill in the Subject and Cc: options, and then take input from the user's keyboard to fill the body of an email message, including new lines, until the user presses Ctrl-D:
dun@corn30:~$ mail dun@stanford.edu
Subject: Hey there
Hey dan,
Just wanted to see how Standard Input works
Sincerely,
Dan
Cc:
Use the left-angle-bracket, <, followed by a filename, to open that file and pass its contents into stdin:
user@host:~$ wc < words.txt
Note that many utilities, such as wc (word count), have been designed to open a filename that is passed in as an argument, so that using < is optional:
user@host~$ wc words.txt
The left-angle-bracket is often seen with while loops, in which a file is redirected into the loop; the read command fills a variable from each line, and the do/done block is executed once per line in the file.
# the -r option keeps read from mangling backslashes
while read -r data_line
do
  echo "Line: $data_line"
done < file.txt
Because of the way Bash word-splits the output of $(cat file.txt) – a line containing spaces gets broken into multiple words – using a read-while loop, as above, is considered more reliable and preferable to this:
user@host:~$ for data_line in $(cat file.txt)
> do
> echo "Line: $data_line"
> done
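To see why, consider a (hypothetical) file.txt whose single line contains a space – the for loop splits it into two words and runs the body twice:
user@host:~$ echo 'hello world' > file.txt
user@host:~$ for data_line in $(cat file.txt); do echo "Line: $data_line"; done
Line: hello
Line: world
The read-while version, by contrast, prints the whole line intact: Line: hello world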
By default, the device that stdout is sent to is your display monitor. For example, when you use the curl downloading tool to download a file (such as a webpage), it will, by default, send the contents of that downloaded file to stdout – i.e. your screen:
user@host:~$ curl http://www.example.com
<!doctype html>
<html>
<head>
<title>Example Domain</title>
<meta charset="utf-8" />
<meta http-equiv="Content-type" content="text/html; charset=utf-8" />
<meta name="viewport" content="width=device-width, initial-scale=1" />
<style type="text/css">
body
# and so forth...
As you might guess, it's typically not desirable to output the contents of a large file onto your display monitor, as there is too much text to read at once.
With a pipe, I can send the output of curl into a program that filters the data: head, to show just the first few lines, or tail, to show the last few lines (I use the -s option for curl to silence the progress indicator):
user@host:~$ curl -s http://www.example.com | head -n 4
<!doctype html>
<html>
<head>
<title>Example Domain</title>
user@host:~$ curl -s http://www.example.com | tail -n 4
<p><a href="http://www.iana.org/domains/example">More information...</a></p>
</div>
</body>
</html>
user@host:~$
Remember the mail program, which prompted me to fill in the subject, Cc, and body fields for an email?
Let's see what happens when I redirect echo directly into mail:
user@host:~$ echo this is stdout | mail dun@stanford.edu
And that's it! Because its stdin is coming from a pipe instead of a keyboard, mail doesn't even bother to ask me to fill in a subject or other optional fields.
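If you do want a subject line when piping, most mail implementations accept a -s option (implementations vary from system to system, so check man mail):
user@host:~$ echo this is stdout | mail -s 'About stdin' dun@stanford.edu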
When you want to create a file from standard output, use the right-angle bracket, >, to redirect output into a new file:
user@host:~$ grep hey file.txt > heys.txt
As we've seen before, curl has the -o option to specify a filename to redirect its output to. However, since curl obeys the practice of using stdout, we could just use the redirection operator:
user@host:~$ curl www.example.com > example.html
Warning: By default, the file at the pointy-end of the redirection operator gets destroyed if it already exists.
The use of double-right-angle-brackets, >>, will also redirect stdout, but will either create a new file, or, if the file exists, append to it.
user@host:~$ for num in $(seq 1 100); do
> echo "$num" >> numbers.txt
> done
Warning: A common source of misery is using > when you meant to use >>, clobbering a file you meant to append to.
Both redirection operators can be combined in a single command; the general pattern looks like this:
command < input-file > output-file
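As a concrete instance of that pattern (assuming the hypothetical words.txt from earlier): here, tr reads words.txt on stdin and writes an uppercased copy of it to stdout, which is redirected into a new file:
user@host:~$ tr '[:lower:]' '[:upper:]' < words.txt > words_upper.txt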
TLDP.org has a comprehensive list of the many ways stdout and stdin (and stderr, standard error) can be redirected. For the purposes of this course, I try to keep things pretty simple.
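One of the extras worth knowing is that stderr has its own redirection operator, 2>. Here's a minimal sketch – the hostname is made up, so curl fails and its error message lands in errors.txt instead of on your screen:
user@host:~$ curl http://fake.example.not-a-site/ 2> errors.txt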
Many examples on this site feature a "useless use of cat":
user@host:~$ cat words.txt | head -n 10
This is considered a "useless" use of cat
because we should use the stdin-redirection operator:
user@host:~$ head -n 10 < words.txt
Or we could just use head and pass words.txt as an argument:
user@host:~$ head -n 10 words.txt
However, I don't mind the occasional useless use of cat when it reinforces the concept of stdout/stdin and text streams passing from one program to another, even if cat itself is a bit superfluous. Also, it can be easier to conceptualize the stream moving left to right.
Wikipedia has more examples of useless uses of cat.
Note: When cat is doing its job, i.e. operating on multiple files, the two approaches do produce different output. Compare the output from using cat here:
user@host:~$ cat *.txt | grep 'x'
versus:
user@host:~$ grep 'x' *.txt
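For instance, with two hypothetical files a.txt and b.txt, the second form labels each match with the file it came from:
user@host:~$ grep 'x' *.txt
a.txt:xylophone
b.txt:six-pack
The cat version would print those same matching lines, but with no filenames attached, because grep only ever sees a single anonymous stream.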
If you like videos, Software Carpentry has a nice tutorial on pipes and filters.