Basics of grep

The fastest way to search text from the command line

The grep tool is more than 40 years old and is ubiquitous (with some variations) across Unix systems. Its full name, global regular expression print, obscures its simple yet powerful purpose: to "search a file for a pattern".

Basic usage

The simplest invocation takes two arguments: the pattern and the target file. The following:

grep hello somefile.txt

– will print all lines that contain the text "hello", including lines where it appears inside a longer word.

Like other Unix tools, grep will accept shell expansions. For example:

grep hello *.txt

– will return all lines containing "hello" from all files (in the current directory) with a .txt extension.

Note: When grep is called on more than one file, as in the above case, it prefixes each matching line with the name of the file in which the match was found:

a.txt:I say hello
a.txt:you say hello
b.txt:we all say hello
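
To try the file-name prefix without touching real data, here is a minimal sketch. It assumes a scratch directory and two made-up files, a.txt and b.txt, whose contents are invented to mirror the output above.

```shell
# Work in a scratch directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files (contents invented for illustration).
printf 'I say hello\nyou say hello\n' > a.txt
printf 'we all say hello\ngoodbye\n' > b.txt

# Several files: each matching line is prefixed with its file name.
many=$(grep hello *.txt)
printf '%s\n' "$many"

# A single file: no prefix is added.
single=$(grep hello a.txt)
printf '%s\n' "$single"
```

The prefix appears only because the shell expanded *.txt into more than one file name before grep ran.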

Reading from standard input

Like most Unix tools, grep will also read data that is piped in from another command-line tool. For example, perhaps you want to filter a file through two grep calls. The following will return all the lines from .txt files that contain both hello and world:

grep hello *.txt | grep world
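
One thing to watch for when chaining grep calls over several files: the second grep also sees the file-name prefix added by the first. The sketch below uses two hypothetical files (notes.txt and world.txt, both invented here) to show the effect and one possible way around it.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample files; note that one file name contains "world".
printf 'hello world\nhello there\n' > notes.txt
printf 'hello again\n' > world.txt

# grep reads standard input when no file argument is given.
from_stdin=$(printf 'hello world\ngoodbye world\n' | grep hello)

# With several files, the second grep also sees the "file:" prefix,
# so the line from world.txt passes even though its text lacks "world".
prefixed=$(grep hello *.txt | grep world)

# Concatenating the files first keeps file names out of the stream.
plain=$(cat *.txt | grep hello | grep world)

printf '%s\n' "$from_stdin" "$prefixed" "$plain"
```

Whether the prefix matters depends on your file names; concatenating with cat trades the file-name information for a cleaner stream.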

Common options

For the following example, let's imagine a file named ham.txt with these lines:

To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
That flesh is heir to, 'tis a consummation
Devoutly to be wish'd. To die, to sleep;
To sleep: perchance to dream: ay, there's the rub;
For in that sleep of death what dreams may come
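
To follow along, one way to create ham.txt is with a here-document. This is only a sketch: the scratch directory is an assumption, and the final check uses -c (a standard option that prints a count of matching lines instead of the lines themselves), which is not otherwise covered in this tutorial.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# The quoted 'EOF' delimiter stops the shell from expanding anything in the text.
cat > ham.txt <<'EOF'
To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
The slings and arrows of outrageous fortune,
Or to take arms against a sea of troubles,
And by opposing end them? To die: to sleep;
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
That flesh is heir to, 'tis a consummation
Devoutly to be wish'd. To die, to sleep;
To sleep: perchance to dream: ay, there's the rub;
For in that sleep of death what dreams may come
EOF

# An empty pattern matches every line, so -c '' reports the line count.
line_count=$(grep -c '' ham.txt)
printf '%s lines\n' "$line_count"
```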

Case insensitivity

The -i option matches text regardless of capitalization. Without it, the search is case-sensitive:

grep "and" ham.txt

Output:

The slings and arrows of outrageous fortune,
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks

Now with -i:

grep -i "and" ham.txt

Output:

The slings and arrows of outrageous fortune,
And by opposing end them? To die: to sleep;
No more; and by a sleep to say we end
The heart-ache and the thousand natural shocks
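
As a quick way to see the difference in numbers, the sketch below pipes a small made-up input into grep and counts matching lines with -c (a standard option that prints a count rather than the lines). The sample text is an assumption chosen for illustration.

```shell
# Hypothetical three-line input.
sample='The slings and arrows
And by opposing end them
No more'

# Case-sensitive: only the lowercase "and" on the first line counts.
sensitive=$(printf '%s\n' "$sample" | grep -c 'and')

# Case-insensitive: the capitalized "And" on the second line counts too.
insensitive=$(printf '%s\n' "$sample" | grep -ci 'and')

printf 'without -i: %s, with -i: %s\n' "$sensitive" "$insensitive"
```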

Matching a list of expressions

If you have a separate file of text patterns, the -f option lets you specify that file. grep will treat each line in that file as a pattern to match against the target file, and print any line that matches at least one of them.

So if words.txt looks like this:

opposing
thousand

then the following command prints every line of ham.txt that matches either word:

grep -f words.txt ham.txt

Output:

And by opposing end them? To die: to sleep;
The heart-ache and the thousand natural shocks
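
For a short list, the same patterns can also be given inline with repeated -e options, which is a standard grep feature. The sketch below compares the two approaches; lines.txt is a hypothetical stand-in for the target file, created in a scratch directory.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Pattern file: one pattern per line.
printf 'opposing\nthousand\n' > words.txt

# Hypothetical target file (invented for illustration).
printf 'And by opposing end them\nNo more\nthe thousand natural shocks\n' > lines.txt

# Patterns read from a file.
from_file=$(grep -f words.txt lines.txt)

# The same patterns given inline, one -e per pattern.
inline=$(grep -e opposing -e thousand lines.txt)

printf '%s\n' "$from_file"
```

Be careful with stray blank lines in the pattern file: an empty pattern typically matches every line, which would make the whole target file print.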

Matching the inverse

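Adding the -v flag will return all non-matching lines. The following would return all lines that do not have a lowercase 'a' in them (the match is case-sensitive, so a capital 'A' does not count):

grep a -v ham.txt
# This would also work, and is the more conventional order:
grep -v a ham.txt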

Output:

Whether 'tis nobler in the mind to suffer
And by opposing end them? To die: to sleep;
Devoutly to be wish'd. To die, to sleep;
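
Because matching is case-sensitive by default, -v on its own can keep a line whose only occurrence of the letter is capitalized. The sketch below, using a small made-up input, shows how combining -v with -i changes the result.

```shell
# Hypothetical three-line input.
sample='Whether tis nobler
And by opposing end them
Or to take arms'

# Lines with no lowercase "a": the capital "A" in "And" does not count.
no_lower_a=$(printf '%s\n' "$sample" | grep -v a)

# Lines with no "a" in either case.
no_a_at_all=$(printf '%s\n' "$sample" | grep -vi a)

printf '%s\n---\n%s\n' "$no_lower_a" "$no_a_at_all"
```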

Displaying just the match

By default, grep displays the entire line in which a match is made:

grep 'the' ham.txt

To be, or not to be: that is the question:
Whether 'tis nobler in the mind to suffer
And by opposing end them? To die: to sleep;
The heart-ache and the thousand natural shocks
To sleep: perchance to dream: ay, there's the rub;

However, if you want to see only the match, use the -o flag:

grep -o 'the' ham.txt

Output:

the
the
the
the
the
the
the

Note that the output shows each match of "the" on its own line, whether it is the standalone word "the" or part of a longer word such as "Whether". This isn't very helpful by itself, which is why -o is usually combined with a regular expression, as seen below.
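
One place where -o is handy even with a plain pattern is counting occurrences rather than lines. The following is a rough sketch on a made-up input; it assumes wc is available and uses -c (count of matching lines) for comparison.

```shell
# Hypothetical three-line input.
sample='Whether it is nobler in the mind
there is the rub
no match here'

# -c counts lines that contain at least one match.
matching_lines=$(printf '%s\n' "$sample" | grep -c 'the')

# -o prints one match per line, so counting its output lines counts occurrences.
# tr strips the padding that some wc implementations add.
occurrences=$(printf '%s\n' "$sample" | grep -o 'the' | wc -l | tr -d ' ')

printf 'lines: %s, occurrences: %s\n' "$matching_lines" "$occurrences"
```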

Extended regular expressions

The topic of regular expressions is worth a lesson on its own. Think of them as pattern-matching-on-steroids. When doing extensive searches, you are rarely looking for exact words. Instead, you'll find yourself wanting to look for certain patterns, such as:

All 6-letter words
Words that begin with 't' and end with 'e'
Lines that have no more than 3 words

Regular expressions are a "mini-language" that lets you express such custom matching. Regular-Expressions.info is a pretty good (and comprehensive) place to start. But we'll cover them in another tutorial.

With the -E option, grep interprets the pattern as an extended regular expression, so characters such as +, ?, | and parentheses take on their special meanings instead of being matched literally.
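
As a rough illustration of what -E changes, the sketch below runs the same pattern with and without the option on a small made-up input.

```shell
# Hypothetical three-line input.
animals='cat
dog
bird'

# With -E, the | character means "or".
extended=$(printf '%s\n' "$animals" | grep -E 'cat|dog')

# Without -E, | is an ordinary character, so grep looks for the literal
# text "cat|dog". grep exits with status 1 when nothing matches, hence || true.
basic=$(printf '%s\n' "$animals" | grep 'cat|dog' || true)

printf 'extended: %s\n' "$extended"
printf 'basic: [%s]\n' "$basic"
```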

For example, to find all words that are either the, or have the in them, use -E to specify the pattern, combined with -o to show just the match:

grep -oE '\w*the\w*' ham.txt

the
Whether
the
them
the
there
the

Again, the regular expression syntax is its own lesson, but \w*the\w* can be translated into: "Find the text 'the' with any number of word characters (letters, digits, or underscores) before or after it."

To find all words that begin with the letter "s", either upper or lowercase, we use the -i flag and the following regular expression (the \b stands for "word boundary", which here anchors the match to the beginning of a word, and \w+ requires at least one more word character after the "s"):

grep -oiE '\bs\w+' ham.txt

suffer
slings
sea
sleep
sleep
say
shocks
sleep
sleep
sleep
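
For the three kinds of searches listed at the start of this section, here is one possible set of patterns, run against a small made-up input. Treat it as a sketch rather than the only way to write them: \w and \b are extensions supported by GNU grep and many other implementations but not required by POSIX, and "word" in the last pattern is assumed to mean a run of non-space characters.

```shell
# Hypothetical three-line input.
sample='slings and arrows
to take the tide
to be'

# All 6-letter words: exactly six word characters between two word boundaries.
six_letters=$(printf '%s\n' "$sample" | grep -oE '\b\w{6}\b')

# Words that begin with "t" and end with "e".
t_to_e=$(printf '%s\n' "$sample" | grep -oE '\bt\w*e\b')

# Lines with no more than 3 whitespace-separated words.
short_lines=$(printf '%s\n' "$sample" \
  | grep -E '^[[:space:]]*([^[:space:]]+[[:space:]]*){0,3}$')

printf '%s\n---\n%s\n---\n%s\n' "$six_letters" "$t_to_e" "$short_lines"
```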

More resources

The Linux Information Project has a great primer on grep.
Software Carpentry covers grep in its "Finding Things" tutorial
The Geek Stuff's 15 Practical Grep Command Examples In Linux / UNIX
A little history on the 40-year life of grep.