- basename: Extract just the filename from a filepath
- bc: A calculator that reads from standard input
- cat: Concatenate files together
- cd: Change directory
- cp: Copy files
- csvfix: Parse CSV files
- curl: Transfer a URL
- cut: Cut out selected portions of lines
- date: Print or parse date strings
- echo: Print arguments to standard output
- grep: Print lines matching a pattern
- head: Print only the first few lines of a text stream
- history: Show the last executed commands
- hostname: Print the name of the computer you're currently on
- iconv: Convert between character sets
- jq: A command-line JSON parser
- kill: Send a signal to a running process
- less: Paginate long text streams
- ls: List directory contents
- man: Show documentation for a command
- mkdir: Make a directory
- mv: Move or rename files
- nano: Interactive text editor
- printf: Format and print data
- ps: Show a snapshot of current processes
- pup: Parse HTML from the command line
- pwd: Print the name of your working directory
- read: Read a line from standard input
- rm: Remove files
- sed: Stream editor for complex transformations of text
- seq: Print a sequence of numbers
- sleep: Suspend execution for a period of time
- sort: Sort lines of text
- tail: Print only the last lines of a text stream
- touch: Create an empty file or update its timestamp
- tr: Translate characters in a text stream
- uniq: Print only unique lines of text
- unzip: Extract files from a zip archive
- wc: Print the line, word, and byte counts of a text stream
- wget: Easy web crawling
- whoami: Print your username
- zip: Add files to a compressed archive
Standard usage
basename ./hello/there/cat.jpg
cat.jpg
Get a filename and remove its suffix with -s
basename -s '.jpg' ./hello/there/cat.jpg
cat
It works on URLs too
url="http://www.compciv.org/files/images/topics/scraping/http-cats.jpg"
fname=$(basename "$url")
curl "$url" > "$fname"
Standard usage
echo '100 / 3' | bc
33
Use the -l, --mathlib option to get floating point results
echo '100 / 3' | bc -l
33.33333333333333333333
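Control the number of decimal places by setting bc's scale variable (a quick sketch)
echo 'scale=2; 100 / 3' | bc
33.33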
Adding two (or more) files together
cat file1.txt file2.txt
line from file 1
line from file 1
line from file 2
Unnecessary (but fine, if it helps you to read the pipeline from left to right) use of cat
cat onefile.txt | grep 'hi'
Add a Heredoc-style string into a file
Heredocs are helpful for working with complex multi-line strings, such as raw HTML.
cat > basic.html<<'EOF'
<html>
<head>
<title>My first "Web Page"</title>
</head>
<body>
<h1>A headline</h1>
<p>Check out the
<a href="http://www.nytimes.com">New York Times</a>
</p>
</body>
</html>
EOF
Change into a directory
cd some/path
Change into home directory
cd ~
Change to parent directory
cd ..
Change into the system’s root
cd /
Change into the system’s /tmp
directory
cd /tmp
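Change back to the previous directory you were in
cd -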
Standard usage
cp source_file.txt new_file.txt
Force copy: overwrite files without prompting
cp -f source_file.txt existing_file.txt
Make a copy of a directory with the -r
option
cp -r some_dir/ new_dir/
Copy something into your home directory
cp something.txt ~
Copy all files with a .txt
extension into a sub-directory
cp *.txt some_dir
This utility provides the ability to parse text files in which the values/columns are delimited by commas, or a delimiter of your choice. Because of the possibility that CSV files contain multi-line data (and, oh, the lack of a standard that will foil even the most skilled greppers), it is recommended that you use CSVFix when dealing with delimited-text data.
The list of subcommands is long; if you need to do something specific, check the CSVFix docs and you’ll probably find what you need.
To install on corn.stanford.edu, after having set your PATH to include ~/bin_compciv:
wget https://bitbucket.org/neilb/csvfix/get/version-1.6.zip
unzip version-1.6.zip && rm version-1.6.zip
cd neilb-csvfix-e804a794d175
make lin
cp ./csvfix/bin/csvfix ~/bin_compciv/
For the purpose of some of the examples, example.csv
contains the following:
Name,Quantity,Cost
Apple,35,2.00
Orange,67,1.95
Durian,9,12.00
Use the echo
subcommand to print the CSV in a standard format to stdout
csvfix echo example.csv
"Name","Quantity","Cost"
"Apple","35","2.00"
"Orange","67","1.95"
"Durian","9","12.00"
Use the -osep option to change the delimiter of CSV data when printing to stdout
csvfix echo -osep '|' example.csv
"Name"|"Quantity"|"Cost"
"Apple"|"35"|"2.00"
"Orange"|"67"|"1.95"
"Durian"|"9"|"12.00"
Select and rearrange order of the columns with the order
subcommand
csvfix order -n 3,2,1 example.csv
"Cost","Quantity","Name"
"2.00","35","Apple"
"1.95","67","Orange"
"12.00","9","Durian"
Select, rearrange order by column name with order -fn
csvfix order -fn Cost,Name example.csv
"Cost","Name"
"2.00","Apple"
"1.95","Orange"
"12.00","Durian"
Sort the data by a column with the sort subcommand, using the -rh option to include the header
csvfix sort -rh -f 1 example.csv
Name,Quantity,Cost
"Apple","35","2.00"
"Durian","9","12.00"
"Orange","67","1.95"
Force csvfix to only double-quote fields when necessary with -smq
option
csvfix -smq order -f 3,2,1 example.csv
Cost,Quantity,Name
2.00,35,Apple
1.95,67,Orange
12.00,9,Durian
Force csvfix to use a specific delimiter with -osep
for the output
csvfix -osep '@' order -f 3,2,1 example.csv
"Cost"@"Quantity"@"Name"
"2.00"@"35"@"Apple"
"1.95"@"67"@"Orange"
"12.00"@"9"@"Durian"
Sort the 3rd column, in descending numerical order
csvfix sort -rh -f 3:DN example.csv
Name,Quantity,Cost
"Durian","9","12.00"
"Apple","35","2.00"
"Orange","67","1.95"
Use printf to customize the output of the field values
csvfix printf -fmt "There are %s %s %f" example.csv
There are Name Quantity 0.000000
There are Apple 35 2.000000
There are Orange 67 1.950000
There are Durian 9 12.000000
Switch up the order of columns for printf
with -f
option
csvfix printf -f 2,1,3 -fmt "There are %s %ss and they cost %f each" example.csv
There are Quantity Names and they cost 0.000000 each
There are 35 Apples and they cost 2.000000 each
There are 67 Oranges and they cost 1.950000 each
There are 9 Durians and they cost 12.000000 each
Use the -ifn option to remove the header from the output
csvfix printf -ifn -f 2,1,3 -fmt "There are %s %ss and they cost %f each" example.csv
There are 35 Apples and they cost 2.000000 each
There are 67 Oranges and they cost 1.950000 each
There are 9 Durians and they cost 12.000000 each
This nearly-ubiquitous tool makes it possible to interact with Web sites and APIs. Check out its manual for its many options.
Download and print to standard output
curl http://www.example.com
Download and save to specified file name with -o, --output
curl http://www.example.com -o somefile.txt
Suppress status indicator and error messages with -s, --silent
curl http://www.example.com -s
Automatically follow redirects with -L
curl http://t.co/d -L
Fetch only the headers with --head, -I
curl http://t.co/d -I
Download and save to the basename of a URL with -O
This handy option will create a filename using the basename of a URL, i.e. the last segment of the URL path
curl http://www.example.com/stuff.zip -O
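curl can also send data to websites and APIs; for example, the -d, --data option submits a POST request (the URL here is purely illustrative)
curl -d 'name=Dan' http://www.example.com/form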
Specify a delimiter with -d
and which fields to show with -f
echo A,B,C,D,E | cut -d ',' -f 3,4
C,D
Cut out everything except the nth character with -c [n]
echo 'Hello world' | cut -c 7
w
Cut out everything except the range x to y with -c [x-y]
echo 'Hello world' | cut -c 3-7
ello w
Cut out everything before the nth character -c [n-]
echo 'Hello world' | cut -c 7-
world
Cut out everything after the nth character -c [-n]
echo 'Hello world' | cut -c -7
Hello w
Standalone usage, display the date now
date
Sat Jan 24 10:52:28 PST 2015
Display the date by parsing a given string with -d, --date=
date -d '2013-01-03'
Thu Jan 3 00:00:00 PST 2013
Parse a relatively human-friendly date string
date -d 'Feb 9 1913'
Sun Feb 9 00:00:00 PST 1913
Format the current date as YYYY-MM-DD
date +%Y-%m-%d
2015-02-06
Format the output as YYYY-MM-DD
date -d 'May 15, 1974' +%Y-%m-%d
1974-05-15
Format the output as YYYY-MM-DD HH:MM:SS
date -d 'May 15, 1974 9:32 PM' '+%Y-%m-%d %H:%M:%S'
1974-05-15 21:32:00
Use -I, --iso-8601
as a shortcut for standard ISO YYYY-MM-DD format
date -d 'Sept 25, 2014 3:52:11 PM' -I
2014-09-25
Specify precision with -I[precision]
date -d 'Sept 25, 2014 3:52:11 PM' -Iseconds
2014-09-25T15:52:11-0700
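Print the number of seconds since the Unix epoch with %s
date +%s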
Print “something” to screen
echo something
something
Print a variable’s value to stdout
something='fun times'
echo $something
fun times
Print “something” into a pipe
echo something | tr '[:lower:]' '[:upper:]'
SOMETHING
Quickie concatenation of strings
a=apples
b=bongos
echo "$a AND $b"
apples AND bongos
This 40-year-old tool is one of the most famous and ubiquitous Unix programs, and perhaps the most commonly used tool for searching text.
Printing matching lines in a file
grep 'hello' file1.txt
hello world
say hello
Grepping multiple files, showing file names with the match
grep 'hello' file1.txt file2.txt
file1.txt:hello world
file1.txt:say hello
file2.txt:just a hello
Reading from standard input means no filenames are shown alongside the matches
cat file1.txt file2.txt | grep 'hello'
hello world
say hello
just a hello
Case insensitive search with -i
grep -i 'HELLO' file1.txt
hello world
say hello
Showing non-matching lines with -v
grep -v 'hello' file1.txt
bye world
say bye
Using extended regular expressions with -E
grep -E '[0-9]{5}' file1.txt
Beverly Hills 90210
Printing only the match, not the entire line with -o
echo 'Hello world' | grep -o 'world'
world
Printing just the match made by a regular expression pattern (5 or more alphanumerical characters)
cat file1.txt | grep -oE '[[:alnum:]]{5,}'
hello
world
hello
world
Beverly
Hills
90210
Show the x lines before a match with -B x
grep -B 1 'Beverly' file1.txt
say bye
Beverly Hills 90210
Show the y lines after a match with -A y
grep -A 1 'hello world' file1.txt
hello world
say hello
Grep for a series of strings that are contained in a file with -f
grep -f things.txt file1.txt
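For example, if things.txt were a hypothetical file containing the two patterns hello and bye (one per line), the command above would print:
hello world
say hello
bye world
say bye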
Grep faster when you don’t need regular expressions with -F
grep -F 'word' file.txt
When grepping a list of files (not stdin), use -l
to list all files that match the given term at least once.
grep -l 'word' *.txt
When grepping a list of files (not stdin), use -L
to list all files that don’t contain the given term
grep -L 'word' *.txt
Print only the first x lines with -n [x]
cat *.txt | head -n 5
Read from a file instead of standard input
head -n 5 file1.txt
Print all lines except the last 5, with -n [-x]
head -n -5 file1.txt
Standard usage
history
Show past commands that involved `cat’
history | grep cat
Show just the 5 most recent commands
history | tail -n 5
Remove leading line numbers (as long as history is under 99,999 commands)
history | cut -c 8-
Standard usage
hostname
corn30.stanford.edu
For our purposes, iconv can be used to bypass the issues that arise from dealing with textual data in unexpected character encodings, such as emojis.
For more information, read The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)
Attempt a translation of non-ASCII characters to ASCII
This is useful for converting accented characters, such as é
and ô
to their non-accented equivalents.
echo Béyôncæ | iconv -t ASCII//TRANSLIT
B'ey^oncae
Ignore all non-ASCII (i.e. standard American-English) characters
This command will sometimes give you an error message. If so, refer to the iconv usage shown below.
cat somefile.txt | iconv -t ASCII//IGNORE
Force the conversion of UTF-8 characters to ASCII
cat somefile.txt | iconv -c -f utf-8 -t ascii
This is a tool not part of standard Linux distributions but is extremely handy for working with JSON data.
jq has its own parsing language and methods, both for extracting data and for outputting new data structures.
The jq manual is the most comprehensive reference for how jq works, but you can refer to this basic tutorial for the basic concepts.
Simply parse and pretty-print
echo '{"name": "Dan"}' | jq '.'
{
"name": "Dan"
}
Select an object’s attribute
echo '{"name": "Dan"}' | jq '.name'
"Dan"
Select multiple attributes
echo '{"name": "Dan", "age": 45}' | jq '.name, .age'
"Dan"
45
Print raw-output with -r, --raw-output
echo '{"name": "Dan", "age": 45}' | jq -r '.name, .age'
Dan
45
Select an element from an array
echo '["a", "b", "c"]' | jq '.[1]'
"b"
Select attributes from an array of objects
echo '[{"name": "Dan", "age": 42}, {"name": "Bob", "age": 55}]' |
jq '.[] | .name'
"Dan"
"Bob"
Terminate a process with a given PID of 1234 (use ps aux
to find PID)
kill 1234
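Forcefully terminate a process that ignores the normal signal by sending SIGKILL with -9
kill -9 1234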
Terminate all processes that you are allowed to terminate
kill -9 -1
Show a text stream one page at a time
cat *.txt | less
Default listing of files
ls
List all files, including hidden files with -a, --all
ls -a
Show a long list with file attributes with -l
ls -l
Basic usage
man cat
Make a single directory
mkdir my_sub_dir
Make multiple directories
mkdir apples oranges pears
Make a directory and all its parent directories with -p
mkdir -p a/path/to/a/new/subdir
Make a subdirectory inside your home directory
mkdir ~/new_dir
Make a subdirectory inside /tmp
mkdir /tmp/new_dir
Rename a file
mv old_name.txt new_name.txt
Rename/move a file even if new name exists with -f
mv -f old_name.txt new_name.txt
Ask before overwriting an existing file with -i
mv -i old_name.txt new_name.txt
Move something into your home directory
mv somefile ~
Move all files with a .txt
extension into a sub-directory
mv *.txt some_dir
Open (or create) a file and enter interactive-editing mode
nano file.txt
The printf command is like echo, just much more powerful and versatile. The Bash Hackers Wiki has a nice page on it.
With printf, you pass in at least two arguments:
- A string containing a sort of template for text, with special syntax for placeholders.
- A string (or several strings) that are then inserted into the placeholders of the first argument.
There is a bewildering array of syntax placeholders. The examples will try to cover the basics.
Basic usage
By default, printf
will not print a newline character at the end, causing the output to butt up against the prompt.
Like this: My name is Danuser@host:~$
printf 'My name is %s' 'Dan'
Print a new line at the end with ‘\n’
The \n stands for 'new line'
printf 'My name is %s \n' 'Dan'
Work with multiple arguments
printf 'My name is %s %s. \nI am %s.\n' 'Dan' 'Man' 'happy'
My name is Dan Man.
I am happy.
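printf also understands numeric placeholders, such as %d for integers and %f for floating-point numbers (a small sketch; the values here are made up)
printf '%s is %d years old and %.1f feet tall\n' 'Dan' 45 5.9
Dan is 45 years old and 5.9 feet tall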
Printing out an HTML string
printf '
<h1>Hello %s</h1>
<p>
<a href="%s">%s</a>
</p> \n' 'Stranger' 'http://www.thestranger.com/' 'A news site'
<h1>Hello Stranger</h1>
<p>
<a href="http://www.thestranger.com/">A news site</a>
</p>
Using a Heredoc-style string in a variable
See the example for the read command for more information on heredocs
Note: if you want to preserve the newlines in some_html, you have to double-quote it, i.e. printf "$some_html"
read -r -d '' some_html <<'EOF'
<h1>Hello %s</h1>
<p>Here is a kitten:</p>
<img src="http://placekitten.com/g/%s/%s">
\n
EOF
printf "$some_html" 'Cat Lover' 500 300
<h1>Hello Cat Lover</h1>
<p>Here is a kitten:</p>
<img src="http://placekitten.com/g/500/300">
List all processes belonging to the current user and session
ps
PID TTY TIME CMD
1434 pts/65 00:00:00 sleep
1532 pts/65 00:00:00 ps
25247 pts/65 00:00:00 bash
List all processes running on the system
ps aux
List all of your processes by filtering for your username (this is what you most frequently want to do)
ps aux | grep $(whoami)
The pup tool is inspired by the jq JSON-parsing tool, but is used for parsing HTML with CSS selectors.
Select elements with a CSS selector (here, every a tag)
curl www.example.com | pup 'a'
<a href="http://www.iana.org/domains/example">
More information...
</a>
Extract the value of an attribute with attr{}
curl www.example.com | pup 'a attr{href}'
http://www.iana.org/domains/example
Extract just the text of the matched elements with text{}
curl www.example.com | pup 'a text{}'
More information...
(When inside your own home directory)
pwd
/afs/.ir/users/y/o/your_home
The read command is often used to handle reading text streams line-by-line, which is not something that some_var=$(cat some.txt) will do by default.
It’s especially helpful in combination with a while
loop and for assigning Heredocs, i.e. multi-line strings that are too complex to delimit with quotation marks, to variables.
For the most part, we want to use the -r
option, which prevents backslashes from doing their normal thing of escaping characters.
Useful links:
- man page for read
- GNU reference for heredocs
- StackOverflow: How to assign a heredoc value to a variable in Bash?
- TLDP: Here Documents
For the examples below, assume example.txt
contains:
README.txt
42
Documents and Settings
index.html
Dogs and Cats.html
Read each line from a file and pass it into a while
loop
while read -r x; do
echo "Opening...$x"
done < example.txt
Opening...README.txt
Opening...42
Opening...Documents and Settings
Opening...index.html
Opening...Dogs and Cats.html
Read each line from a command and pipe into a while
loop
curl -s http://www.example.com | while read -r some_line; do
echo "This is a line: $some_line"
done
This is a line: <!doctype html>
This is a line: <html>
This is a line: <head>
This is a line: <title>Example Domain</title>
Read each line from a command and pass it into a while loop, right to left
To read the output of a command, wrap it between <( and ) (as opposed to $( and )), and feed it to the loop with a redirect (<)
while read -r x; do
echo "Opening...$x"
done < <(cat example.txt | grep 'html')
Opening...index.html
Opening...Dogs and Cats.html
Save a multi-line Heredoc into a variable and do not interpret special Bash symbols
This will be the most common pattern we follow when creating HTML templates within Bash.
Heredocs make it easy to describe a multi-line string without worrying about whether you’ve used the right number of quote marks.
This particular example is derived from this excellent StackOverflow Q&A.
This example, with the use of 'EOF', prevents things like $ from being interpreted by Bash.
The use of the option -d '' tells read to keep on reading even after the first newline.
Basically, see read -r -d '' as the boilerplate to memorize.
read -r -d '' some_variable <<'EOF'
<html>
<head>
<title>My first "Web Page"</title>
</head>
<body>
<h1>A headline</h1>
<p>Check out the
<a href="http://www.nytimes.com">New York Times</a>
</p>
</body>
</html>
EOF
Remove a file
rm somefile.txt
Remove all the files in the current directory
rm *
Remove all the files in the current directory but ask for confirmation with -i
rm -i *
Remove a file and do not ask for confirmation or show errors with -f
rm -f somefile.txt
Remove a file even if it is an empty directory with -d
rm -d somedir
If the given filename is a directory, remove it and everything inside of it with -r
rm -r somedir
Wipe out your computer (i.e. making a typo while doing rm -rf
is very bad)
rm -rf /
sed is a very powerful program that basically has its own language, and thus has books and websites devoted to it.
For our purposes, we can focus solely on its substitution command (Bruce Barnett describes it as "The essential command"), which allows us to transform text with far more power than we can with just tr.
Basic substitution using the s
subcommand
echo 'hello world' | sed s/hello/bye/
bye world
Repeat the substitution for every match with the g
flag
echo 'hello world bye world' | sed s/world/people/g
hello people bye people
Make matches based on extended regular expressions with -E
option
echo 'Beverly Hills 90210' | sed -E 's/[0-9]{3}/q/'
Beverly Hills q10
An example of regex capturing groups and backreferences
"echo 'Beverly Hills 90210' | sed -E 's/([0-9]+)/I love \1 a lot/'"
Beverly Hills I love 90210 a lot
Print numbers 1 to 5
seq 1 5
1
2
3
4
5
Sleep for 10 seconds
sleep 10
Sleep for 5 days (the d suffix works only with GNU sleep, not on OS X)
sleep 5d
Sort in ascending alphabetical order
sort lines.txt
100
9
A
a
b
Sort in reverse order with -r
sort -r lines.txt
b
a
A
9
100
Sort numbers based on numerical value with -n
sort -n lines.txt
A
a
b
9
100
Sort lines based on a column q
with -k [q]
and a delimiter f
with -t [f]
sort -k 3 -t ',' lines.csv
C,D,Y
A,B,Z
Print only the last x lines with -n [x]
cat *.txt | tail -n 5
Read from a file instead of standard input
tail -n 5 file1.txt
Skip the first line in a file with -n [+2]
tail -n +2 file1.txt
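Follow a file as it grows (e.g. a log file) with -f; press Ctrl-C to stop
tail -f somefile.txt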
Update file’s accessed/modified time, or create it if it doesn’t exist
touch somefile.txt
Replace one character for another
echo Hello world | tr 'o' 'a'
Hella warld
Replace multiple characters
echo Hello world | tr 'lo' 'xo'
Hexxa warxd
Normalize all whitespace characters (including newlines) to spaces
txt="Hello,
world"
echo "$txt" | tr '[:space:]' ' '
Hello, world
Delete a character, such as a space character, with -d
echo Hello world | tr -d ' '
Helloworld
Translate lower-case characters to upper-case using character classes
echo Hello world | tr '[:lower:]' '[:upper:]'
HELLO WORLD
Remove all punctuation
echo 'Hello, world!' | tr -d '[:punct:]'
Hello world
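Squeeze runs of a repeated character down to a single character with -s
echo 'Hello    world' | tr -s ' '
Hello world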
Print unique lines; note that uniq only removes adjacent duplicates, so unsorted input may still contain repeats
uniq somefile.txt
oranges
apples
oranges
kiwis
apples
Used in conjunction with sort
sort somefile.txt | uniq
apples
kiwis
oranges
Print unique values and frequency of occurrence with -c
option
sort somefile.txt | uniq -c
2 apples
1 kiwis
3 oranges
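To rank values by how often they occur, pipe the counts into a reverse numerical sort
sort somefile.txt | uniq -c | sort -rn
3 oranges
2 apples
1 kiwis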
Basic unzipping
unzip some.zip
Use -o
option to overwrite existing files without prompting user
unzip -o some.zip
Extract files and pipe their contents to stdout with -p
option
unzip -p some.zip
Extract only specific files and pipe their contents into a new file
unzip -p stuff.zip 14.txt 42.txt > file.txt
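List an archive's contents without extracting anything with -l
unzip -l some.zip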
Print line, word, and character count
wc somefile.txt
6 8 55 somefile.txt
Print just the line count with -l
wc -l somefile.txt
6 somefile.txt
Print just the word count with -w
wc -w somefile.txt
8 somefile.txt
Print just the character count with -c
wc -c somefile.txt
55 somefile.txt
Count the lines from standard input to avoid showing filename
cat somefile.txt | wc -l
6
Like curl, wget can be used to download individual files from the Web. However, it contains a suite of features geared towards batch downloads, i.e. web crawling; wget was recently described as a "Low-Cost Tool to Best the N.S.A."
And similar to curl, wget has a mountain of documentation worth reading.
Here are some examples from the official docs. I also like The Geek Stuff’s list of wget examples
Download a single file and save to a default filename
Unlike curl
, wget does not send downloaded content to stdout by default. Instead, it derives a base filename to save to the current working directory.
For example, wget en.wikipedia.org/wiki/Hello
will save to a file named Hello
. If the target is a directory (i.e. with a trailing slash, e.g. wget en.wikipedia.org/wiki/
), it will save to index.html
Also by default: if the default filename already exists, wget will create a new, numbered variation, e.g. index.html.1
wget www.example.com
--2015-06-18 05:09:00-- http://www.example.com/
Resolving www.example.com... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to www.example.com|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1270 (1.2K) [text/html]
Saving to: ‘index.html’
100%[======================================>] 1,270 --.-K/s in 0.002s
2015-06-18 05:09:00 (743 KB/s) - ‘index.html’ saved [1270/1270]
Redirect to stdout
wget -O - www.example.com
[content of the webpage]
100%[======================================>] 1,270 --.-K/s in 0.001s
2015-06-13 04:52:45 (1.53 MB/s) - written to stdout [1270/1270]
Download files only if newer than existing files
With this option, wget will set the downloaded file’s timestamp based on the web server’s Last-Modified
header.
On subsequent downloads using -N
, wget will fetch a file only if it is newer than the existing file.
Read the full docs at gnu.org: Time-Stamping Usage
wget -N www.example.com
--2015-06-18 05:06:39-- http://www.example.com/
Resolving www.example.com... 93.184.216.34, 2606:2800:220:1:248:1893:25c8:1946
Connecting to www.example.com|93.184.216.34|:80... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1270 (1.2K) [text/html]
Server file no newer than local file ‘index.html’ -- not retrieving.
Recursively download links
This is where wget starts to get fun, and dangerous. The recursive option will cause wget to download not just the target page, but all URLs linked to from that page. This includes URLs of things like images and stylesheets.
By default, it will save all of the files into a directory named after the site domain.
It should go without saying that this can be a massive operation if you aren’t careful.
From the documentation on Recursive Download:
Recursive retrieval of HTTP and HTML/CSS content is breadth-first. This means that Wget first downloads the requested document, then the documents linked from that document, then the documents linked by them, and so on. In other words, Wget first downloads the documents at depth 1, then those at depth 2, and so on
wget -r www.stanford.edu
[a wall of output showing that every file linked to from the Stanford homepage has been downloaded]
2015-06-18 05:17:43 (4.26 MB/s) - ‘www.stanford.edu/about/history/images/hero-seq.jpg’ saved [520408/520408]
FINISHED --2015-06-18 05:17:43--
Total wall clock time: 12s
Downloaded: 147 files, 9.2M in 4.6s (2.00 MB/s)
Specify the number of layers (i.e. the depth) for a recursive crawl.
By default, a recursive crawl with wget will go 5 layers deep, i.e. it will download all the links from the first page. Then it will visit each of those links and download their links, and so on, five layers deep.
Setting this value to 1
will only download URLs linked from in the target page. Setting it to 0
is shorthand for an infinite number of layers to crawl. Be careful.
wget -r -l 1 www.stanford.edu
[long list of files downloaded]
FINISHED --2015-06-18 05:26:43--
Total wall clock time: 2.2s
Downloaded: 42 files, 1.0M in 0.4s (2.73 MB/s)
Download only files with a specified extension
Use in conjunction with -r. Extremely helpful if a webpage contains links to a set of binary files you want to collect, without collecting everything else, such as links to other webpages.
wget -r -A .jpg www.stanford.edu
[a wall of output for every jpg on the homepage]
2015-06-18 05:20:34 (4.49 MB/s) - ‘www.stanford.edu/about/history/images/hero-seq.jpg’ saved [520408/520408]
FINISHED --2015-06-18 05:20:34--
Total wall clock time: 3.4s
Downloaded: 32 files, 4.7M in 2.1s (2.26 MB/s)
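When crawling recursively, it's courteous (and less likely to get you blocked) to pause between requests; wget's -w option takes a number of seconds to wait between retrievals
wget -r -w 1 www.stanford.edu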
Mirror an entire site
Again, be careful. From the docs:
Turn on options suitable for mirroring. This option turns on recursion and time-stamping, sets infinite recursion depth and keeps FTP directory listings. It is currently equivalent to
-r -N -l inf --no-remove-listing
.
wget -m www.example.com
Snapshot a single page
This is a variation I use when I just want to preserve a single page and all of its visual elements, similar to how sites like archive.is work.
See a full description of the flags and options in this gist.
wget -E -H -k -K -nd -N -p -P /tmp/wikipedia https://en.wikipedia.org/wiki/Main_Page
[wall of output of downloaded files]
FINISHED --2015-06-18 05:48:04--
Total wall clock time: 2.7s
Downloaded: 20 files, 154K in 0.3s (563 KB/s)
Converting /tmp/wikipedia/Main_Page.html... 28-310
Converted 1 files in 0.003 seconds.
Mirror a subdirectory
Use of the --no-parent flag prevents going higher than the specified subdirectory
wget -m -e robots=off --no-parent http://www.example.com/whatsup
Standard usage
whoami
your_sunet_id
Add all the .txt
files in current directory to a zip archive
zip alltext.zip *.txt
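Add a directory and everything inside it with -r
zip -r archive.zip some_dir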