More analysis of trends in American baby-naming

More practice with text filters to find interesting trends in the SSA baby name data.

Due: Tuesday, January 27
Points: 5 (Extra Credit)

Because we’re already fairly familiar with the U.S. Social Security Administration’s data on the most popular baby names, this assignment is more practice with Unix text utilities and shell scripting, using them to quickly answer trivia questions about how Americans name their babies.


  • A project folder named homework/ssa-baby-name-fun

    In your compciv repo, create a subdirectory (from the command line) named: homework/ssa-baby-name-fun

  • A script to download the SSA baby data

    The script, when executed, should download and unpack both of the SSA’s zip files: baby names nationwide and baby names by state.

  • A script to find names that went out of style

    Given two years, x and y, the script should return an alphabetized list of names, by gender, that appear in the nationwide data for year x but not in y.

    Sample usage:

          bash 1880 2013

    Sample output:

  • A script to find names unique to a state

    Given a state abbreviation and a year, the script should return a list of names, by gender, that appear only in the given state for that year. Each line of output should include the gender, the name, and the number of babies given that name in that state and year.

    Sample usage:

        bash IA 2013

    Sample output:

  • Hints


    # Downloading and unpacking the data
    mkdir -p ./data-hold/{names-by-state,names-nationwide}
    cd data-hold/names-by-state && curl -O && unzip -o && rm && cd ../..
    cd data-hold/names-nationwide && curl -O && unzip -o && rm && cd ../..
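
    # Names that went out of style: name,gender pairs found in year $1 but not in year $2
    cd data-hold/names-nationwide
    sort "yob$1.txt" | cut -d ',' -f 1,2 > y1.txt
    sort "yob$2.txt" | cut -d ',' -f 1,2 > y2.txt
    # -x makes each pattern match whole lines only, so "Ann,F" in y2.txt
    # can't knock "LeeAnn,F" out of y1.txt
    grep -Fxvf y2.txt y1.txt
    rm y1.txt y2.txt
    cd ../..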

    # Names unique to a state: the state abbreviation and year come in as arguments
    state=$1
    year=$2
    # grabbing just the filenames of all the states except $state
    # (matching "/$state.TXT" so that TX doesn't also exclude every *.TXT file)
    fnames=$(ls ./data-hold/names-by-state/*.TXT | grep -v "/$state.TXT")
    # Now compiling all the names in all the OTHER states and getting just the
    # unique combinations of gender and name that were found in $year;
    # sed appends a comma so the pattern "F,Anna," can't match "F,Annabeth,..."
    cat $fnames | grep ",$year," | cut -d ',' -f 2,4 | sort -u | sed 's/$/,/' > data-hold/tmp.txt
    # get all the names in this $state,
    # keep only the rows that belong to that year,
    # then do a grep -v -f of the combined names found in the previous step
    cat "./data-hold/names-by-state/$state.TXT" | \
        grep ",$year," | \
        cut -d ',' -f 2,4,5 | \
        grep -vF -f data-hold/tmp.txt

    # The same out-of-style comparison, run from the project folder without cd
    sort "data-hold/names-nationwide/yob$1.txt" | cut -d ',' -f 1,2 > "data-hold/lost-$1.txt"
    sort "data-hold/names-nationwide/yob$2.txt" | cut -d ',' -f 1,2 > "data-hold/lost-$2.txt"
    grep -Fxvf "data-hold/lost-$2.txt" "data-hold/lost-$1.txt"
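The out-of-style check is a set difference: the name,gender pairs in year x minus those in year y. As a sketch under stated assumptions, the `comm` utility computes this directly from two sorted files and compares whole lines, so it avoids the substring matching that `grep -F -f` does without `-x` (where a pattern like `Ann,F` also matches `LeeAnn,F`). The function name `names_out_of_style` and its directory argument are hypothetical, and the `yobYYYY.txt` files are assumed to hold `name,gender,count` lines.

```bash
#!/usr/bin/env bash
# Hypothetical helper, a sketch rather than the assigned solution.
# Assumes DIR contains files named yobYYYY.txt with lines like "Mary,F,7065".
#
# names_out_of_style DIR X Y
#   prints "name,gender" pairs present in year X but absent in year Y
names_out_of_style() {
  local dir=$1 x=$2 y=$3
  # Keep only the name,gender columns and sort them; comm requires sorted input.
  # LC_ALL=C gives sort and comm the same byte-wise collation order.
  # comm -23 drops lines found only in the second file (-2) and lines found
  # in both files (-3), leaving lines that appear only in year X.
  LC_ALL=C comm -23 \
    <(cut -d ',' -f 1,2 "$dir/yob$x.txt" | LC_ALL=C sort -u) \
    <(cut -d ',' -f 1,2 "$dir/yob$y.txt" | LC_ALL=C sort -u)
}
```

Called from a script as `names_out_of_style data-hold/names-nationwide "$1" "$2"`, it would print one `name,gender` pair per line, sorted by name.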
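For the state question, a single `awk` pass over every state file can stand in for the temporary pattern file. This is a hypothetical sketch: the function name `names_unique_to_state` is made up, and each `.TXT` line is assumed to be `STATE,GENDER,YEAR,NAME,COUNT`, the column order that the hints' `cut -d ',' -f 2,4,5` relies on. It reads the state from the first column rather than from the filename and compares `gender,name` keys exactly, so `Anna` in another state does not hide `Annabeth` in this one.

```bash
#!/usr/bin/env bash
# Hypothetical helper, a sketch rather than the assigned solution.
# Assumes every DIR/*.TXT line looks like "IA,F,2013,Emma,200".
#
# names_unique_to_state DIR STATE YEAR
#   prints "gender,name,count" for names found in STATE, and in no other state, in YEAR
names_unique_to_state() {
  local dir=$1 state=$2 year=$3
  awk -F ',' -v st="$state" -v yr="$year" '
    # Strip a trailing carriage return in case a file uses Windows line endings;
    # changing $0 makes awk re-split the fields.
    { sub(/\r$/, "") }
    # Ignore rows from any other year.
    $3 != yr { next }
    # Rows for the chosen state: remember the count under a "gender,name" key.
    $1 == st { here[$2 "," $4] = $5; next }
    # Rows for every other state: mark the "gender,name" key as seen elsewhere.
    { elsewhere[$2 "," $4] = 1 }
    END {
      for (key in here)
        if (!(key in elsewhere)) print key "," here[key]
    }
  ' "$dir"/*.TXT | LC_ALL=C sort   # awk loops in arbitrary order; sort groups by gender, then name
}
```

For example, `names_unique_to_state data-hold/names-by-state IA 2013` would print `gender,name,count` lines for Iowa in 2013.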