This page serves as the overview for the 4-part homework project on using computational techniques to automate the collection, parsing, and filtering of data related to lobbying activity and the United States Congress.
(This overview and the assignments are still under construction. Expect a due date in early March.)
The title of this project is inspired by The Colbert Report's 435-part series, Better Know a District.
This mini-project is meant to be both a review of programming concepts and an example of how simple (but brute-force-powered) data-filtering techniques can (and should) be applied to interesting information problems, which is basically the theme of this entire course.
The abstract goal of this project: match a list of names (e.g. former members of Congress) against another list that contains names (e.g. records of lobbying activity), and see whether the matches reveal anything interesting.
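The matching step itself can be sketched in a few lines of Bash and awk. This is a minimal sketch, not the assignment's required method. It assumes both lists are pipe-delimited (like the .psv files described later), already uppercased, with no spaces around the pipes, and with the last name in the first field and the first name in the second. The helper name `match_names`, the file names, and the records in the demo are all made up for illustration.

```shell
#!/usr/bin/env bash
# match_names LIST_FILE RECORDS_FILE  (hypothetical helper)
# Prints every line of RECORDS_FILE whose LAST|FIRST pair appears in LIST_FILE.
# ASSUMPTION: both files are pipe-delimited, uppercased, unpadded,
# with last name in field 1 and first name in field 2.
match_names() {
  awk -F'|' '
    # While reading the first file (NR == FNR), remember each LAST|FIRST key.
    NR == FNR { names[$1 "|" $2] = 1; next }
    # For the second file, print the record if its key was seen.
    ($1 "|" $2) in names
  ' "$1" "$2"
}

# Demo with made-up data (not taken from any real dataset).
tmpdir=$(mktemp -d)
printf 'DOE|JOHN\nROE|JANE\n' > "$tmpdir/former_members.psv"
printf 'SMITH|ADAM|2013|ACME CORP\nDOE|JOHN|2013|EXAMPLE LOBBYING LLC\n' > "$tmpdir/lobbyists.psv"
match_names "$tmpdir/former_members.psv" "$tmpdir/lobbyists.psv"
# prints: DOE|JOHN|2013|EXAMPLE LOBBYING LLC
rm -r "$tmpdir"
```

Matching on name alone will turn up false positives for common names (there is more than one JOHN DOE out there), so extra fields such as dates are worth keeping around to check each match by hand.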
But what is "interesting"? That word means entirely different things, depending on whether you are a government official, journalist, public advocate, academic, or hedge-fund analyst.
Off the top of my head, for the scope of this assignment, a shortlist of possibly interesting things:
So deciding what is interesting is not a computational problem. But gathering the information and filtering it is most definitely a problem for a computer to solve for us. Thanks to the hard work and advocacy of groups such as the Sunlight Foundation and the Center for Responsive Politics, as well as the many policymakers and journalists who effected change in response to scandal and controversy, we have datasets that are relatively easy for the computer to compile, leaving us free to do the interesting work of finding something interesting in the data.
While the programming needed to make a computer collect and filter the data is relatively trivial (you might be able to do all of it in under 50 lines of Bash), the work of collecting and filtering the data is not. As you'll soon see, there are two reasons: first, there's a lot of data, and second, the origin and purpose of each dataset present various challenges when we try to join them for cross-referencing and analysis.
In other words, even though this data is public and easily accessible, it is not easily usable. In this project, we'll see how usable we can make it.
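To give a taste of the "not easily usable" problem: suppose the same person appears as `Doe, John N.` in one source and as `DOE|JOHN N` in another. Before the two can be joined, one of them has to be normalized. The `normalize_name` function below is a hypothetical helper, sketched under the assumption that the input follows the "Last, First Middle." pattern; it handles only that one pattern, and real data will throw many more at you.

```shell
#!/usr/bin/env bash
# normalize_name NAME  (hypothetical helper)
# Converts "Last, First Middle." into "LAST|FIRST MIDDLE".
# Only handles the "Last, First" pattern; a name without a comma
# is just uppercased and stripped of periods.
normalize_name() {
  # 1. uppercase everything
  # 2. drop periods, e.g. "N." -> "N"
  # 3. rewrite "LAST, FIRST" as "LAST|FIRST", trimming surrounding spaces
  printf '%s\n' "$1" |
    tr '[:lower:]' '[:upper:]' |
    tr -d '.' |
    sed -E 's/^ *([^,]+), *(.*[^ ]) *$/\1|\2/'
}

normalize_name "Doe, John N."   # prints DOE|JOHN N
```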
While this is thematically just one project, I've broken the data-collection parts into their own assignments, as they deal with different data domains and challenges. This should also keep you from trying to cram the entire project into the night before it's due (TBA, but probably early March).
Each part of this project will have its own page (TBA). For now, the tasks are divided into:
Be sure to check out the web interface for the Senate lobbying database, as well as the information and guidance on the Lobbying Disclosure Act.
The end result of Steps 1 through 3 is a set of data files that can be used in Step 4. Each data file will basically be an arrangement of fields common to all the datasets, plus fields particular to each dataset that are useful to keep track of:
last_name | first_name | date | description | another_description | something_else |
---|---|---|---|---|---|
For example, from the Senate expenditure reports, the parsed data fields might look like this:
last_name | first_name | date | description | another_description | something_else |
---|---|---|---|---|---|
DOE | JOHN N | 2012 | SECRETARY OF THE SENATE - LEGISLATIVE SERVICES Funding Year 2012 SALARIES, OFFICERS AND EMPLOYEES, SENATE | REPORTER OF DEBATES | 50,100.25 |
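Once the data is in pipe-delimited form, standard command-line tools can filter and aggregate it. As a sketch, assuming a parsed expenditures file with no spaces around the pipes, no trailing pipe, and the dollar amount in the last field (as in the Senate example above), this awk function totals the amounts for each name. `sum_by_name` is a hypothetical helper, and the sample rows in the demo are made up.

```shell
#!/usr/bin/env bash
# sum_by_name FILE  (hypothetical helper)
# Totals the last field (an amount such as "50,100.25") for each LAST|FIRST pair.
# ASSUMPTION: unpadded pipe-delimited rows with no trailing "|".
sum_by_name() {
  awk -F'|' '
    {
      amount = $NF
      # Strip thousands separators; otherwise awk reads "50,100.25" as 50.
      gsub(/,/, "", amount)
      total[$1 "|" $2] += amount
    }
    END {
      # for-in order is unspecified, so pipe the output through sort if needed.
      for (name in total) printf "%s|%.2f\n", name, total[name]
    }
  ' "$1"
}

# Demo with made-up rows.
tmpfile=$(mktemp)
printf '%s\n' \
  'DOE|JOHN N|2012|SALARIES|REPORTER OF DEBATES|50,100.25' \
  'DOE|JOHN N|2013|SALARIES|REPORTER OF DEBATES|1,000.00' > "$tmpfile"
sum_by_name "$tmpfile"   # prints DOE|JOHN N|51100.25
rm "$tmpfile"
```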
We've used virtually every technique required to collect and filter this data in past assignments, including:
In particular, this project most resembles the challenges of the death rows assignment, in which you gathered related data from different sources (and formats) and reconciled them.
One of my favorite segments from The Colbert Report is Better Know a District, in which he does his part to introduce America to our many representatives, letting us know their accomplishments and other important issues, such as their grasp of the Ten Commandments and whether cocaine is fun.
Despite the inherent risk to legislators, the "434-part" series managed to air more than 80 segments. But even that underscores how little we know about each of our sitting legislators, and how many of them there are.
And we know even less about past Congressmembers. From 2005, when "Better Know a District" first aired, to 2014, about 350 House members (and 80 Senators) left Congress. Some of them retired and others found other work to do. But there's no LinkedIn for former members of Congress. That isn't surprising: since they're no longer paid by taxpayers to act on our behalf, there is less interest in how former Congressmembers choose to spend their time.
However, what if they spend their time on things that affect the American public in interesting ways? And what if that work is based on, or helped by, the work they did while in the taxpayers' employ? Then this becomes an interesting data problem.
More notes to come…
OpenSecrets Revolving Door Project by the Center for Responsive Politics - some of the results of our homework project will resemble, on a smaller scale, the OpenSecrets project that tracks where Congressmembers and other public servants end up on K Street. It is an excellently researched and presented project, so look at it as an example of what you can do with the data and the analysis.
Take the Money and Run for Office, This American Life, March 30, 2012
How revolving door lobbyists are taking over K Street, by Lee Drutman and Alexander Furnas, The Sunlight Foundation, Jan. 22, 2014
The Trouble With That Revolving Door, by Thomas B. Edsall, The New York Times, Dec. 18, 2011
A Revolving Door Where Lobbying Rules Don't Apply, by Dan Morgan, The Washington Post, July 21, 1997
Members of Congress trade in companies while making laws that affect those same firms, by Dan Keating, David S. Fallis, Kimberly Kindy and Scott Higham. This article takes a non-lobbying angle, but the premise is the same: look at two different Congress-related datasets and find something interesting.
All cooled off: As Congress convenes, former colleagues will soon be calling from K Street, by the Sunlight Foundation and the Center for Responsive Politics
Rep. Gutierrez pays Chicago lobbyist with tax dollars, USA Today
Rep. Luis Gutierrez cuts ties with Chicago lobbyist, USA Today
Study shows revolving door of employment between Congress, lobbying firms, by T.W. Farnam, The Washington Post, Sept. 13, 2011
Lobbyists call bluff on 'Daschle exemption', by Chris Frates, Politico
Registered lobbyists are mostly compliant - but what about the unregistered ones?, by the Sunlight Foundation
Law Doesn’t End Revolving Door on Capitol Hill, by Eric Lipton and Ben Protess
2013 GAO Report: Observations on Lobbyists' Compliance with Disclosure Requirements, U.S. Government Accountability Office
This is probably what the project folder structure will look like:
compciv
|___homework
    |___congress-lobbying/
        |___expenditures/
        |   |___helper.sh
        |   |___parser.sh
        |   |___parsed_house_expenditures.psv
        |   |___parsed_senate_expenditures.psv
        |   |___data-hold
        |       |___senate
        |       |___house
        |___post_employment/
        |   |___helper.sh
        |   |___parser.sh
        |   |___parsed_historical_congress_legislators.psv
        |   |___parsed_post_senate_employment.psv
        |   |___parsed_post_house_employment.psv
        |   |___data-hold
        |       |___senate
        |       |___house
        |___public_filings/
            |___helper.sh
            |___parser.sh
            |___parsed_lobbying_filings.psv
            |___data-hold/
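If you want to set up this tentative layout ahead of time, a sketch like the following uses `mkdir -p` with Bash brace expansion. It assumes that `senate` and `house` under each `data-hold` are directories and that `helper.sh` and `parser.sh` start out as empty files; `make_project_dirs` is a hypothetical helper, and the layout itself may still change.

```shell
#!/usr/bin/env bash
# make_project_dirs [ROOT]  (hypothetical helper)
# Scaffolds the tentative project layout under ROOT (default: current directory).
make_project_dirs() {
  local base="${1:-.}/compciv/homework/congress-lobbying"
  # mkdir -p creates missing parent directories and doesn't fail if they exist.
  mkdir -p "$base"/expenditures/data-hold/{senate,house} \
           "$base"/post_employment/data-hold/{senate,house} \
           "$base"/public_filings/data-hold
  # Create empty helper and parser scripts in each part's folder.
  touch "$base"/{expenditures,post_employment,public_filings}/{helper,parser}.sh
}
```

For example, `make_project_dirs "$HOME"` would build the tree under `~/compciv`.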
More notes to come…