A winter elective on programming and journalism for the Stanford Computational Journalism Lab
Monday and Wednesday, 2:15 to 3:45PM
Building 200, Room 303
Instructor: Dan Nguyen | @dancow | dun @ stanford
Office hours: Tuesdays and Thursdays, 2 to 4PM, or by appointment.
Piazza: The class discussion board is hosted on Piazza. Feel free to ask questions and collaborate there.
Agenda for March 8, 2015 - More incomplete notes.
Agenda for March 1, 2015 - Intro to machine learning and Bayesian fun.
Agenda for Feb. 25, 2015
Agenda for Feb. 16, 2015
|Assignment|Description|Due|Points|
|Collecting Dallas Officer-Involved Shootings|Collect and parse the Dallas Police Department's officer-involved shooting data and make an interactive map.|Wednesday, March 11|20|
||Write a program to auto-detect broken links.|Tuesday, March 10|5|
|The Celebrity (Tw)It List|Finding out who the most-followed users follow on Twitter.|Tuesday, March 10|5|
|Draft proposal of a final project|Use your computational methods to solve a computational problem of your own choosing.|Tuesday, February 24|1|
|Build face-grep in Python|Taking the Unix philosophy to Python and computer vision object-detection algorithms.|Friday, February 20|5|
|Listing the BuzzFeed listicles|Practicing web-scraping and regexes on BuzzFeed listicle titles.|Tuesday, February 17|5|
|Analyzing Tweets in CSV form|Connect to the Twitter API, download a user's tweets as CSV, and count the frequency of hashtags and words.|Friday, February 13|5|
|Firsts in American baby-naming|Even more practice with text filters, this time to find when baby names first became known.|Tuesday, February 10|3|
|Collecting and analyzing job listings from the USAJobs.gov API|Ask what you can do for your country, and what your country can pay you.|Friday, February 6|10|
|Using baby names to classify names by gender|Use the SSA baby name data to make a naive filter for guessing the gender of a name.|Tuesday, February 3|5|
|Death Row rows parsing|Collect and aggregate data from three different states' death row listings.|Friday, January 30|10|
|Basic if-else practice|Practice the logic of if-elif-else conditional branching.|Friday, January 30|5|
|Exploring Congressional Twitter data as JSON|Basic JSON parsing exercise using what Congress tweets.|Tuesday, January 27|5|
|More analysis of trends in American baby-naming|More practice with text filters to find interesting trends in the SSA baby name data.|Tuesday, January 27|5|
|Parsing the White House Press Briefings as HTML|Data analysis of all the words used in the White House press briefings.|Thursday, January 22|10|
|Managing baby names and data projects with GitHub|A sampler project that demonstrates how your code and data should be organized for minimal head-smashing.|Friday, January 16|5|
|Basic word analysis of the White House Press Briefings|After collecting the list of WH Briefings, it's time to get each briefing.|Friday, January 16|10|
|Collecting the White House Press Briefings|The first step in analyzing web data is to just collect the webpages.|Wednesday, January 14|10|
||Setting up our programming toolbox and environment.|Wednesday, January 7|10|
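For the "Using baby names to classify names by gender" and "Basic if-else practice" assignments, here is a minimal sketch of one possible naive filter. It assumes the comma-delimited name,sex,count layout of the SSA yobYYYY.txt files; the function names, the 0.9 threshold, and the sample rows are hypothetical choices for illustration, not the assignment's official solution.

```python
import csv
import io
from collections import defaultdict


def tally_by_sex(rows):
    """Sum the counts for every (name, sex) pair.

    `rows` is any iterable of (name, sex, count) triples, such as a
    csv.reader over one of the SSA yobYYYY.txt files (assumed layout).
    """
    totals = defaultdict(lambda: {"F": 0, "M": 0})
    for name, sex, count in rows:
        # Lowercase so "Jordan" and "jordan" share one tally
        totals[name.lower()][sex] += int(count)
    return totals


def guess_gender(name, totals, threshold=0.9):
    """Guess 'F' or 'M' when one sex dominates, else return 'NA'.

    The 0.9 threshold is an arbitrary, hypothetical cutoff.
    """
    counts = totals.get(name.lower())
    if counts is None:
        # Name never appears in the data
        return "NA"
    pct_female = counts["F"] / (counts["F"] + counts["M"])
    if pct_female >= threshold:
        return "F"
    elif pct_female <= 1 - threshold:
        return "M"
    else:
        # Too ambiguous to call either way
        return "NA"


# Hypothetical sample rows in the name,sex,count layout (not real counts)
sample = "Mary,F,5000\nMary,M,20\nJohn,M,6000\nJohn,F,30\nJordan,F,400\nJordan,M,500\n"
totals = tally_by_sex(csv.reader(io.StringIO(sample)))
for name in ["Mary", "John", "Jordan", "Zzyzx"]:
    print(name, guess_gender(name, totals))
```

The three-way if-elif-else is the point of the exercise: a name is only labeled when the evidence is lopsided, and everything else falls through to "NA" rather than being forced into a category.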
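For the "Firsts in American baby-naming" assignment, one way to find when a name first shows up is to walk the years in ascending order and record only the first sighting. This is a sketch under stated assumptions: the data is assumed to be already loaded into a dict of year to (name, sex, count) rows, the helper names are hypothetical, and names are keyed without regard to sex. Because the SSA files leave out very rare names, "first appearance" here means the first year in the data, not the first time anyone used the name.

```python
def first_appearances(rows_by_year):
    """Map each name to the earliest year in which it appears.

    `rows_by_year` is a dict of year -> iterable of (name, sex, count),
    e.g. one entry per SSA yobYYYY.txt file (assumed structure).
    """
    first_seen = {}
    # Walk the years in ascending order so the first sighting is the earliest
    for year in sorted(rows_by_year):
        for name, _sex, _count in rows_by_year[year]:
            if name not in first_seen:
                first_seen[name] = year
    return first_seen


def names_first_seen_in(first_seen, year):
    """List, alphabetically, the names whose first appearance is `year`."""
    return sorted(name for name, y in first_seen.items() if y == year)


# Hypothetical, hand-made rows (not real SSA counts)
data = {
    1950: [("Mary", "F", 4000), ("Ava", "F", 50)],
    1880: [("Mary", "F", 5000)],
    1990: [("Ava", "F", 100), ("Zane", "M", 30)],
}
first_seen = first_appearances(data)
print(first_seen)
print(names_first_seen_in(first_seen, 1990))
```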
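For the "Analyzing Tweets in CSV form" assignment, the counting step can be sketched with the standard library's csv module and collections.Counter. This assumes the tweets have already been saved to a CSV with a column holding the tweet text; the column name "text", the regexes, and the sample rows are hypothetical and may not match how your own export is laid out.

```python
import csv
import io
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#\w+")
# Hashtags, @mentions, and URLs are removed before counting plain words
STRIP_RE = re.compile(r"[#@]\w+|https?://\S+")
WORD_RE = re.compile(r"[a-z']+")


def count_tweet_terms(csv_file, text_column="text"):
    """Return (hashtag_counts, word_counts) for the tweets in a CSV file.

    `text_column` is an assumed column name; adjust it to your CSV.
    """
    hashtags = Counter()
    words = Counter()
    for row in csv.DictReader(csv_file):
        text = row[text_column].lower()
        hashtags.update(HASHTAG_RE.findall(text))
        words.update(WORD_RE.findall(STRIP_RE.sub(" ", text)))
    return hashtags, words


# Hypothetical two-tweet CSV; a real export may use different columns
sample = (
    "id,text\n"
    '1,"Go #Cardinal! Big win for @Stanford #Stanford"\n'
    '2,"#stanford wins again https://example.com"\n'
)
hashtags, words = count_tweet_terms(io.StringIO(sample))
print(hashtags.most_common(2))
print(words.most_common(3))
```

Lowercasing before counting means "#Stanford" and "#stanford" are tallied together; whether that is the right call depends on the question you are asking of the data.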
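For the "Listing the BuzzFeed listicles" assignment, the regex half of the work might look something like the sketch below, which pulls the leading number and the word after it out of each title. It assumes the titles have already been scraped into a list of strings; the pattern, the helper names, and the example titles are hypothetical.

```python
import re
from collections import Counter

# A listicle title starts with a number, then the thing being listed,
# e.g. "17 Things ..." -> (17, "Things")
LISTICLE_RE = re.compile(r"^(\d+)\s+(\S+)")


def parse_listicle(title):
    """Return (number, first word after it), or None if not a numbered title."""
    match = LISTICLE_RE.match(title.strip())
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def summarize(titles):
    """Count the listed nouns and find the longest list among the titles."""
    parsed = [p for p in map(parse_listicle, titles) if p is not None]
    nouns = Counter(word.lower() for _, word in parsed)
    longest = max(parsed, key=lambda p: p[0]) if parsed else None
    return nouns, longest


# Hypothetical titles, made up for illustration
titles = [
    "21 Things Only Night Owls Understand",
    "10 Things That Will Make You Smile",
    "This Title Is Not A List",
    "99 Dogs Who Are Having A Great Day",
]
nouns, longest = summarize(titles)
print(nouns.most_common())
print(longest)
```

The pattern is deliberately naive: a title such as "2015 Was The Year Of..." would be read as a 2,015-item list, so real scraped titles call for inspecting the matches rather than trusting them blindly.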
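For the "Parsing the White House Press Briefings as HTML" and "Basic word analysis" assignments, a rough word count can be sketched with the standard library's html.parser. This version simply collects all visible text and skips script and style contents; it makes no assumption about the actual markup of the briefing pages, so a real analysis would likely need to narrow in on the element that holds the transcript. The class name, helper name, and sample HTML are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text content, skipping anything inside <script> or <style>."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_counts(html):
    """Return a Counter of lowercase words in the page's visible text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks).lower()
    return Counter(re.findall(r"[a-z']+", text))


# Hypothetical page fragment, not a real briefing
html = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<p>The President said the economy is strong.</p>"
    "<p>The economy!</p><script>var x = 1;</script></body></html>"
)
counts = word_counts(html)
print(counts.most_common(2))
```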
Computational Methods in the Civic Sphere (COMM 113/213) examines why some information problems are computational – and why others are not – in the context of journalistic enterprise and its wide variety of information problems: research, data collection, data cleaning, statistical analysis, information design, information retrieval, verification, publication, and mass distribution.
We will study real-world problems in journalism and data science, and we will also attempt to solve them. We will study programming, because many of these problems can be substantially solved through programming. But we will also learn why not every problem can or should be fitted to a mechanical algorithm.
Students who successfully complete this class will inevitably learn a wide array of tools and techniques. But gaining a useful skillset is only a coincidental outcome. Our main goal is to learn how to think, and to understand how a computer can complement, but not replace, our ability to make decisions.
|Attendance and pop quizzes|20%|
|Extra credit|10 to 20%|
This class heavily emphasizes problem solving. Homework assignments are often structured as mini-projects, but with definite right and wrong answers.
Extra credit projects will be generated on a regular basis.
Much of this classwork is based on the concept of flexibility and abstraction, including the ability to confidently write programs that run on their own, independently of our interaction, and on a variety of machines, from our personal laptops to cloud servers. And virtually every topic I cover in lecture will be posted online.
The obvious question you should have is: why even show up for class?
For camaraderie, perhaps, and some relief from spending too much time in front of a screen. But also to discuss concepts, bounce ideas around, and get feedback on new projects or different avenues of exploration. This is easier done in a group, face-to-face, so be prepared to show up as if this were any traditional lecture.
The required textbook will be Data Science at the Command Line, published in 2014 by Jeroen Janssens. It works both as a handy technical reference and as a book full of interesting data science explorations.
These are a few journalism-related projects that I have in mind when I think about the use of computational problem solving. I'm hoping that students will not only be able to understand why these projects were conceived and how they work, but also implement them in part.
(Note: This is not at all a complete list of worthwhile journalism projects, but a partial list of projects that can be more easily dissected and studied.)
If you are taking this class, or are just following along, here are some setup steps for our work environment: