Week 7
Class notes
Presidents Day
The agenda for Feb. 18, 2015:
With time shortened, I've decided to make a few alterations to the planned schedule.
- We will resume looking at machine-learning/data concepts next week, such as natural language processing with Python.
- The instructions for "Better Know a Former Congressmember with Grep" are not finished yet, but I will be significantly simplifying the assignment (and recording a screencast of one of the sections), because data-cleaning is such a pain-in-the-ass on its own and not within the full scope of this class. The assignment has been reduced in both point value and amount of work, though I'll probably save the original version for another course.
- To make up for this, I've created a new assignment: Collecting Dallas Officer-Involved Shootings. The data-domain is a little more straightforward, though it requires you to practice your API-fetching skills and will make you think about data-publishing for a bit.
- Keep thinking about your final project, the difficulty and scope of which can be similar to the Dallas-OIS homework.
I haven't begun to cover the scope of all the command-line tools that might be helpful to you for a final project. But here are a few:
- tesseract - extract and process text from an image (e.g. scanned documents)
- pdftotext - we've used this before; it is necessary when data or information is published inside a PDF
- avconv - a huge number of tools and methods for working with videos, including combining, splitting, trimming, gififying, colorizing, etc.
- convert - a huge number of tools to work with images. If you split up a video into image files via avconv, then you can feed them into convert. The montage program I demonstrated is from the same family of tools.
- Check out the documentation for the t Twitter tool to see the different kinds of data you can easily collect.
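To make the avconv-to-convert handoff described in the list above a little more concrete, here is a rough sketch rather than a recipe. The function names (video_to_frames, frames_to_gif, frames_to_sheet), the frame-%04d.png naming pattern, and the specific settings (one frame per second, a 20-tick delay, four tiles per row) are all assumptions I picked for illustration, and clip.mp4 is a hypothetical file. Check each tool's documentation before leaning on any of it.

```shell
#!/bin/bash
# Sketch only: the function names, filenames, and settings are illustrative assumptions.

# Split a video into still images, one frame per second of video.
# Usage: video_to_frames clip.mp4 frames
video_to_frames() {
  local video="$1" outdir="$2"
  mkdir -p "$outdir"
  # -r 1 asks for an output rate of 1 frame per second;
  # %04d numbers the files frame-0001.png, frame-0002.png, ...
  avconv -i "$video" -r 1 "$outdir/frame-%04d.png"
}

# Combine the extracted frames into a looping animated GIF.
# Usage: frames_to_gif frames clip.gif
frames_to_gif() {
  local framedir="$1" out="$2"
  # -delay is in 1/100ths of a second per frame; -loop 0 means loop forever
  convert -delay 20 -loop 0 "$framedir"/frame-*.png "$out"
}

# Lay the frames out as a contact sheet, four images per row.
# Usage: frames_to_sheet frames sheet.png
frames_to_sheet() {
  local framedir="$1" out="$2"
  # -tile 4x fixes the number of columns; -geometry +2+2 adds a 2px gap
  montage "$framedir"/frame-*.png -tile 4x -geometry +2+2 "$out"
}
```

Assuming the tools are installed, you could save this as a file, load it with source, and then run something like: video_to_frames clip.mp4 frames followed by frames_to_gif frames clip.gif. Keeping each step as its own function means you can inspect the frames directory before deciding what to build from it.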