These notes are in flux. Sorry.
(Some further notes on n-grams have been split off into a separate article. More to come.)
Machine learning is the science of getting computers to act without being explicitly programmed.
- Introduction to Stanford professor Andrew Ng's Machine Learning, on Coursera
As you get better at programming, you might find yourself actually getting lazier. And this is the natural, nay, the preferred order of things. To quote from the glossary of Larry Wall's Programming Perl (Second Edition), regarding the virtues of Laziness and Impatience:
Laziness - The quality that makes you go to great effort to reduce overall energy expenditure. It makes you write labor-saving programs that other people will find useful, and document what you wrote so you don't have to answer so many questions about it. Hence, the first great virtue of a programmer. Also hence, this book. See also impatience and hubris. (p.609)
Impatience - The anger you feel when the computer is being lazy. This makes you write programs that don't just react to your needs, but actually anticipate them. Or at least pretend to. Hence, the second great virtue of a programmer. See also laziness and hubris. (p.608)
So even if you don't yet have the next billion-dollar startup idea, just being able to delegate a menial information task to a computer – whether web-scraping or grepping or auto-tweeting the news – is motivation enough for programming.
At the ground level, we can consider machine learning as just the next level of laziness and impatience. A program usually has to wait on the human to tell it when and how to act. If we could teach the program to act, to know what to do without hand-holding, then we, the humans, would be freed from the work of hand-holding.
Sentiment analysis is a pretty good machine learning problem: the sheer number of possible known unknowns and unknown unknowns is what makes it ripe for machine learning. Automated decision-making may be far from perfect, but it's surely better than trying to enumerate all the possible scenarios by hand – and so we're back to the virtues of laziness and impatience.
(For instance, Stanford's NaSent system, discussed below, claims a state-of-the-art 85% success rate in classifying the sentiment of individual sentences.)
https://www.kaggle.com/c/word2vec-nlp-tutorial/details/part-1-for-beginners-bag-of-words
Bag-of-words classifiers can work well in longer documents by relying on a few words with strong sentiment like 'awesome' or 'exhilarating.' However, sentiment accuracies even for binary positive/negative classification of single sentences have not exceeded 80% for several years. For the more difficult multiclass case including a neutral class, accuracy is often below 60% for short messages on Twitter (Wang et al., 2012).
From a linguistic or cognitive standpoint, ignoring word order in the treatment of a semantic task is not plausible, and, as the paper's authors show, a bag-of-words model cannot accurately classify hard examples of negation. Correctly predicting these hard cases is necessary to further improve performance.
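To make the bag-of-words approach concrete, here's a minimal sketch of such a classifier using scikit-learn's CountVectorizer and MultinomialNB (Naive Bayes text classification is covered in the reading list below). The tiny training set is invented for illustration; a real model would train on thousands of labeled reviews, like the Kaggle IMDB data linked above.

```python
# Minimal bag-of-words sentiment classifier (sketch).
# The toy training set is invented for illustration only.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

train_texts = [
    "an awesome, exhilarating ride",
    "a dull, plodding mess",
    "funny and heartfelt",
    "boring and predictable",
]
train_labels = ["pos", "neg", "pos", "neg"]

# CountVectorizer discards word order entirely: each document becomes
# a vector of word counts -- the "bag of words".
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(train_texts, train_labels)

print(model.predict(["an awesome film"]))   # ['pos'] -- one strong word carries it
print(model.predict(["not dull at all"]))   # ['neg'] -- negation is invisible
```

Because the model sees only word counts, "not dull at all" is scored by the single known word "dull" – exactly the negation failure described above.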
Stanford's NaSent (short for Neural Analysis of Sentiment) is an approach that considers the structure of an entire sentence. The Stanford parser was used to extract more than 215,000 unique phrases from a dataset of 11,855 single sentences, and each of these phrases was given a human-assigned rating of positive or negative sentiment intensity.
Recursive Deep Models for Semantic Compositionality Over a Sentiment Treebank, by Stanford University's Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher Manning, Andrew Ng, and Christopher Potts.
It pushes the state of the art in single-sentence positive/negative classification from 80% up to 85.4%. The accuracy of predicting fine-grained sentiment labels for all phrases reaches 80.7%, an improvement of 9.7% over bag-of-features baselines.
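To make the compositional idea concrete, here's a toy sketch – not the paper's actual Recursive Neural Tensor Network, just a plain recursive layer with random stand-in weights – showing how every node of a binary parse tree gets its own vector and its own sentiment score:

```python
# Toy recursive composition over a binary parse tree (sketch).
# NaSent's real model is the Recursive Neural Tensor Network of Socher
# et al.; here all weights are random stand-ins for what training learns.
import numpy as np

rng = np.random.default_rng(0)
DIM = 4  # toy vector size; the paper uses larger word vectors

# Hypothetical word vectors -- learned jointly with the model in practice.
lexicon = {w: rng.normal(size=DIM) for w in ["the", "movie", "was", "not", "terrible"]}

W = rng.normal(size=(DIM, 2 * DIM))  # composition weights (untrained)
w_s = rng.normal(size=DIM)           # sentiment readout (untrained)

def compose(node):
    """Vector for a node: a word string, or a (left, right) pair of subtrees."""
    if isinstance(node, str):
        return lexicon[node]
    left, right = [compose(child) for child in node]
    # The parent vector is a nonlinear function of its ordered children,
    # so structure and word order matter -- unlike a bag of words.
    return np.tanh(W @ np.concatenate([left, right]))

def sentiment(node):
    """Score in (-1, 1); with trained weights this tracks the human phrase labels."""
    return float(np.tanh(w_s @ compose(node)))

# Binary parse of "the movie was not terrible".
tree = (("the", "movie"), ("was", ("not", "terrible")))
print(sentiment(tree))                 # whole-sentence score
print(sentiment(("not", "terrible")))  # each phrase node gets its own score
```

The key point is that the phrase ("not", "terrible") is composed before it meets the rest of the sentence, which is what lets a trained model flip the polarity of "terrible" – the hard negation cases that bag-of-words models miss.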
Mining of Massive Datasets by Jure Leskovec, Anand Rajaraman, and Jeff Ullman, including Chapter 1 (Data Mining), Chapter 3 (Finding Similar Items), and Chapter 12 (Large-Scale Machine Learning)
Data Smart: Using Data Science to Transform Information into Insight by John W. Foreman
Introduction to Information Retrieval by Christopher D. Manning, Prabhakar Raghavan, and Hinrich Schütze, including Chapter 6 (Scoring, term weighting and the vector space model) and Chapter 13 (Text classification and Naive Bayes)
An Introduction to Statistical Learning with Applications in R by Gareth James, Daniela Witten, Trevor Hastie and Robert Tibshirani
Natural Language Processing with Python by Steven Bird, Ewan Klein, and Edward Loper
Tom Mitchell's Machine Learning courses at Carnegie Mellon
Alex Holehouse's notes for the Fall 2011 session of the Stanford Machine Learning course
Stanford CS 229: Machine Learning - Autumn 2014
Stanford Machine Learning on Coursera (the on-demand version)