Cholick.com - MSE Project

MSE Project

April 5, 2011 by Matt Cholick

After thinking it over for a bit, I've settled on exploring Twitter and data mining/machine learning for my master's project. Once I started looking around, I was surprised at the sheer volume of academic papers related to Twitter. Folks have written about all sorts of interesting information that can be teased out of Twitter streams. I even saw a paper suggesting that political polling be done away with and replaced by Twitter analysis.

This week's reading list:

Adapting information bottleneck method for automatic construction of domain-oriented sentiment lexicon
An Approach Based on Tree Kernels for Opinion Mining of Online Product Reviews
Differences in the Mechanics of Information Diffusion Across Topics: Idioms, Political Hashtags, and Complex Contagion on Twitter
Everyone's an Influencer: Quantifying Influence on Twitter
Identifying Topical Authorities in Microblogs
Information Credibility on Twitter
Sentiment Knowledge Discovery in Twitter Streaming Data
Twitter Sentiment Classification using Distant Supervision
Twitter as a Corpus for Sentiment Analysis and Opinion Mining
We Know Who You Followed Last Summer: Inferring Social Link Creation Times in Twitter
Who Says What to Whom on Twitter

After researching data mining algorithms for a bit, the volume of papers started to make more sense. Twitter is a data miner's dream. The sheer volume of information is a fantastic asset. The data is 'clean' in a lot of ways, too: there's no HTML markup or other junk in it. The short length of each tweet forces the content to stay focused. The follower/following relationships make it possible to graph influence. Users pre-label their own data with hashtags. Developers really couldn't design a better data-gathering tool for mining if they tried.
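To make two of those points concrete, here's a rough Python sketch of treating hashtags as ready-made labels and counting followers as a very crude stand-in for influence. It's only an illustration under loud assumptions: the sample tweets, the follow pairs, and the function names are all invented, and nothing here talks to the real Twitter API.

```python
import re
from collections import Counter

# Hypothetical sample data, invented purely for illustration.
TWEETS = [
    ("alice", "Loving the new phone #happy"),
    ("bob", "Stuck in traffic again #sad #monday"),
    ("carol", "No tags here"),
]
# Hypothetical (follower, followed) pairs.
FOLLOWS = [("bob", "alice"), ("carol", "alice"), ("alice", "bob")]

HASHTAG = re.compile(r"#(\w+)")


def hashtag_labels(text):
    """Split a tweet into (text without hashtags, lowercase hashtag labels)."""
    labels = [tag.lower() for tag in HASHTAG.findall(text)]
    # Remove the hashtags, then collapse any leftover whitespace.
    cleaned = " ".join(HASHTAG.sub("", text).split())
    return cleaned, labels


def follower_counts(follows):
    """Count followers per user, i.e. the in-degree in the follow graph."""
    return Counter(followed for _, followed in follows)


if __name__ == "__main__":
    for user, text in TWEETS:
        cleaned, labels = hashtag_labels(text)
        print(user, repr(cleaned), labels)
    print(follower_counts(FOLLOWS).most_common())
```

Follower count is about the simplest influence measure there is; it's used here only because it fits in a few lines, and a real analysis would likely need something richer.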

I'm excited. This is going to be some pretty fun work. I suspect, though, that I'll feel somewhat differently about how I share things on Twitter by the end of it.