MSE Project

by Matt Cholick

After some thought, I've settled on exploring Twitter through data mining and machine learning for my master's project. Once I started looking around, I was surprised at the sheer volume of academic papers related to Twitter. Researchers have written about all sorts of interesting information that can be teased from Twitter streams. I even saw a paper suggesting that political polling be done away with and replaced by Twitter analysis.

This week's reading list:

  • Adapting information bottleneck method for automatic construction of domain-oriented sentiment lexicon
  • An Approach Based on Tree Kernels for Opinion Mining of Online Product Reviews
  • Differences in the Mechanics of Information Diffusion Across Topics: Idioms, Political Hashtags, and Complex Contagion on Twitter
  • Everyone's an Influencer: Quantifying Influence on Twitter
  • Identifying Topical Authorities in Microblogs
  • Information Credibility on Twitter
  • Sentiment Knowledge Discovery in Twitter Streaming Data
  • Twitter Sentiment Classification using Distant Supervision
  • Twitter as a Corpus for Sentiment Analysis and Opinion Mining
  • We Know Who You Followed Last Summer: Inferring Social Link Creation Times in Twitter
  • Who Says What to Whom on Twitter

After researching data mining algorithms for a bit, I started to see why there is so much research. Twitter is a data miner's dream. The sheer volume of information is a fantastic asset. The data is 'clean' in a lot of ways, too: there's no HTML markup or other junk mixed in. The short message length forces each tweet to stay focused. The follower/following relationships allow influence graphing. Users pre-label their own data with hashtags. Developers really couldn't design a better data gathering tool for mining if they tried.
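
To make the hashtag point a bit more concrete, here is a minimal Python sketch of treating hashtags as ready-made labels. The sample tweets and the function name `group_by_hashtag` are made up for illustration, and the regular expression is a simplification of how Twitter actually recognizes hashtags.

```python
import re
from collections import defaultdict

# Simplified pattern: a '#' followed by word characters (an assumption,
# not Twitter's real hashtag grammar).
HASHTAG = re.compile(r"#(\w+)")


def group_by_hashtag(tweets):
    """Map each lowercased hashtag to the list of tweets that carry it."""
    groups = defaultdict(list)
    for text in tweets:
        # set() so a tweet repeating the same tag is only counted once
        for tag in set(HASHTAG.findall(text.lower())):
            groups[tag].append(text)
    return dict(groups)


# Hypothetical sample data, not real tweets.
sample_tweets = [
    "Loving the new phone #happy",
    "Stuck in traffic again #sad",
    "Best coffee in town #happy #coffee",
]

labeled = group_by_hashtag(sample_tweets)
for tag in sorted(labeled):
    print(tag, len(labeled[tag]))
```

Each group could then serve as a rough, author-supplied training label, with no hand annotation needed.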

I'm excited. This is going to be some pretty fun work. I suspect, though, that by the end I'll feel somewhat differently about how I share things on Twitter.