MSE Project
by Matt Cholick
After thinking it over for a bit, I've settled on exploring Twitter and data mining/machine learning for my master's project. Once I started looking around, I was surprised at the sheer volume of academic papers related to Twitter. Folks have written about all sorts of interesting information that can be teased from Twitter streams. I even saw a paper suggesting doing away with political polling and replacing it with Twitter analysis.
This week's reading list:
- Adapting information bottleneck method for automatic construction of domain-oriented sentiment lexicon
- An Approach Based on Tree Kernels for Opinion Mining of Online Product Reviews
- Differences in the Mechanics of Information Diffusion Across Topics: Idioms, Political Hashtags, and Complex Contagion on Twitter
- Everyone's an influencer: quantifying influence on Twitter
- Identifying Topical Authorities in Microblogs
- Information Credibility on Twitter
- Sentiment Knowledge Discovery in Twitter Streaming Data
- Twitter Sentiment Classification using Distant Supervision
- Twitter as a Corpus for Sentiment Analysis and Opinion Mining
- We know who you followed last summer: inferring social link creation times in Twitter
- Who Says What to Whom on Twitter
After researching data mining algorithms for a bit, the volume of papers started to make more sense. Twitter is a data miner's dream. The sheer volume of information is a fantastic asset. The data is 'clean' in a lot of ways, too: there's no HTML markup or other junk in it. The short length of a tweet forces the content to stay focused. The follower/following relationships allow influence graphing. Users pre-label their own data with hashtags. Developers really couldn't design a better data-gathering tool for mining if they tried.
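To make the hashtag and follower points concrete, here is a minimal sketch of the idea, not anything from the papers above. It assumes tweets are available as plain strings and follow relationships as (follower, followed) pairs. The sample data and the function names (`hashtag_labels`, `group_by_label`, `follower_counts`) are made up for illustration, and counting followers is only a crude stand-in for real influence measures.

```python
import re
from collections import Counter, defaultdict

# Simple hashtag pattern: '#' followed by word characters (an assumption;
# real hashtag rules are more involved).
HASHTAG = re.compile(r"#(\w+)")


def hashtag_labels(text):
    """Return the lowercased hashtags in a tweet, used as crude topic labels."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


def group_by_label(tweets):
    """Group tweet texts under each hashtag they carry."""
    groups = defaultdict(list)
    for text in tweets:
        for label in hashtag_labels(text):
            groups[label].append(text)
    return dict(groups)


def follower_counts(follows):
    """Count followers per user from (follower, followed) pairs."""
    return Counter(followed for _, followed in follows)


# Made-up sample data, purely for illustration.
tweets = [
    "Loving the new phone #tech #happy",
    "Traffic is terrible today #commute",
    "Great keynote this morning #Tech",
]
follows = [("alice", "bob"), ("carol", "bob"), ("bob", "alice")]

labeled = group_by_label(tweets)
influence = follower_counts(follows)
```

Here `labeled` maps each hashtag to the tweets that used it, which is the "pre-labeled" data I mean, and `influence.most_common()` ranks users by follower count as a first rough cut at an influence graph.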
I'm excited. This is going to be some pretty fun work. I suspect, though, that by the end I'll feel somewhat differently about how I share things on Twitter.