Predicting information diffusion in Twitter
Posted by Scott on May 18, 2010
Jiang Yang, rockin’ grad student at Michigan and former MSR intern, and I will be presenting a pair of short papers on information diffusion in Twitter at the upcoming ICWSM conference. The first paper examines person and tweet characteristics to see which of them predict aspects of information diffusion. Predictors included things like a user’s number of posts and number of mentions, and whether a tweet contained a link. The outcome variables were the speed (how quickly information travels through the network), scale (how many nodes at the first degree are affected), and range (how many hops the information travels in the network) of information diffusion, nicely visualized courtesy of Jiang’s design skills:
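To make the three outcome measures concrete, here is a minimal Python sketch that computes them for a single cascade. It assumes a cascade arrives as hypothetical `(parent, child, time)` hop records, and it operationalizes speed as the delay between the original tweet and the first hop; that definition, like the function name `cascade_metrics`, is an illustrative assumption rather than the paper's exact measure.

```python
from collections import defaultdict, deque

def cascade_metrics(root, root_time, edges):
    """Return (speed, scale, range) for one diffusion cascade.

    root:      id of the user who posted the original tweet
    root_time: when the original tweet was posted (e.g. in hours)
    edges:     hypothetical (parent, child, time) records, one per hop
    """
    children = defaultdict(list)
    for parent, child, t in edges:
        children[parent].append((child, t))

    # Scale: number of nodes reached directly from the root (first degree).
    scale = len(children[root])

    # Range: the largest hop distance from the root, found by BFS.
    depth = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child, _ in children[node]:
            if child not in depth:
                depth[child] = depth[node] + 1
                queue.append(child)
    hop_range = max(depth.values())

    # Speed (assumed definition): delay until the first hop; None if none.
    first = min((t for _, t in children[root]), default=None)
    speed = None if first is None else first - root_time
    return speed, scale, hop_range
```

For example, with hops alice→bob at 1.5, alice→carol at 3.0, bob→dave at 4.0 and dave→erin at 6.0, `cascade_metrics("alice", 0.0, edges)` gives a speed of 1.5, a scale of 2 (bob and carol) and a range of 3 hops (alice→bob→dave→erin).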
Using about a month’s worth of content from Twitter’s “spritzer” feed, Jiang built regression models to describe the percent of variance the different characteristics account for in each aspect of information diffusion across a number of sample topics. For speed and range, Jiang suggested a Cox proportional hazards regression model, which is commonly used in survival analysis. For the speed of information diffusion, for example, this let us describe how the different characteristics predict the rate at which diffusion dies off over time. For the scale of diffusion (the number of child nodes affected) we used a standard regression model. Here are the results from these models, showing correlation coefficients for each predictor:
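For readers unfamiliar with Cox regression, the sketch below shows the core idea with a single predictor: it maximizes the Cox partial likelihood by Newton-Raphson, handling tied times with Breslow's approximation. It is not the papers' implementation (the post does not describe one), and treating "diffusion stopped" as the event, with tweets still active at the end of the window as censored, is an assumption for illustration; `fit_cox_single` is a hypothetical name. In practice one would use an established implementation such as `CoxPHFitter` in the lifelines Python library or `coxph` in R's survival package.

```python
import math

def fit_cox_single(times, events, x, max_iter=50, tol=1e-9):
    """Estimate beta in h(t | x) = h0(t) * exp(beta * x).

    Maximizes the Cox partial log-likelihood with Newton-Raphson,
    using Breslow's handling of tied event times.
      times:  time until each tweet's diffusion was observed to stop
      events: 1 if the stop was observed, 0 if censored (still active
              when the observation window closed)
      x:      one predictor per tweet, e.g. log(1 + nMentioned)
    """
    n = len(times)
    beta = 0.0
    for _ in range(max_iter):
        score = 0.0  # gradient of the log partial likelihood
        info = 0.0   # observed information (negative second derivative)
        for i in range(n):
            if not events[i]:
                continue  # censored tweets only contribute via risk sets
            # Risk set: tweets whose diffusion had not yet stopped at times[i].
            risk = [j for j in range(n) if times[j] >= times[i]]
            w = [math.exp(beta * x[j]) for j in risk]
            s0 = sum(w)
            s1 = sum(wj * x[j] for wj, j in zip(w, risk))
            s2 = sum(wj * x[j] ** 2 for wj, j in zip(w, risk))
            mean = s1 / s0
            score += x[i] - mean
            info += s2 / s0 - mean ** 2
        if info <= 0:
            break  # predictor is constant within every risk set
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

A positive estimate means larger values of the predictor are associated with a higher hazard, i.e. diffusion stopping sooner; exponentiating the estimate gives the hazard ratio per unit of the predictor. The estimate is only finite when the data are not perfectly separated (for instance, when the tweet whose diffusion stops first does not always have the largest predictor value in its risk set).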
One take-away is that the historical rate at which a user is mentioned (Log(nMentioned) in the table above) is generally the best predictor across all three measures. In fact, its correlations with the scale of diffusion reach the .5 to .6 range. More nuanced findings concern how the stage of a topic at which a tweet appears (i.e., early or late in the topic’s life cycle) can have a significant impact, but not always in the same direction (that is, tweets posted earlier in a topic do not always have the greatest diffusion).
The second paper, btw, compares diffusion and network structure in Twitter to that of a blogging network. Here is a graphic comparing the two: