Least Squares

just trying to minimize error

How My Neighborhood Reacted to the Opening of A Homeless Shelter

Posted by Scott on May 19, 2011

Last fall,* a homeless shelter operated by the SHARE organization opened in my neighborhood of Wallingford in Seattle. Wallingford is a left-leaning neighborhood, the kind where support for causes and organizations like homeless shelters almost goes without saying. Or does it?

Some neighbors found the details of the shelter unpalatable. Mainly they objected to the lack of oversight (the shelter is self-governing), the lack of background checks on residents (SHARE adamantly opposes background checks for its shelter residents), and the fact that the church that was to house the shelter also houses a preschool. On top of that, the shelter was announced less than two weeks before it was due to open.

Can’t put your neighborhood where your ideology is? The discussion raged at several town-hall-style meetings and, to a surprising extent, on our neighborhood blog, Wallyhood. The blog played a critical role in raising awareness, spreading information, and providing a discussion forum.

I thought it would be interesting to poke at the blog comments to see what they said about how people participated in this discussion. To do so, I grabbed all the posts and comments from September, separated out the five posts about the shelter (the Share posts), and compared their comments to those on the remaining posts (the No Share posts).

First, there were 90 posts in total, yet the 5 Share posts drew far more comments (484) than the 85 No Share posts combined (327). That works out to roughly 97 comments per Share post versus fewer than 4 per No Share post. What about the distribution of comments across commenters?

These curves look fairly similar, but they’re actually quite different. The Share curve is driven much more by a few extreme commenters, while the No Share curve is a much “softer” distribution. In fact, if you fit each as a power law (neither is a great fit, but this is just for illustration), the alphas are quite different: 1.79 for No Share versus 2.12 for Share. So the typical commenting pattern is more evenly distributed than the commenting pattern for the Share topic.
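The post doesn’t say how the power laws were fit. As a rough sketch, assuming the input is simply the number of comments each commenter left, one common estimator is the discrete-data approximation from Clauset, Shalizi, and Newman (2009). It is only accurate for larger `xmin`, so treat its output as indicative. How α maps onto “a few extreme commenters” also depends on whether you fit the frequency distribution or the rank-ordered curve, which isn’t specified here. The function names and sample authors below are hypothetical.

```python
import math
from collections import Counter

def comments_per_commenter(comment_authors):
    """Turn a list of comment author names (one per comment) into per-commenter counts."""
    return list(Counter(comment_authors).values())

def power_law_alpha(counts, xmin=1):
    """Approximate maximum-likelihood power-law exponent for discrete counts.

    Uses alpha = 1 + n / sum(ln(x / (xmin - 0.5))), the approximation suggested
    for discrete data by Clauset, Shalizi and Newman (2009). Only counts >= xmin
    enter the fit.
    """
    tail = [c for c in counts if c >= xmin]
    if not tail:
        raise ValueError("no counts at or above xmin")
    return 1.0 + len(tail) / sum(math.log(c / (xmin - 0.5)) for c in tail)

# Hypothetical usage: one author name per comment on a set of posts.
authors = ["ann", "bob", "ann", "cy", "ann", "bob", "dee"]
print(round(power_law_alpha(comments_per_commenter(authors)), 2))
```

Fitting the Share and No Share comment sets separately with the same `xmin` keeps the two alphas comparable.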

OK, so a small group is really driving the conversation about this topic. What is the tone of what they’re saying? To check, I ran the text of the comments through LIWC (Linguistic Inquiry and Word Count):

This shows how different types of words were used in the Share versus No Share post comments. For instance, people used more ‘anger’ words in the Share post comments. The height of each bar indicates the relative difference in use of that word type between the two post types; the bars are ordered left to right (and color coded) by effect size (Cohen’s d), from most Share-skewed on the left to most No Share-skewed on the right. Only word types with effect sizes of .4 or greater (in absolute value) are shown.
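For reference, here is a minimal sketch of Cohen’s d with a pooled standard deviation, plus the .4 cutoff and the Share-to-No Share ordering described above. It assumes each LIWC category has been scored per comment (e.g., the percentage of a comment’s words in that category); the post doesn’t say whether d was computed per comment or per post, and the function names and data are hypothetical.

```python
import math
from statistics import mean, variance

def cohens_d(a, b):
    """Cohen's d for two independent samples, using the pooled sample standard deviation."""
    n_a, n_b = len(a), len(b)
    pooled_var = ((n_a - 1) * variance(a) + (n_b - 1) * variance(b)) / (n_a + n_b - 2)
    return (mean(a) - mean(b)) / math.sqrt(pooled_var)

def large_effects(share, other, threshold=0.4):
    """Keep categories with |d| >= threshold, ordered from most Share-skewed
    (largest positive d) to most No Share-skewed (most negative d).

    `share` and `other` map a word category (e.g. "anger") to per-comment scores.
    """
    effects = {cat: cohens_d(share[cat], other[cat]) for cat in share}
    kept = {cat: d for cat, d in effects.items() if abs(d) >= threshold}
    return sorted(kept.items(), key=lambda item: item[1], reverse=True)

# Hypothetical per-comment scores for two LIWC-style categories.
share = {"anger": [2.0, 3.5, 1.0, 4.0], "leisure": [0.5, 0.0, 1.0, 0.5]}
other = {"anger": [0.5, 1.0, 0.0, 1.5], "leisure": [1.5, 2.0, 1.0, 2.5]}
print(large_effects(share, other))
```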

The Share posts generated comments that were angrier, more social, more “other” oriented (he/she/they), and more about health. Comments on typical posts were more about leisure and about space and time (likely describing events and happenings in the neighborhood). It’s worth noting, though, that not all the other posts are fluffy, noncontroversial posts; there are plenty of discussion-worthy topics.

Overall, this was a big event in the neighborhood, and it was really interesting to see just how critical the blog was. This issue blew away the records for number of comments on Wallyhood posts. The data suggest that the tone of the “discussion” was angry and very other-oriented, and that while there was broad engagement (as evidenced by the large number of posts and posters), a smallish number of people also had a clear effect. I don’t think this pattern is readily apparent when you are just reading the posts.

* Yes, in Twitter-time, last fall is like the Mesozoic. However, in terms of major events in a neighborhood, last fall isn’t all that distant. One interesting feature of hyperlocal blogs is that they reflect a time scale that’s very relevant to people’s lives, but gets overlooked in other forms of social media.


Posted in Uncategorized | Leave a Comment »

Visual Attention and Microblog Consumption

Posted by Scott on April 13, 2011

“attention is the new pagerank”


I love this idea, for two reasons. First, it nicely captures a change in the way people consume information. We are now firmly entrenched in an “information stream” world that at least augments the previously dominant modality of explicit information look-up via traditional search. What we attend to in that stream determines which pieces of information affect us. Second, it implies a primacy of human cognition over algorithms. That’s not inherently a positive (for me, anyway), but it does change the equation, so to speak, about what mediates information in our lives.


To that end, we wanted to know just how much cognitive attention people give their information feeds, and how that attention relates to memory, perceived value of the content, and attributes of the information environment itself. To study this, we tracked people’s eye gaze while they read their Twitter feeds, then followed up with questionnaire items and a memory test. The full results are in an upcoming ICWSM paper by Kristie Fisher and me; here is a short(er) summary. [BTW, the usual caveats are in the paper, but note in particular that this was based on a sample of active, “professional” type Twitter users. Other groups, like teens, could show very different information consumption patterns.]


Heat map


Basic Numbers
People devote about 3 seconds to reading a tweet, and it’s hard not to give a tweet at least some attention. 15% of tweets are rated as highly interesting, and very few tweets would prompt a measurable behavior such as a retweet. Memory for tweets was poor. Immediately after people read their tweets, we gave them a recall task in which they simply had to indicate which tweets they had seen only a few minutes earlier. Half of the test tweets they had seen and half they hadn’t, so chance performance was 50%. On average, people scored below 70%.


Basic numbers


Do attention, interest, and memory align?
Ideally, the stuff you attend to is the same stuff you find most interesting and are most likely to remember. Does this happen? Kind of, but imperfectly. Thankfully, highly rated content is in fact looked at longer and better remembered. Still, these data are relatively noisy.


Comparing measures


What properties of the information environment affect attention?
It turns out that many properties of tweets that we think of as conveying signal (e.g., a retweet is something someone thought was important enough to pass along) or as expanding the information quotient (e.g., including a link) actually decrease attention, interest, or memory. For instance, tweets with links were looked at almost a second less than tweets without links. Retweets were rated significantly less interesting than non-retweets. Tweets from people who tweet frequently were looked at more than a second less than tweets from people who don’t. Finally, tweets from personal contacts were more likely to be remembered than those from organizations or celebrities.



Why does all this matter?
Significant research (in quantity and importance) addresses aspects of information contagion: viral videos, meme propagation, and so on. Returning to the importance of human cognition in the ‘attention is the new pagerank’ idea, these data provide some evidence about the role the individual consumer of (social) media plays in these processes. First, a considerable amount of content that people find highly interesting receives no measurable behavior, and thus is not captured by studies of information flow in networks. At the same time, even content that is actually seen may not be remembered, which reduces the effective reach of the information. Second, the primary vehicles for information diffusion in Twitter (retweeting, frequent tweeting, and link sharing) all show counterproductive results: retweets are not seen as more interesting, frequent tweeting reduces attention to any individual tweet, and including links decreases visual attention to a tweet. These findings suggest design directions that we discuss in the paper (e.g., better link previewing so consumers can decide more quickly and accurately whether a link is worth clicking).


Posted in Uncategorized | Leave a Comment »

Entropy-based tweet sampling

Posted by Scott on March 29, 2011

Problem: There are 100,000+ tweets from the past couple of days on the disaster in Japan, and you want to find the “best” 10. What do you do?

This is a very real problem if you want to search social media. There is no PageRank, no algorithm for determining the best content. What does “best” even mean in this context? Most recent? Most retweeted? Originating from the most authoritative sources (assuming you could determine who those were)? It is also potentially a very valuable problem to solve: how useful would it be for search engines to return the best social media results for any given topic?! (Note that current approaches focus on recent content and content with heavily shared links; see Bing Social and Google Realtime.)

This is the challenge Munmun De Choudhury took on last summer during her internship at MSR, working with Mary Czerwinski and me. A pair of papers on the work will appear at the upcoming HyperText and ICWSM conferences. Here I provide a short summary, but there are many more details in the papers.

We broke the problem down into two pieces: sampling and end-user cognition. Intuitively, we want to find the sample of tweets on a given topic that the user finds most engaging, informative, and memorable. I’ll take each piece in turn:

Sampling: A popular topic in Twitter will see tens to hundreds of thousands of tweets over the course of a couple of days. These tweets originate from people all over the planet, provide all sorts of angles on a topic (e.g., the political versus the economic perspective), and so on. How do you sample from such an incredibly diverse media source? Here Munmun had her first stroke of brilliance: leverage the diversity by sampling based on it. That is, by characterizing the set of tweets by its level of information diversity (entropy), we can then sample it to meet a desired level of diversity. From a usage standpoint, a user could then specify a level of diversity, ranging from highly homogeneous to highly heterogeneous.

To do this, each tweet was characterized numerically along the following attributes: whether it is a retweet, whether it is a reply, whether it contains a link, its timestamp, the location (timezone) of its author, its thematic category (politics, sports, technology, etc.), the number of followers and followees of its author, and the author’s degree of activity (number of tweets). Doing this for every tweet on a topic generates an information space for that topic, which can then be sampled according to a desired level of diversity.

That makes sense, but it’s still a huge number of items to wade through. Might there be a way to prune without losing valuable information? Here was Munmun’s second stroke of genius: what if we treat tweet streams like signals that can be compressed? As we show in the ICWSM paper, using a Haar wavelet transform, she was able to eliminate about half of the tweets on a topic with minimal loss of the information space. Yes! From the remaining tweets, she used a greedy sampling algorithm to select the 10 tweets that best matched a specified level of diversity.
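The papers have the details of the wavelet step. Purely as an illustration, and assuming a much simpler scheme than the paper’s, the sketch below treats time-ordered tweets as a multi-attribute signal, takes one level of the orthonormal Haar transform over adjacent pairs, and drops the second tweet of any pair whose detail coefficients carry little energy (i.e., the two tweets are nearly identical on every attribute). The pairing rule, the threshold, and the function names are assumptions, not the paper’s method.

```python
import math

def haar_step(signal):
    """One level of the orthonormal Haar transform of an even-length list of floats.

    Returns (approximation, detail) coefficients, one of each per adjacent pair.
    """
    if len(signal) % 2:
        raise ValueError("signal length must be even")
    s = 1 / math.sqrt(2)
    approx = [(signal[i] + signal[i + 1]) * s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * s for i in range(0, len(signal), 2)]
    return approx, detail

def prune_redundant_pairs(vectors, threshold):
    """Return indices of tweets to keep.

    `vectors` are time-ordered tweets, each a list of numeric attributes. For each
    adjacent pair, the Haar detail coefficient is computed per attribute; if their
    total energy is below `threshold`, the pair is treated as redundant and only
    its first tweet is kept. An unpaired final tweet is always kept.
    """
    kept = []
    for i in range(0, len(vectors) - 1, 2):
        # Haar transform of the 2-element signal [a, b] yields that attribute's detail coefficient.
        detail = [haar_step([a, b])[1][0] for a, b in zip(vectors[i], vectors[i + 1])]
        kept.append(i)
        if sum(d * d for d in detail) >= threshold:
            kept.append(i + 1)
    if len(vectors) % 2:
        kept.append(len(vectors) - 1)
    return kept

# Hypothetical time-ordered tweets, each reduced to two numeric attributes.
tweets = [[0.0, 0.0], [0.0, 0.1], [1.0, 1.0], [5.0, 5.0], [2.0, 2.0]]
print(prune_redundant_pairs(tweets, threshold=0.5))
```

With a generous threshold, roughly half the tweets survive, which is the flavor of reduction described above; a fuller version would more likely threshold coefficients across several transform levels.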

This sampling process, then, follows three high-level steps: 1) characterize the set of tweets on a topic as an information space by quantifying each tweet along a number of attributes, or dimensions; 2) prune the information space without losing much of it; 3) from the remaining items, iteratively build a sample of 10 items that matches a desired level of diversity. Here is an example from the tweets about last summer’s oil spill in the Gulf of Mexico. Our method is ‘PM’ at the bottom. MTU represents tweets with the most tweeted URLs, which was the second-best method in our tests.
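Steps 1 and 3 can be sketched together. This assumes tweets are dicts whose attributes have already been binned into categories, scores a sample’s diversity as the mean Shannon entropy (in bits) across attributes, and grows the sample greedily from a seed tweet toward a target diversity. The entropy definition, the greedy criterion, and all names here are illustrative assumptions rather than the exact formulation in the papers.

```python
import math
from collections import Counter

def diversity(sample, attributes):
    """Mean Shannon entropy (bits) across attributes for a list of tweet dicts.

    Numeric attributes (followers, timestamps, ...) are assumed to be pre-binned
    into categories so that entropy over their values is meaningful.
    """
    n = len(sample)
    total = 0.0
    for attr in attributes:
        counts = Counter(tweet[attr] for tweet in sample)
        total += sum((c / n) * math.log2(n / c) for c in counts.values())
    return total / len(attributes)

def greedy_sample(pool, target, attributes, k=10, seed_index=0):
    """Grow a sample from a seed tweet; at each step add the candidate that
    brings the sample's diversity closest to `target` (in bits)."""
    chosen = [seed_index]
    remaining = [i for i in range(len(pool)) if i != seed_index]
    while len(chosen) < k and remaining:
        best = min(
            remaining,
            key=lambda i: abs(diversity([pool[j] for j in chosen + [i]], attributes) - target),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen

# Hypothetical pool: 20 tweets spread evenly over four themes.
pool = [{"theme": t} for t in ("politics", "sports", "tech", "economy") for _ in range(5)]
print(greedy_sample(pool, target=0.0, attributes=["theme"], k=4))  # a homogeneous sample
print(greedy_sample(pool, target=2.0, attributes=["theme"], k=4))  # one tweet per theme
```

For a sample of k tweets, each attribute’s entropy lies between 0 and log2(k) bits (about 3.32 for k = 10), so `target` should sit in that range. Rerunning with different `seed_index` values is a cheap way to probe the seed-stability behavior discussed in this post.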

Example Results

It turns out that for a given level of diversity, there is pretty much one set of 10 items that meets that level. Here we show that regardless of the starting “seed” tweet, the resulting set of 10 tweets for a given level of diversity is very similar. Note the slight increase in variation at the middle levels of information diversity, because there are more ways the tweet space can be combined (more degrees of freedom) to achieve diversity levels toward the middle of the scale:

Plot: different seed tweets

We go on to show that Munmun’s sampling process better matches target levels of diversity, across a wide range of topics and across levels of diversity.

OK, but how do you know these are good samples? PageRank offers an objective measure of which result is best for a given search term. In the absence of such a metric, we turned to user cognition. Our reasoning was that the best social media results would be informative, engaging to read, and actually remembered by users. A user study showed that Munmun’s method (PM) generated tweet samples that were largely better than a number of baselines (B1-3) on these user cognition measures:


Summary: How do you sample a highly diverse media space like Twitter for the best content, and how do you define “best”? In this work we approached the problem from an entropy-based sampling and user cognition standpoint, with promising results.

Posted in Uncategorized | 2 Comments »

Lies, damned lies, and data visualization

Posted by Scott on August 30, 2010

I have a ton of respect for information visualization. It’s captivating, intuitive, and informative, all wrapped together in an awesome ‘art meets science’ package. There are so many people doing information visualization brilliantly, and it’s a super exciting field to watch.

Worth reflecting on, however, is the current state of mainstream data viz. The infographic, having graced the cover of USA Today for some time now, seems to be getting more sophisticated, and we find ourselves in an era of some sort of amalgam of infographics, infoviz, and “data science”. How many charts, graphics, or other graphical depictions of data do you see every day? Here are three reasons why the data visualization we see today is more potent than the ‘statistics’ in the ‘lies, damned lies’ quote popularized by Mark Twain (in addition to the sheer volume and size of audience, both of which have got to be orders of magnitude larger than in Twain’s day):

1. Seeing is compelling and humans are cognitively lazy

Let’s start with the basics: Visualized data is… well, visual, and often much more compelling. As an example, which of the following representations of self-reported smokers by state (courtesy of Many Eyes) is more interesting:



I grabbed pretty much the first example I saw on Many Eyes, but the difference is stark. The map is endlessly interesting, while your eyes glaze over by about Arkansas when reading the list. Couple that with the fact that humans are cognitively lazy and unlikely to take the time to parse the list, and the map is the clear winner. Now, this is basically a good thing, except:

2. Lack of context/perspective (and humans are cognitively lazy)

David McCandless recently gave a very inspiring TED talk about data visualization, the most important part of which, for me, was where he showed that while the U.S. military budget is indeed the largest in the world, it’s actually 8th in the world when framed as a fraction of GDP. Or consider this recent Chart of the Day from Silicon Alley Insider, with the headline announcing that teens text every 10 minutes when awake:

That big green bar for people under 18 certainly pops. The text clarifies that this includes messages sent and received, which cuts in half the rate the headline implies, that teens send a text every 10 minutes (not to mention that you can text more than one person at a time, and thus messages received are likely far higher than messages sent). More misleading, the numbers focus on the average over time (1 text every 10 minutes), while the actual usage pattern is likely not that constant: texting probably happens in bursts when coordinating, etc., and then not at all at other times. This lack of depiction of the actual temporal patterns of texting is what I mean by lack of context or perspective. This is not to say teens aren’t heavy texters, but it’s not as clear cut as the figure suggests, and certainly not as clear cut as the headline. Again, though, we’re cognitively lazy and unlikely to wade through the details. (Note: this same lack of context applies to traditional statistics as well, which often fail to include covariates, and so on.)

3. No established metrics for significance

In inferential statistics there are generally established thresholds for different tests, such as the well-known p < .05, and many more conventions around effect sizes, confidence intervals, and so on. Infographics never say, “have a look, but keep in mind there’s a 60% likelihood this could have occurred by chance.” Here is an example from a recent USA Today:

You look at this and think that more African-Americans follow baseball than whites or Hispanics. Actually, if you assume they sampled 100 people of each ethnicity and test the results with a chi-square test of independence (which I did), the effect is non-significant. In fact, it’s not even close to significant. And even if this infographic did come with such a disclaimer, it would be like the lawyer who says a bunch of stuff she knows will be stricken from the record, knowing the jury can’t help but process it. (BTW, for these percentages to reach significance you would need close to 1000 people of each ethnicity sampled, but the infographic doesn’t report the sample size.)
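For anyone who wants to repeat this kind of check, here is a minimal chi-square test of independence in plain Python. The graphic’s actual percentages aren’t reproduced in this post, so the numbers below are made up. With three groups and a yes/no answer the test has 2 degrees of freedom, and for that case the upper-tail p-value has the closed form exp(-χ²/2), which keeps the sketch dependency-free. It also shows why sample size matters: for fixed percentages, the statistic grows in proportion to the number of people per group.

```python
import math

def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

def p_value_df2(stat):
    """Exact upper-tail p-value for a chi-square variable with 2 degrees of freedom."""
    return math.exp(-stat / 2)

def table_from_percentages(percent_follow, n_per_group):
    """One [follows, doesn't follow] row per group, assuming n_per_group respondents each."""
    rows = []
    for pct in percent_follow:
        follows = round(pct * n_per_group / 100)
        rows.append([follows, n_per_group - follows])
    return rows

# Hypothetical percentages for three groups (NOT the figures from the USA Today graphic).
percentages = [60, 52, 50]
for n in (100, 1000):
    stat, dof = chi_square_independence(table_from_percentages(percentages, n))
    print(n, dof, round(stat, 2), round(p_value_df2(stat), 4))
```

The `p_value_df2` shortcut only applies to 2 degrees of freedom; other table shapes need the general chi-square distribution.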

Statistics are about extracting meaning from numbers. So is data visualization. The question, and this is at the heart of the Twain quote, is to what extent we can trust what they are saying. Is there an infographic equivalent of p < .05?

Posted in Uncategorized | 3 Comments »

Top Twitter Authors for Topic ‘God’

Posted by Scott on August 20, 2010

Yesterday I tweeted about an NPR piece on the neuroscience of religious experience. This got me wondering about people who tweet about ‘god’, so I ran ‘god’ through this algorithm we’ve been working on for finding topical authorities in Twitter. Here are the results, along with a few observations.

Notes/Caveats: This was computed using one ‘authoritativeness’ method (developed largely by Aditya Pal, with a bit of chiming in from me). There are many ways to do this, each of which would likely yield a different result. Without going into the details, our method is not a graph-based solution (though we do incorporate some graph features), it looks only at the most recent 5 days of Twitter, and it does not include latent topics. So if a person hasn’t tweeted the word ‘god’ in the last 5 days, they won’t be included. Also, people use the word ‘god’ all the time in non-religious ways, so you end up with accounts like @stewie_griffin and @chazsom3ers on the list, which I simply ignore.
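The ranking method itself isn’t described here, so this sketch covers only the windowing constraint: an author is a candidate only if they tweeted the keyword within the last 5 days. The tweet fields, the whole-word matching rule, and the sample tweets are hypothetical. Simple matching like this can’t tell religious from non-religious uses of ‘god’, which is the same issue noted above.

```python
import re
from collections import Counter
from datetime import datetime, timedelta, timezone

def recent_keyword_counts(tweets, keyword, now, window_days=5):
    """Count, per author, tweets within the last `window_days` that contain `keyword`
    as a whole word (case-insensitive).

    Tweets are dicts with hypothetical 'author', 'text' and 'created_at'
    (timezone-aware datetime) fields.
    """
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    cutoff = now - timedelta(days=window_days)
    counts = Counter()
    for tweet in tweets:
        if tweet["created_at"] >= cutoff and pattern.search(tweet["text"]):
            counts[tweet["author"]] += 1
    return counts

# Hypothetical tweets relative to a hypothetical "now".
now = datetime(2010, 8, 20, tzinfo=timezone.utc)
tweets = [
    {"author": "a", "text": "Thank God for Fridays", "created_at": now - timedelta(days=1)},
    {"author": "b", "text": "Godzilla marathon tonight", "created_at": now - timedelta(days=1)},
    {"author": "c", "text": "God bless", "created_at": now - timedelta(days=6)},
]
print(recent_keyword_counts(tweets, "god", now))
```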


  1. RevRunWisdom
  2. chazsom3ers
  3. VanNessVanWu
  4. MaxLucado
  5. ihatequotes
  6. jaesonma
  7. CSLewisDaily
  8. UGOdotcom
  9. RickWarren
  10. Stewie_Griffin
  11. TheLoveStories
  12. DaRealAmberRose
  13. JoyceMeyer
  14. DeepakChopra
  15. FunnyOrFact

Some (very non-scientific) observations

In terms of basic numbers, these folks (ignoring the couple of non-religious accounts) have an average of 257k followers, ranging from 12k to 1.3m, though most are in the 100-200k range (removing the person with 1.3m followers lowers the average to 127k). They have an average of 19.3 tweets containing the word ‘god’ over the past 5 days, ranging from 7 to 53, though most are very close to 20 (or 4 per day).

The results seem to fall into a couple of categories, but in general the dominant trend is the sharing of inspirational quotes and words of wisdom or encouragement, with a healthy dose of business-oriented media savvy thrown in. People like @joycemeyer are clearly leveraging Twitter.

Religious hipsters/musicians

@RevRunWisdom (Run from Run DMC; 1.3M followers) is the number one result and really falls more into what I snarkily call the ‘self-help section of Twitter’ (see below). He mainly posts quotes, using lots of hashtags like #anxietyfree and #powerprayer. There’s little in the way of personal content, and you get the feeling he uses Twitter kind of like preaching: it’s about conveying a message, not about his personal life. Overall, though, he’s a celeb who feels very ‘real’.

Also very hip looking, but much less of a celeb, is @jaesonma (12k followers): if you click through to his webpage, it’s about ‘God, Culture, Mission.’ From what I can tell, this is a slick, hip, social-media-savvy way to spread his message. @VanNessVanWu (27k followers) falls into this same category: contemporary religious soul musician.

Ministers (or rather, the business of god)

This is interesting. If the democratization of preaching afforded by Twitter lets the religious hipster get 12k followers, how about actual ministers? Oops, we don’t know, because there really aren’t any on the list. The closest is @MaxLucado (100k followers), who is a minister but also an author, and whose tweets swing between inspirational quotes and updates about his book tour. @RickWarren (“Location: I live in the State of Grace”; 136k followers) runs pastors.com. His tweets are mainly inspirational quotes, and the whole thing looks very well-meaning, though very aware of Twitter’s power to reach a large audience. Similarly, @JoyceMeyer clearly uses Twitter in a very media- and business-savvy manner, with lots of promotion of her inspiration empire. Finally, we have Deepak Chopra, though somewhat surprisingly he has only 265k followers (@RevRunWisdom has 5 times as many).

Reading these folks’ tweets makes me wonder about the balance between spreading inspiration and marketing a business. My guess is this approach is super successful: people follow for the inspiration and also get updates on tour info, promotions, etc. It seems well-intentioned, but there’s no doubt it’s also a business.

The self-help section of Twitter

Just like at your local Barnes and Noble, there is quite the demand for inspiration and encouragement on Twitter, and the character limit seems perfect for quote sharing. Authors like @cslewisdaily, @ihatequotes, and @thelovestories (“Location: Your heart”) do virtually nothing but share quotes. I’d love to know what percentage of Twitter is quote sharing.

Posted in Uncategorized | Leave a Comment »

MSR Folks Working in Social Computing

Posted by Scott on August 18, 2010

It was pointed out to me recently that MSR’s web presence for our collective social computing effort is painfully out of date. Ah, group web pages – serious diffusion of responsibility. In a small effort to remedy the situation (without, of course, actually taking the time to fix the group web pages), here is a list of some of the people and groups working in the social media/computing space as of summer 2010.

Note that while I provide short descriptions, each person’s work is considerably more multifaceted than that. Also, this list could be a touch shorter if you restricted your definition of social computing, but it would be much longer if you broadened to include areas like CSCW, telepresence, and HCI. Finally, I’m sure I missed people, especially outside the Redmond lab.

Redmond Lab

Adaptive Systems and Interaction (ASI, CLUES)


Machine Learning and Applied Statistics

Natural Language Processing


Text Mining, Search, Navigation

New England Lab

China Lab

  • Chen Zhao (SNSs in China, enterprise social computing)

India Lab


Outside MSR

  • FUSE – Lots of innovation in social media going on there.
  • Bing – Bing is also really innovative and has great people. Check www.bing.com/social, which should continue to get more sophisticated and more interesting. Also, check out the Twitter and hyperlocal blog map mashups from Matt Hurst and company.


Posted in Uncategorized | Leave a Comment »

Predicting information diffusion in Twitter

Posted by Scott on May 18, 2010

Jiang Yang, rockin’ grad student at Michigan and former MSR intern, and I will be presenting a pair of short papers on information diffusion in Twitter at the upcoming ICWSM conference. The first paper examines person and tweet characteristics to see what predicts aspects of information diffusion. Predictors included things like the number of posts and number of mentions for a user, and whether a tweet contained a link. The outcome variables were the speed (how quickly information travels through the network), scale (how many nodes at the first degree are affected), and range (how many hops it travels in the network) of information diffusion, nicely visualized courtesy of Jiang’s design skills:

Speed, Scale, and Range

Using about a month’s worth of “spritzer” feed content, Jiang built regression models to describe the percent of variance the different characteristics account for in each aspect of information diffusion across a number of sample topics. For speed and range, Jiang suggested using a Cox proportional hazards regression model, which is often used for survival analysis. In the case of speed of diffusion, for example, this allowed us to describe how the different characteristics predict how information diffusion dies off over time. For the scale of diffusion (number of child nodes affected), we used a standard regression model. Here are the results from that model, showing correlation coefficients for each predictor:
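The post doesn’t give the model specification, so here is a minimal, one-covariate sketch of what a Cox proportional hazards fit does: it maximizes the partial likelihood, which compares each observed event against everything still “at risk” at that time, here via Newton-Raphson. Censored cases (e.g., diffusion still ongoing when data collection ended) contribute only through the risk sets. The data and names are hypothetical; an analysis with several predictors would normally use an established survival-analysis package.

```python
import math

def cox_score_info(beta, times, events, x):
    """Score and observed information of the Cox partial log-likelihood for one covariate.

    times: observed durations; events: 1 if the event was observed, 0 if censored;
    x: covariate values. Tied event times are handled Breslow-style.
    """
    score, info = 0.0, 0.0
    for i, (t_i, observed) in enumerate(zip(times, events)):
        if not observed:
            continue  # censored cases only enter through the risk sets below
        risk_set = [j for j, t_j in enumerate(times) if t_j >= t_i]
        weights = [math.exp(beta * x[j]) for j in risk_set]
        s0 = sum(weights)
        s1 = sum(w * x[j] for w, j in zip(weights, risk_set))
        s2 = sum(w * x[j] ** 2 for w, j in zip(weights, risk_set))
        mean_x = s1 / s0
        score += x[i] - mean_x
        info += s2 / s0 - mean_x ** 2
    return score, info

def fit_cox_1d(times, events, x, max_iter=50, tol=1e-10):
    """Newton-Raphson estimate of the log hazard ratio beta."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = cox_score_info(beta, times, events, x)
        step = score / info  # info > 0 unless x is constant within every risk set
        beta += step
        if abs(step) < tol:
            break
    return beta

# Hypothetical data: durations, event indicators, and a binary predictor.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
has_link = [1, 0, 1, 1, 0, 0]
beta = fit_cox_1d(times, events, has_link)
print(round(beta, 3), round(math.exp(beta), 2))  # log hazard ratio, hazard ratio
```

A positive β (hazard ratio above 1) means the predictor is associated with the event happening sooner.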

One take-away is that the historical rate with which a user is mentioned (Log(nMentioned) in the above table) generally is the best predictor across all three measures. In fact, the correlations with scale of diffusion reach the .5 and .6 range. More nuanced findings include discussion about how the topic stage during which the tweet happens (i.e., early or late in the life cycle of a topic) can have significant impact, but not always in the same direction (that is, tweets earlier in a topic do not always have the greatest diffusion).

The second paper, btw, compares diffusion and network structure in Twitter to that of a blogging network. Here is a graphic comparing the two:

Posted in Uncategorized | 1 Comment »

CHI 2010 “trip report”

Posted by Scott on April 16, 2010

Usually following conferences on Twitter is so-so: you get a vague sense of the talks and ideas, often conveyed by pithy quotes from speakers that have lost their context because you aren’t there. This year’s CHI was surprisingly informative from afar, or at least it felt that way. I started jotting down notes partway through (just scanning tweets every few hours and jotting things down), so here’s a rough journal of what I got out of CHI through Wednesday the 14th. (Of course, @infomor says @grammarnerd has empirically demonstrated that reading about it is not as good as actually being there.)



  • The microblogging workshop was awesome! Topics included tools for filtering and consuming social media (e.g., Eddi, FeedWinnower), microblogging in India, comparing Twitter to 19th-century diary practices, Twitter use across different demographics in the US (e.g., Hollywood and Compton), and using Twitter for self-reflection.
  • Best tweet: “@Louis16 OMG! populace storm Bastille #revolution”
  • Links: Eddi, Gene’s talk, workshop homepage, Google group for Twitter research, FeedWinnower, Twitter in (German) politics
  • Some of the people there: Michael Bernstein, Ed Chi, Gene Golovchinsky, Alice Oh, danah boyd, Lee Humphreys, Joan DiMicco, Bongwon Suh, Dejin Zhao, Julia Grace
  • Some of the people who wished they were there: me, Cliff Lampe, Sadat Shami
  • Photo:

Sessions, 4/12:

  • Opening plenary by Genevieve Bell – inspiring; frontier areas for research: religion, government, gender, sports, manners, non-connectedness; factoid: Israel is the only place with TV-internet consumption parity
  • Skinput was the “talk of the day!”; “Holy outside the box thinking” >> nice work Chris Harrison and fellow MSRers Dan Morris and Desney Tan!
  • Nice talk by Gary Hsieh on effectiveness of paying for answers in Q&A

Sessions, 4/13

  • Tangible UI session: lots of innovation, Lumino = standout work
  • At home with computing = homiest session
  • Shared file systems = persistent thorn in side. We just aren’t good at labeling files for others.
  • HCI for all: well attended, great presentations by Shaowen Bardzell and Lilly Irani (“must reads”). “Postcolonial Computing” [paper]
  • Crisis informatics: nice ontology of crisis information (Vieweg)

Sessions 4/14

  • Sharing session: Fred Stutzman on privacy and Facebook was terrific. People’s perceptions of their FB audience do not match their actual audience
  • Nice talk by Sean Munson on online polarization (work with Paul Resnick)
  • Great presentation from folks at Telefonica on Social Tagging Revamped
  • Jaime Teevan (w/ @katrina_) talk on using FB for Q&A: 90% of questions get answered, 25% in under half an hour (similar to results by Sadat Shami on question asking in microblogging)
  • Moira Burke gave an excellent talk on loneliness and Facebook [paper]
  • IBM’s Blog Muse: generate more participation by letting people suggest topics for others to blog about.
  • Brian Bailey gave a great talk about idea management systems at Microsoft


  • Atlanta is gorgeous this time of year
  • Flip Burger rocks!
  • Lucy Suchman rocks!
  • Havleti Indian food does not rock.
  • Contador to ride in Castilla y Leon classic (oops, cycling tweet)
  • Fake Cliff Lampe
  • Some rooms were too far away from others and some weirdly sized (e.g., madness in too small a room)
  • Centennial Olympic Park is a good place to run; in other places you might get shot at.
  • NodeXL generated Twitter network graph of CHI2010
  • Ed Chi and Niki Kittur color coordinate their shirts:

  • Ed Chi also gets the final line in this hilarious video about CSCW and CHI
  • BREAKING NEWS: Library of Congress to archive all public tweets
  • Alison Druin and Ben Bederson honored with the Social Impact Award
  • Upcoming CHIs: Austin, Paris, Toronto in 2012/13/14.



  • Panels: good in idea, poor in practice?
  • Sophistication of machine learning aspects of CHI (e.g., manimatrix)
  • Room Stream: Twitter streams per room (thanks to Sarita Yardi)
  • Google, YouTube, and PARC are hiring
  • We should (re)consider video recording/streaming
  • 2nd annual video showcase: more than just free popcorn! Here’s the highlight reel
  • Do we really need 3 reviewers for every paper? Let’s save some reviewing time says David Karger

Posted in Uncategorized | 3 Comments »

Things being predicted by social media

Posted by Scott on April 8, 2010

It seems like there’s been a flurry of predictive uses of social media, so I thought I’d compile the ones I’ve seen recently that are reasonably successful. I’m sure there are more. There is diversity even in this short list (entertainment, finance, politics), but the common thread is things with measurable outcomes decided by the general public (e.g., elections rather than sports).

  1. American Idol
  2. The stock market
  3. Box office revenues
  4. Elections, UK
  5. Elections, Germany
  6. Political opinion polls, U.S.

Posted in Uncategorized | Leave a Comment »

Social Intellisense

Posted by Scott on March 31, 2010

A year and a half or so ago we built a little prototype called Social Intellisense. The core idea was to make information as readily available as possible during authoring. We couldn’t think of anything more at-the-ready than literally in-the-flow-of-typing access to information, so we borrowed the intellisense interface concept and hooked it up to some web services.

The result is that you can drop a flickr photo into an email, insert a stock quote into a document, etc., without stopping typing. This is a little bit like Mozilla’s Ubiquity, minus the natural language bits and focused on an information access scenario. To round it out, we added generic storage so you could push content into shared information spaces. We have a short paper on this at ICWSM, so I’ll be showing it there in May, but I threw together a screencast I figured I’d post.
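The post doesn’t describe how the prototype decided when to pop up or how queries were typed, so the following is only a hypothetical sketch of the general intellisense-style pattern: watch the text just before the caret for a `source:query` token and, if `source` names a registered provider, ask it for completions. The trigger syntax, the provider names, and the stub results are all invented; real providers would call web services for photos, stock quotes, and so on.

```python
import re
from typing import Callable, Dict, List

# A provider takes the query text and returns candidate completions.
Provider = Callable[[str], List[str]]

# Hypothetical trigger: a "source:query" token immediately before the caret.
TRIGGER = re.compile(r"(\w+):(\S*)$")

def make_registry() -> Dict[str, Provider]:
    """Stub providers; a real version would call web services instead."""
    return {
        "photo": lambda query: [f"[photo results for '{query}']"],
        "stock": lambda query: [f"[stock quote for {query.upper()}]"],
    }

def suggestions(text_before_caret: str, registry: Dict[str, Provider]) -> List[str]:
    """Return completions if the text just typed ends in a known 'source:query' token."""
    match = TRIGGER.search(text_before_caret)
    if match is None or match.group(1) not in registry:
        return []
    source, query = match.groups()
    return registry[source](query)

registry = make_registry()
print(suggestions("Lunch view today photo:sunset", registry))
print(suggestions("Our budget assumes stock:msft", registry))
```

Keeping providers behind a simple registry is one way to make new information sources pluggable without touching the typing-side logic.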

Oh, and as I say in the video, the time is right for something along these lines: info-snippets are readily available and often organized courtesy of tagging and other socially organized information systems. Social Intellisense is one way of doing it. What I don’t say in the video is that we spent several months working with Live Labs to build this into a browser plugin beta. Unfortunately, most of Live Labs got re-org’d, leaving our project hanging. Bummer, but I still like the idea.

[Note, the text in the video is tiny, but full screen viewing makes it readable.]

Posted in Software Prototypes | 3 Comments »