The first time I used Amazon’s Mechanical Turk was at Powerset, a search engine startup, where I used it to compare the quality of a few versions of our early internal algorithm against Yahoo and Google. We had assumed we would have to hire a team of people who would spend their entire day comparing the quality of results.
As an experiment, I set up a task with no quality control, put in about fifty bucks and let it run overnight. The data that came back was noisy but I was able to find meaningful differences between the search engines. Completely on my own. I didn’t have to get approval to hire people, put my experimental design through a committee and wait a month for the results to come back. I could design the experiment empirically, doing meta experiments on the data collection process itself.
Back when I was thinking about what machine learning papers to write at Stanford, the conversation always hinged on what kind of data sets were available. We’d go research what data was out there and then figure out what we wanted to do. We’d spend a ton of time wrangling data designed for one purpose into another. I think it’s the same in lots of disciplines that use data.
Here at Dolores Labs, we’ve built tools and processes to quickly and efficiently collect lots of data on Mechanical Turk and other places. I hope that this blog gives us a chance to play with our technology. Back when I made my first AMT jobs, I thought about all the crazy experiments I wanted to run. Could you figure out overnight which airline was the cheapest? Could you find the exact threshold where what most people call “red” becomes what most people call “orange”? Could you quantify the difference in sentiment between Fox News and NPR?
When I was in college, I had an art teacher who made everyone draw twenty pictures a day. I hope these experiments are like those pictures. Sloppy and fun and occasionally brilliant.
We’ve been brainstorming experiments that we’d like to run, but if there’s a data set you’d like to suggest, send us an email. Maybe we can make this deal: if you have a cool idea, we’ll collect the data for you, and you guest post a short analysis.
Our first experiments will be posted shortly, with many more to come. I hope you enjoy ’em!
Edward Vielmetti
03/18/08
hm, very cool.
i wonder what you’d get if you asked your turkers to give some details about where they are in the world. e.g. is it reasonable to quiz them all on their distance to the nearest starbucks, the nearest library, the nearest jail or prison? that would be a neat set to visualize…
another very good data set that would be good to have is to give someone a web page, and ask them to give you one (three, five) sets of search terms for which that page would be a good result.
caroline collins
03/18/08
I thought i noticed a time-of-day effect in postings about obama and clinton on a certain NY Times message board. people who posted during working hours seemed more obama-friendly; the opposite seemed true in the evening (more clinton-friendly.)
i guess there would be a way to have raters rate any given posting (time-stamp concealed) and then assign a shade of grey (or red or blue) to denote a point on a specified obama-clinton continuum. then, simply make a tall column where each narrow row corresponds to one of the original postings, in order, but is represented by the color most frequently assigned by the raters. (or a mean if you prefer.) If the tall column looks like a gradient or two, there will have been a time of day effect.
I noticed the phenomenon on the day an article ran about Hillary’s red phone ad. Early posters were mostly vehement Hillary haters, or so it seemed, and it made me think the Hillary supporters might be at work, at the kind of jobs where they don’t have the luxury of reading and posting to the NYTimes web sites. Later in the day, after dinner, it did seem that the Hillary folk began chiming in.
I don’t trust my biased self to rate the postings, and there were almost seven hundred of them, so it would be cool to have the Turks do it.
Caroline
Bruce
03/19/08
How about making a thing like the Dolores color wheel, but with “perceived race” or with “perceived gender?” You can show Turks lots of photos (from facebook?) and have them guess race and/or gender, right? Maybe a two-dimensional plot with male-female on one axis and white-black on the other would be interesting, especially if it focused on the faces with less agreement (e.g., twelve Turks said X and eight Turks said Y).