The CrowdFlower Blog

The Case for Online Experimentation

Online labor markets dramatically lower the cost and hassle of conducting experiments. On Amazon’s Mechanical Turk, it is easy to run multiple experiments per week. Figuring out how to run experiments isn’t that hard, as there are already some nice tutorials available.

However, what I felt was missing from the field was a discussion of why, precisely, we can trust results from online experiments. This was the motivation for a new paper, jointly written with Dave Rand (who wrote up part of this study here on the Dolores Labs blog) and Richard Zeckhauser.

You can download the paper here.

While we make the practical and theoretical case for online experimentation, we believe that acceptance of online results as “valid” will come once people see how easily and reliably one can replicate previous studies. This is why blogs like Experimental Turk and Deneme, both of which report results from AMT experiments, are so helpful. In our paper, we continue this process by replicating three results that are fairly well established.

In one experiment for the economists, we show—contra the usual intuition—that at least some Turkers are financially motivated, despite the very low stakes. After performing an initial text transcription task, workers were offered some randomly chosen amount of money to do an additional transcription. Results show the counts of people who agreed (“Yes”) and the counts of people who did not agree (“No”), by amount offered.

[Figure: Turkers and Money]

Nothing too surprising—offer to pay more and more workers will accept—but at this stage in the development of online experiments as a methodology, “surprising” would probably be bad news.
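For readers who want to build the same kind of table from their own task logs, here is a minimal sketch in Python. It assumes each response is recorded as an (offer, accepted) pair. The function name `tabulate_acceptance`, the cent denominations, and the sample responses are all hypothetical and made up for illustration. None of this comes from the paper.

```python
from collections import defaultdict


def tabulate_acceptance(responses):
    """Count Yes/No decisions for each offered amount.

    `responses` is an iterable of (offer_cents, accepted) pairs, where
    `accepted` is True if the worker agreed to the extra transcription.
    Returns {offer_cents: {"yes": int, "no": int, "rate": float}},
    with offers in ascending order.
    """
    counts = defaultdict(lambda: {"yes": 0, "no": 0})
    for offer_cents, accepted in responses:
        counts[offer_cents]["yes" if accepted else "no"] += 1

    table = {}
    for offer_cents in sorted(counts):
        yes = counts[offer_cents]["yes"]
        no = counts[offer_cents]["no"]
        table[offer_cents] = {"yes": yes, "no": no, "rate": yes / (yes + no)}
    return table


# Hypothetical responses, invented for this sketch; NOT data from the paper.
example = [
    (1, False), (1, False), (1, True),
    (5, True), (5, False), (5, True),
    (10, True), (10, True),
]

for offer, row in tabulate_acceptance(example).items():
    print(f"{offer:>3} cents: yes={row['yes']} no={row['no']} rate={row['rate']:.2f}")
```

Reporting the raw Yes and No counts next to the rate makes it easy to see how many workers sit behind each estimate.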

Anyway, the full paper is here. We’d love to get comments and feedback—it’s not too late to earn a place in our coveted “thanks” footnote!


Comments

  1. Charlotte Wickham

    At a quick skim it’s an interesting paper. My comments relate to the figures (because that’s generally what I look at first).

    Figure 1:
    You probably don’t want bars in this plot. I think it’s generally unwise to use bars for point estimates, especially here, where your x-axis doesn’t even start at zero. How about dots plus error bars?

    Figure 2: Can I suggest the reordering of country as: Europe, Other, India, US?


  2. We recently used Mechanical Turk as part of a study on how users respond to different security warnings. You can read the paper here:
    http://dmolnar.com/papers/secdelay-weis2010.pdf

    To appear at the Workshop on Economics and Information Security 2010. In general, Mechanical Turk is a source of people for these kinds of “experimental economics” questions.

    I am also starting to look at Mechanical Turk for coding free-form data (e.g. web pages) to feed into an analysis. The hope is that Turkers could take over time-consuming tasks such as reading a web page and determining whether it is an advertisement for a certain product. This is a much better fit than surveys for the current CrowdFlower interface, because it lends itself naturally to the notion of “gold” questions, to “trusted workers,” etc.

