<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The CrowdFlower Blog &#187; Experiments</title>
	<atom:link href="http://blog.crowdflower.com/tag/experiments/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.crowdflower.com</link>
	<description></description>
	<lastBuildDate>Tue, 10 Jan 2012 20:00:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Designing Incentives for Crowdsourcing Workers</title>
		<link>http://blog.crowdflower.com/2011/05/designing-incentives-for-crowdsourcing-workers/</link>
		<comments>http://blog.crowdflower.com/2011/05/designing-incentives-for-crowdsourcing-workers/#comments</comments>
		<pubDate>Tue, 24 May 2011 19:19:45 +0000</pubDate>
		<dc:creator>Aaron Shaw</dc:creator>
				<category><![CDATA[Economics]]></category>
		<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Human Behavior]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[behavior]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[data collection]]></category>
		<category><![CDATA[incentives]]></category>
		<category><![CDATA[motivation]]></category>
		<category><![CDATA[social science]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=2572</guid>
		<description><![CDATA[In a recent paper, presented at the ACM Conference on Computer Supported Cooperative Work (CSCW), John Horton, Daniel Chen and I used a large-scale experiment to test the effect of different incentive schemes on the quality of crowdsourcing work. The results surprised us. They suggest that workers perform most accurately when the task design credibly [...]]]></description>
			<content:encoded><![CDATA[<p>In a <a title="Designing Incentives for Inexpert Human Raters, Berkman Center" href="http://cyber.law.harvard.edu/publications/2011/Designing_Incentives_Inexpert_Human_Raters">recent paper</a>, presented at the ACM Conference on Computer Supported Cooperative Work (CSCW), <a title="John Horton, oDesk" href="https://sites.google.com/site/johnjosephhorton/">John Horton</a>, <a title="Daniel Chen, Duke Law School" href="http://www.law.duke.edu/fac/chen">Daniel Chen</a> and <a title="Aaron Shaw, UC Berkeley &amp; Harvard" href="http://aaronshaw.org">I</a> used a large-scale experiment to test the effect of different incentive schemes on the quality of crowdsourcing work.</p>
<p>The results surprised us. They suggest that workers perform most accurately when the task design credibly links payoffs to a worker&#8217;s ability to think about the answers that their peers are likely to provide.</p>
<p style="text-align: center;">
<div id="attachment_2577" class="wp-caption aligncenter" style="width: 549px"><a href="http://www.flickr.com/photos/iyoupapa/"><img class="size-full wp-image-2577 " title="Horserace!" src="http://blog.crowdflower.com/wp-content/uploads/2011/05/3757438159_horserace-iyoupapa-altered.jpg" alt="Horserace!" width="539" height="264" /></a><p class="wp-caption-text">a horserace experiment! (photo cc-by-sa by iyoupapa)</p></div>
<p><span id="more-2572"></span></p>
<p>The idea for this study came out of our sense that, as social scientists, we had something unique to offer the existing research on human computation. <a title="AMT is fast, cheap, and good for machine learning data" href="http://blog.crowdflower.com/2008/09/amt-fast-cheap-good-machine-learning/">Early</a> and <a title="&quot;Get Another Label?&quot; Ipeirotis et al. 2008" href="http://archive.nyu.edu/handle/2451/25882">influential</a> crowdsourcing research has focused on how to filter the judgments of the crowd to find the best answers. We wanted to know whether simple task-design changes could improve the quality of data coming into a crowdsourcing system in the first place.</p>
<p>To test this idea, we chose 14 different incentive schemes and framing techniques developed and validated across the social sciences and set up a horse race experiment to see which schemes/techniques would work best.</p>
<p>Consistent with our personal biases (John and Daniel are both economists, and I&#8217;m a sociologist), some of the schemes were financially oriented, some were social or psychological, and some were hybrids combining social and financial incentives. The details of all the schemes are included <a title="Designing Incentives for Inexpert Human Raters" href="http://cyber.law.harvard.edu/publications/2011/Designing_Incentives_Inexpert_Human_Raters">in the paper</a> (it&#8217;s a long list, and some of them are kind of involved), but it&#8217;s worth giving some examples.</p>
<p>On the financial end of the incentives spectrum, we had one condition we called &#8220;reward-accuracy,&#8221; which was pretty much what you&#8217;d expect: we told workers, &#8220;we&#8217;ll pay you a bonus if you get the answers right.&#8221; We also had one called &#8220;punishment-accuracy,&#8221; the gist of which you can deduce. On the purely social-psychological side, we had one we called &#8220;trust,&#8221; in which we told workers, &#8220;we&#8217;ll pay you for this job no matter how bad your performance; we trust that you&#8217;ll still make your best effort.&#8221;</p>
<p>One of the weirdest schemes turns out to be important, so I need to explain that one. Called &#8220;Bayesian Truth Serum&#8221; (BTS), it incorporates a design from the work of <a title="Drazen Prelec" href="http://econ-www.mit.edu/faculty/dprelec">Drazen Prelec</a>, a behavioral economist at MIT, who realized that research subjects could probably provide useful information regarding the expected distribution for subjective, qualitative questions (<em>nb</em>, the mechanics of how he does this are arcane in a way that is almost sure to delight the geeks among you, so I encourage you to <a title="Bayesian Truth Serum" href="http://econ-www.mit.edu/files/1966">read his paper</a>). Few of the details of <em>real</em> BTS are important, except that we incorporated the piece about asking workers to answer the questions themselves <em>and predict the distribution of other workers&#8217; responses</em>. We also told them we&#8217;d give them a bonus if their predictions were correct.</p>
<p>We then created a task that asked workers to answer five questions. In this case, the questions were drawn from another study examining participatory features of websites, for which we already possessed validated data collected by research assistants.</p>
<p>All workers answered the same five questions about the same website (<a href="http://www.kiva.org">www.kiva.org</a>) while being exposed to one and only one of the 14 incentive schemes (or a control condition of no scheme). Roughly 2,000 individuals participated in the study, resulting in over 100 subjects in each of the experimental conditions. (The statistics and science nerds out there will be pleased to know that both the drop-out rate and demographic covariates were distributed evenly across conditions.)</p>
<p>To measure worker performance, we used the research assistant responses as correct answers to the questions and then calculated the total number of matching answers (out of five) provided by each worker. The results (aggregated across all treatments) are plotted in a histogram below and show that the average worker answered just over two questions out of five correctly.</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/2011/05/designing-incentives-for-crowdsourcing-workers/aggperf/" rel="attachment wp-att-2578"><img class="aligncenter size-full wp-image-2578" title="Inexpert raters - Aggregate Performance" src="http://blog.crowdflower.com/wp-content/uploads/2011/05/AggPerf.png" alt="Aggregate performance histogram" width="280" height="280" /></a></p>
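<p>As a rough illustration of the scoring step described above, here is a minimal Python sketch. The gold-standard answers, worker IDs, and responses are all made up for illustration; they are not data from the study, and the names <code>GOLD</code> and <code>score</code> are hypothetical.</p>

```python
from collections import Counter

# Hypothetical gold-standard answers from research assistants (illustrative only).
GOLD = ["yes", "no", "yes", "yes", "no"]


def score(worker_answers, gold=GOLD):
    """Count how many of a worker's answers match the gold-standard answers."""
    return sum(1 for given, correct in zip(worker_answers, gold) if given == correct)


# Made-up worker responses, not data from the study.
workers = {
    "w1": ["yes", "no", "no", "yes", "yes"],
    "w2": ["no", "no", "yes", "no", "yes"],
    "w3": ["yes", "no", "yes", "yes", "no"],
}

# Score per worker (out of five), then the counts a histogram would plot.
scores = {worker: score(answers) for worker, answers in workers.items()}
histogram = Counter(scores.values())
mean_score = sum(scores.values()) / len(scores)
```

<p>Each worker ends up with an integer score from zero to five, and <code>histogram</code> holds the number of workers at each score, which is the quantity the aggregate plot displays.</p>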
<p>Then, to see how the treatments compared with one another and with the control group, we calculated the mean correct response rate for each condition and conducted difference-of-means tests to see which of these means were significantly greater than the control group&#8217;s. The results of this comparison appear below (in a new plot that doesn&#8217;t even appear in the paper!):</p>
<p><a href="http://blog.crowdflower.com/2011/05/designing-incentives-for-crowdsourcing-workers/inexpert-itt/" rel="attachment wp-att-2579"><img class="aligncenter size-full wp-image-2579" title="inexpert raters - ITT estimates" src="http://blog.crowdflower.com/wp-content/uploads/2011/05/inexpert-ITT.png" alt="ITT estimates per treatment" width="500" height="500" /></a></p>
<p>The orange dots show the value of the mean in each condition, and the blue bars illustrate the 95% confidence interval around that mean. The treatments are sorted by the size of the difference in means from the control. (More hard-core nerd stuff: the means are adjusted using Intent-To-Treat estimators).</p>
<p>From these results, we concluded that our horse race had two clear front-runners: the &#8220;Bayesian Truth Serum&#8221; (BTS) and &#8220;Punishment &#8211; disagreement&#8221; conditions, each of which improved average worker performance by almost half of a correct answer above the 2.08 correct answers in the control group. A few of the other financial and hybrid incentives had fairly large point estimates, but were not significantly different from control once we adjusted the test statistics and corresponding p-values to account for the fact that we were making so many comparisons at once (apologies if this doesn&#8217;t make sense; it&#8217;s yet another precautionary measure to avoid upsetting the stats nerds among you). In a tough turn for the sociologists and psychologists, none of the purely social/psychological treatments had any significant effects at all.</p>
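<p>For readers who want to see the shape of such a comparison, here is a minimal Python sketch of a difference-of-means test followed by a multiple-comparison adjustment. It is not the procedure from the paper: the normal approximation and the Bonferroni correction are assumptions chosen for simplicity, the paper may use a different adjustment, and all function names are hypothetical.</p>

```python
import math
from statistics import mean, variance


def welch_z(treatment, control):
    """Difference in means divided by its standard error (unequal variances)."""
    se = math.sqrt(variance(treatment) / len(treatment)
                   + variance(control) / len(control))
    return (mean(treatment) - mean(control)) / se


def one_sided_p(z):
    """Normal-approximation p-value for 'treatment mean exceeds control mean'."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def bonferroni(p_values):
    """Multiply each p-value by the number of comparisons, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]
```

<p>With 14 treatments compared against one control, an adjustment of this kind makes each individual test much stricter, which is why a large point estimate can still fail to reach significance.</p>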
<p>Why do BTS and punishing workers for disagreement succeed in improving performance significantly where so many of the other incentive schemes failed? The answer hinges on the fact that both conditions tied workers&#8217; payoffs to their ability to think about their peers&#8217; likely responses. (We elaborate on this argument in the paper.)</p>
<p>Does this mean that we should give up on simple financial or social-psychological incentives? Probably not. The fact that we conducted the experiment on MTurk means that the deck may have been stacked against incentives like the &#8220;trust&#8221; condition I described earlier. Because requesters on MTurk have little oversight, workers are more likely to respond to financial incentives than stated promises. In this sense, the marketplace has structured the interaction between workers and requesters in a way that may limit the opportunities to harness motivations that are not linked to money in some explicit way.</p>
<p>You can <a title="Designing Incentives for Inexpert Human Raters" href="http://cyber.law.harvard.edu/sites/cyber.law.harvard.edu/files/Shaw-Horton-Chen_Designing_Incentives_Inexpert_Human_Raters_2011.pdf">download the full paper</a> to read more.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/05/designing-incentives-for-crowdsourcing-workers/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Visions and revisions</title>
		<link>http://blog.crowdflower.com/2010/10/visions-and-revisions/</link>
		<comments>http://blog.crowdflower.com/2010/10/visions-and-revisions/#comments</comments>
		<pubDate>Sat, 30 Oct 2010 00:46:42 +0000</pubDate>
		<dc:creator>Josh Eveleth</dc:creator>
				<category><![CDATA[Art]]></category>
		<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Wisdom of Small Crowds]]></category>
		<category><![CDATA[art]]></category>
		<category><![CDATA[Hobbes]]></category>
		<category><![CDATA[Sandburg]]></category>
		<category><![CDATA[Shakespeare]]></category>
		<category><![CDATA[Writing]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=1734</guid>
		<description><![CDATA[Writing is easy. Just sit in front of a typewriter, open up a vein and bleed it out drop by drop. &#8211; Red Smith When I was in college, a professor I respected said that one of the best ways to demystify writing is to write like people you admire. Specifically, to find passages that [...]]]></description>
			<content:encoded><![CDATA[<blockquote><p>
Writing is easy. Just sit in front of a typewriter, open up a vein and bleed it out drop by drop.<br />
<span style="font-size: 13.3333px;">&#8211; Red Smith</span></p></blockquote>
<div id="attachment_1434" class="wp-caption alignnone" style="width: 330px"><img src="http://blog.crowdflower.com/wp-content/uploads/2010/10/Underwoodfive.jpg" width="320" height="240" /><p class="wp-caption-text">Underwood Five Typewriter</p></div>
<p>When I was in college, a professor I respected said that one of the best ways to demystify writing is to write like people you admire: find passages that you love, and try to revise them in your own words. This exercise proved invaluable. It allowed me to walk in those writers&#8217; literary footsteps, shedding light on why they chose &#8212; or avoided &#8212; certain words, punctuation, and syntax.</p>
<p>With this in mind, I recently wondered whether writing, and specifically revising, can be crowdsourced.</p>
<p><span id="more-1734"></span></p>
<p>I posted a task through CrowdFlower that asked the crowd to rewrite four famous quotations, pithily, while preserving their meaning.</p>
<p><img src="http://blog.crowdflower.com/wp-content/uploads/2010/10/Screen-shot-2010-10-29-at-11.59.44-AM.png"></p>
<p>In one evening, I was able to get 20 revisions of each quotation from people across the country. I won&#8217;t summarize them all here, but I will pull out a few highlights.</p>
<p><strong>Original Quotation 1 (from <em><a href="http://www.bartleby.com/100/138.31.118.html">Macbeth</a></em>, William Shakespeare):</strong></p>
<blockquote><p>
To-morrow, and to-morrow, and to-morrow,/ Creeps in this petty pace from day to day,/ To the last syllable of recorded time;/ And all our yesterdays have lighted fools/ The way to dusty death. Out, out, brief candle!/ Life&#8217;s but a walking shadow, a poor player/ That struts and frets his hour upon the stage/ And then is heard no more. It is a tale/ Told by an idiot, full of sound and fury/ Signifying nothing.</p></blockquote>
<p><strong>Revision 1:</strong></p>
<blockquote><p>
Time marches on, and everyone dies; life is meaningless.<br />
<span style="font-size: 13.3333px;">&#8211; Hatboro, PA</span></p></blockquote>
<blockquote><p>
Life creeps along and ends suddenly like the end of a bad play. The play is dramatic and had poor acting, and has no point or moral in the end.<br />
<span style="font-size: 13.3333px;">&#8211; Salt Lake City, UT</span></p></blockquote>
<p><strong>Original Quotation 2 (from <em><a href="http://www.bartleby.com/100/160.2.html">Leviathan</a></em>, Thomas Hobbes):</strong></p>
<blockquote><p>
No arts, no letters, no society, and which is worst of all, continual fear and danger of violent death, and the life of man solitary, poor, nasty, brutish, and short.</p></blockquote>
<p><strong>Revision 2:</strong></p>
<blockquote><p>
The life of a man on his own is barbaric and degrading.<br />
<span style="font-size: 13.3333px;">&#8211; East Aurora, NY</span></p></blockquote>
<blockquote><p>
No art, letters or society. Worst of all, living in fear of being alone, poor and short.<br />
<span style="font-size: 13.3333px;">&#8211; Arlington, TX</span></p></blockquote>
<blockquote><p>
Life is all but a mere scam.<br />
<span style="font-size: 13.3333px;">&#8211; Overland Park, KS</span></p></blockquote>
<p><strong>Original Quotation 3 (from &#8220;<a href="http://www.bartleby.com/124/pres31.html">First Inaugural Address</a>,&#8221; Abraham Lincoln):</strong></p>
<blockquote><p>
We are not enemies, but friends. We must not be enemies. Though passion may have strained it must not break our bonds of affection. The mystic chords of memory, stretching from every battlefield and patriot grave to every living heart and hearthstone all over this broad land, will yet swell the chorus of the Union, when again touched, as surely they will be, by the better angels of our nature.</p></blockquote>
<p><strong>Revision 3:</strong></p>
<blockquote><p>
We&#8217;re fools for fighting each other. We should co-operate, and let our example bring everyone together.<br />
<span style="font-size: 13.3333px;">&#8211; Plattsburgh, NY</span></p></blockquote>
<blockquote><p>
We must be friends, we should now forget each other. Our memories will always stay through our rough times and good times.<br />
<span style="font-size: 13.3333px;">&#8211; Tallahassee, FL</span></p></blockquote>
<p><strong>Original Quotation 4 (from &#8220;<a href="http://www.bartleby.com/165/1.html">Chicago</a>,&#8221; Carl Sandburg):</strong></p>
<blockquote><p>
&#8220;They tell me you are wicked and I believe them, for I have seen your painted women under the gas lamps luring the farm boys.&#8221;</p></blockquote>
<p><strong>Revision 4:</strong></p>
<blockquote><p>
Your reputation follows you and it is a bad one, makeup and lust.<br />
<span style="font-size: 13.3333px;">&#8211; Iola, WI</span></p></blockquote>
<blockquote><p>
This city isn&#8217;t a nice place, it&#8217;s full of hookers.<br />
<span style="font-size: 13.3333px;">&#8211; Milledgeville, GA</span></p></blockquote>
<p>The full data is available <a href="http://blog.crowdflower.com/wp-content/uploads/2010/10/f17005-2.csv">here</a>.</p>
<p>What&#8217;s your opinion? Can the revision process be crowdsourced?</p>
<p>I&#8217;d love to hear your thoughts.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2010/10/visions-and-revisions/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>And the crowd goes wild &#8230;</title>
		<link>http://blog.crowdflower.com/2010/10/and-the-crowd-goes-wild/</link>
		<comments>http://blog.crowdflower.com/2010/10/and-the-crowd-goes-wild/#comments</comments>
		<pubDate>Mon, 18 Oct 2010 15:00:29 +0000</pubDate>
		<dc:creator>Patrick Philips and Joseph Childress</dc:creator>
				<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Sports]]></category>
		<category><![CDATA[sports]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=1559</guid>
		<description><![CDATA[As the NFL season loomed on the horizon back in August, there was a lot of fantasy football talk around the CrowdFlower office. Naturally, we decided to crowdsource a killer fantasy football team. We asked CrowdFlower workers to help us build a ranked list of the Top 75 players to guide our fantasy football draft. [...]]]></description>
			<content:encoded><![CDATA[<p>As the NFL season loomed on the horizon back in August, there was a lot of fantasy football talk around the CrowdFlower office. Naturally, we decided to crowdsource a killer fantasy football team.</p>
<div id="attachment_1602" class="wp-caption alignnone" style="width: 210px"><a href="http://blog.crowdflower.com/2010/10/and-the-crowd-goes-wild/fantasyfootballtackle-2/" rel="attachment wp-att-1602"><img class="centered" src="http://blog.crowdflower.com/wp-content/uploads/2010/10/fantasyfootballtackle1.jpg" alt="Fantasy football tackle" title="fantasyfootballtackle"/></a><p class="wp-caption-text">Fantasy football tackle. Photo Credit: dennis</p></div>
<p><span id="more-1559"></span></p>
<p>We asked CrowdFlower workers to help us build a ranked list of the Top 75 players to guide our fantasy football draft. We used pairwise comparisons to determine rankings. Specifically, we presented workers with two players and asked them to pick the player they thought would be more valuable.</p>
<p>We used the Top 75 players from ESPN’s 2010 Fantasy Football Draft Kit<sup>1</sup>, matching each player with his 74 counterparts, giving us a total of 2,775 player pairs. We also included each player’s position and team, as well as a link to more detailed statistics.</p>
<div id="attachment_1563" class="wp-caption alignnone" style="width: 912px"><a href="http://blog.crowdflower.com/2010/10/and-the-crowd-goes-wild/fantasyfootballtask/" rel="attachment wp-att-1563"><img class="centered" src="http://blog.crowdflower.com/wp-content/uploads/2010/10/fantasyfootballtask.png" alt="Fantasy football task on CrowdFlower" title="fantasyfootballtask"/></a><p class="wp-caption-text">Screenshot of fantasy football task.</p></div>
<p>After the job finished, we ordered the players according to number of head-to-head victories.</p>
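<p>The pairing and ranking steps above can be sketched in a few lines of Python. This is a minimal illustration rather than the code we ran: the player names are placeholders, and <code>rank_by_wins</code> is a hypothetical helper that takes the list of winners, one entry per judged match-up.</p>

```python
from collections import Counter
from itertools import combinations

# Placeholder names standing in for the 75 players in the draft kit.
players = [f"Player {i}" for i in range(1, 76)]

# Every unordered pair of distinct players: 75 * 74 / 2 = 2,775 match-ups.
pairs = list(combinations(players, 2))


def rank_by_wins(winners, all_players):
    """Order players by number of head-to-head victories, most wins first."""
    wins = Counter(winners)  # players with no victories count as zero
    return sorted(all_players, key=lambda p: wins[p], reverse=True)
```

<p>Counting victories is the simplest way to turn pairwise judgments into a ranking; it treats every match-up equally and ignores the strength of the opponent.</p>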
<div id="attachment_1565" class="wp-caption alignnone" style="width: 670px"><a href="http://blog.crowdflower.com/2010/10/and-the-crowd-goes-wild/player_rankings/" rel="attachment wp-att-1565"><img class="centered" src="http://blog.crowdflower.com/wp-content/uploads/2010/10/player_rankings.png" alt="player rankings" title="player_rankings" /></a><p class="wp-caption-text">Crowdsourced list of the Top 25 most valuable fantasy football players.</p></div><br />
<p>(Download the full results <a href="http://blog.crowdflower.com/wp-content/uploads/2010/10/initial-ranking.xlsx">here</a>.)</p>
<p>We found that workers chose the player on the left 53 percent of the time, even though each match-up appeared twice, once with Player A on the left and again with Player A on the right. With 5,500 data points, this is a statistically significant bias toward the player on the left. However, the final ranking doesn’t change even after accounting for this bias.</p>
<p>We settled on two likely explanations for the bias:</p>
<ol>
<li>The anchoring effect of putting something on the left (and as the first response option) may have caused more workers to select the first player. </li>
<li>Our <a href="http://crowdflower.com/docs/gold" target="_blank">Gold</a> was slightly biased toward the player on the left, which we have previously seen<sup>2</sup> to have an effect on the overall distribution of the answers.</li>
</ol>
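<p>To give a sense of why a 53 percent share counts as a real bias at this sample size, here is a minimal Python sketch of a test against a 50/50 split. The normal approximation is an assumption made for simplicity, the function names are hypothetical, and the count of left-side picks is reconstructed from the rounded figures in this post rather than taken from the raw data.</p>

```python
import math


def left_bias_z(n_left, n_total):
    """z-score for the share of left-side picks against a 50/50 null."""
    share = n_left / n_total
    standard_error = math.sqrt(0.25 / n_total)
    return (share - 0.5) / standard_error


def two_sided_p(z):
    """Two-sided p-value under the normal approximation."""
    return math.erfc(abs(z) / math.sqrt(2))


# 53 percent of roughly 5,500 judgments, as reported above.
z = left_bias_z(round(0.53 * 5500), 5500)
p = two_sided_p(z)
```

<p>Under these assumptions the z-score comes out above 4, far beyond the usual 1.96 cutoff, so a three-point tilt toward the left is very unlikely to be chance alone.</p>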
<p>As a first comparison of our crowd of football fans with ESPN’s Fantasy Football brain trust, we plotted each player by the difference between his two rankings. </p>
<div id="attachment_1561" class="wp-caption alignnone" style="width: 912px"><a href="http://blog.crowdflower.com/2010/10/and-the-crowd-goes-wild/espn_vs_crowd/" rel="attachment wp-att-1561"><img class="centered" src="http://blog.crowdflower.com/wp-content/uploads/2010/10/espn_vs_crowd.png" alt="crowd rankings vs. espn rankings" title="espn_vs_crowd"/></a><p class="wp-caption-text">Players on the left were ranked higher by ESPN, while players on the right were ranked higher by AMT.</p></div>
<p>Will Chad Ochocinco and Matt Ryan vindicate the crowd? Stay tuned to find out.</p>
<hr />
<p>1. <a href="http://games.espn.go.com/frontpage/ffldraftkit" target="_blank">http://games.espn.go.com/frontpage/ffldraftkit</a><br />
2. <a href="http://www.ischool.utexas.edu/~cse2010/slides/le.pptx" target="_blank">http://www.ischool.utexas.edu/~cse2010/slides/le.pptx</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2010/10/and-the-crowd-goes-wild/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Mechanical Proust: An automated crowd-written blog</title>
		<link>http://blog.crowdflower.com/2010/09/mechanical-proust-an-automated-crowd-written-blog/</link>
		<comments>http://blog.crowdflower.com/2010/09/mechanical-proust-an-automated-crowd-written-blog/#comments</comments>
		<pubDate>Fri, 17 Sep 2010 15:15:02 +0000</pubDate>
		<dc:creator>John Horton</dc:creator>
				<category><![CDATA[Art]]></category>
		<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[art]]></category>
		<category><![CDATA[mechanical turk]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=1100</guid>
		<description><![CDATA[A few months ago, I started a blog written by Mechanical Turk workers. Eight times a day, on the hour, a script posts a personal question on MTurk. The questions are randomly selected from a subset of the Proust Questionnaire. Example questions include: What is your favorite food and drink? What is your idea of misery? [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_1434" class="wp-caption alignnone" style="width: 225px"><a rel="attachment wp-att-1434" href="http://blog.crowdflower.com/2010/09/mechanical-proust-an-automated-crowd-written-blog/marcel_proust_1900/"><img class="size-full wp-image-1434" src="http://blog.crowdflower.com/wp-content/uploads/2010/09/Marcel_Proust_1900.jpg" alt="Marcel Proust" width="215" height="321" /></a><p class="wp-caption-text">Marcel Proust</p></div>
<p>A few months ago, I started a <a href="http://mechanicalproust.blogspot.com/" target="_blank">blog</a> written by Mechanical Turk workers. Eight times a day, on the hour, a script posts a personal question on MTurk. The questions are randomly selected from a subset of the <a href="http://en.wikipedia.org/wiki/Proust_Questionnaire">Proust Questionnaire</a>. Example questions include:</p>
<ol>
<li>What is your favorite food and drink?</li>
<li>What is your idea of misery?</li>
<li>Which natural talent would you most like to be gifted with?</li>
</ol>
<p><span id="more-1100"></span></p>
<p>The first worker to submit an answer automatically has that response posted anonymously on the blog, which I call &#8220;Mechanical Proust.&#8221; Workers are told that their responses will be made public but will remain anonymous. I make no attempt to check responses before they are posted, nor do I check whether a worker has submitted in the past. (I enabled Google Ads to see if I could make the blog financially self-sustaining. No luck here so far.)</p>
<p><a rel="attachment wp-att-1154" href="http://blog.crowdflower.com/2010/09/mechanical-proust-an-automated-crowd-written-blog/wiseass/"><img class="alignleft size-full wp-image-1154" src="http://blog.crowdflower.com/wp-content/uploads/2010/08/wiseass.png" alt="" width="217" height="134" /></a></p>
<p><span style="font-size: 13.3333px">What have I learned? Most of the responses are unsurprising but they are occasionally poignant or insightful. People are proud of their children. They regret dropping out of school. They want to live in Paris. They fear dying and being alone. They like chicken dishes, etc. Lots of workers are funny (see screenshot at left). </span></p>
<h3>Technical Notes:</h3>
<p>A Python script posts the questions (using <a href="http://code.google.com/p/boto/">boto</a>), and <a href="http://en.wikipedia.org/wiki/Cron">cron</a> schedules them. The questions are presented as external HITs, with the &#8220;submit&#8221; button launching another Python script that posts the text response to the blog, using Google&#8217;s <a href="http://code.google.com/apis/blogger/docs/1.0/developers_guide_python.html">API</a>. If you want the code or want help making something similar, let me know.</p>
<h3>Next steps (taken by someone else):</h3>
<p>I&#8217;d like to see someone make a new version of this blog where blog readers could submit questions, which would go into a queue, with newer questions going at the bottom. Each new question would have up/down Reddit-type buttons, which could move questions up and down the stack. Questions could then be posted on MTurk LIFO-style.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2010/09/mechanical-proust-an-automated-crowd-written-blog/feed/</wfw:commentRss>
		<slash:comments>11</slash:comments>
		</item>
		<item>
		<title>Arcade Fire releases crowd-built and crowd-curated art project</title>
		<link>http://blog.crowdflower.com/2010/08/arcade-fire-releases-crowd-built-and-crowd-curated-art-project/</link>
		<comments>http://blog.crowdflower.com/2010/08/arcade-fire-releases-crowd-built-and-crowd-curated-art-project/#comments</comments>
		<pubDate>Tue, 31 Aug 2010 21:31:31 +0000</pubDate>
		<dc:creator>Joseph Childress and Josh Eveleth</dc:creator>
				<category><![CDATA[Art]]></category>
		<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Arcade Fire]]></category>
		<category><![CDATA[art]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[music]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=1301</guid>
		<description><![CDATA[When you were a kid, did you ever want to be in a famous band? Now you can be part of the next best thing &#8230; and you can even tell your younger self about it. Arcade Fire has collaborated with Google and Chris Milk to set up a crowdsourced art project that anyone with [...]]]></description>
			<content:encoded><![CDATA[<p>When you were a kid, did you ever want to be in a famous band? Now you can be part of the next best thing &#8230; and you can even tell your younger self about it.</p>
<p><a href="http://www.arcadefire.com/">Arcade Fire</a> has collaborated with <a href="http://www.google.com/">Google</a> and <a href="http://www.chrismilk.com/">Chris Milk</a> to set up a crowdsourced art project that anyone with a computer can contribute to and even curate, from anywhere in the world. </p>
<p><span id="more-1301"></span></p>
<p>Here&#8217;s how it works:</p>
<p>By visiting &#8220;<a href="http://thewildernessdowntown.com">The Wilderness Downtown</a>&#8221; (best viewed in <a href="http://www.google.com/chrome">Google Chrome</a>) and entering your childhood address, you can travel down memory lane through the streets you grew up in, accompanied by Arcade Fire&#8217;s &#8220;We Used To Wait.&#8221; </p>
<p>You can then send your childhood self a note, a drawing, or other art, which gets incorporated into your personalized art project. If you like what you created, you can submit it to Arcade Fire, where it may end up as part of their concert tour.</p>
<p><a href="http://blog.crowdflower.com/2010/08/arcade-fire-releases-crowd-built-and-crowd-curated-art-project/arcadefire_postcard-2/" rel="attachment wp-att-1305"><img src="http://blog.crowdflower.com/wp-content/uploads/2010/08/arcadefire_postcard1.png" alt="arcade fire postcard" title="arcade fire postcard" width="385" height="260" class="aligncenter size-full wp-image-1305" /></a></p>
<p>Every submission to this crowdsourced art project will be curated in real time by CrowdFlower, whose workforce sifts through the messages and identifies stellar &#8212; or offensive &#8212; content.</p>
<p>Prior to crowdsourcing, a global art project like this (not to mention real-time curation and content moderation by human beings) would have been unthinkable. It&#8217;s yet another glimpse of the future you can send to the childhood you.</p>
<p>Please visit <a href="http://thewildernessdowntown.com/">http://thewildernessdowntown.com/</a> to view and contribute to the project. And if you want to help curate, visit <a href="http://crowdflower.com/judgments/mob/20094">http://crowdflower.com/judgments/mob/20094</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2010/08/arcade-fire-releases-crowd-built-and-crowd-curated-art-project/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>For love or for money? A list experiment on the motivations behind crowdsourcing work</title>
		<link>http://blog.crowdflower.com/2010/08/for-love-or-for-money-a-list-experiment-on-the-motivations-behind-crowdsourcing-work/</link>
		<comments>http://blog.crowdflower.com/2010/08/for-love-or-for-money-a-list-experiment-on-the-motivations-behind-crowdsourcing-work/#comments</comments>
		<pubDate>Thu, 05 Aug 2010 15:00:11 +0000</pubDate>
		<dc:creator>Aaron Shaw</dc:creator>
				<category><![CDATA[Economics]]></category>
		<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Motivation]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[Judd Antin]]></category>
		<category><![CDATA[list experiment]]></category>
		<category><![CDATA[Mturk]]></category>
		<category><![CDATA[research]]></category>
		<category><![CDATA[social desirability]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=931</guid>
		<description><![CDATA[What motivates crowdsourcing workers to do what they do? According to some surveys, many of the workers say they&#8217;re just in it for the money. However, my friend Judd Antin and I recently ran what&#8217;s called a &#8220;list experiment&#8221; — an awesome twist on a traditional survey — and we found that the reality is [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_933" class="wp-caption aligncenter" style="width: 501px"><a href="http://www.flickr.com/photos/mikelewis/2287255370"><img class="size-full wp-image-933 " title="Motivation" src="http://blog.crowdflower.com/wp-content/uploads/2010/07/Motivation_img.jpg" alt="" width="491" height="392" /></a><p class="wp-caption-text">Motivation in the workplace. Created by user: pescatello on flickr and licensed cc-by 2.0</p></div>
<div class="mceTemp" style="text-align: left;">What motivates crowdsourcing workers to do what they do? According to some surveys, many of the workers <em>say</em> they&#8217;re just in it for the money. However, my friend <a href="http://www.technotaste.com/" target="new">Judd Antin</a> and I recently ran what&#8217;s called a &#8220;list experiment&#8221; — an awesome twist on a traditional survey — and we found that the reality is much more complex.</div>
<p><span id="more-931"></span></p>
<p>A few weeks ago, I was talking about the motivations of crowdsourcing workers with Judd, who has already done <a href="http://technotaste.com/research" target="new">a ton of great work</a> looking at motivations for participation across a wide range of online environments. He is a recent Ph.D. from the <a href="http://ischool.berkeley.edu">UC Berkeley School of Information</a> and just joined <a href="http://research.yahoo.com/Judd_Antin" target="new">Yahoo! Research</a> as a social psychologist and research scientist in the Internet Experiences Group, so it was no surprise that he had a great idea about how to design an experiment to better understand crowdsourcing.</p>
<p>The most straightforward way to ask crowdsourcing workers why they do what they do is with a survey (e.g., <a href="http://pages.stern.nyu.edu/~panos/" target="new">Panos Ipeirotis&#8217;</a> fascinating <a href="http://behind-the-enemy-lines.blogspot.com/2010/03/new-demographics-of-mechanical-turk.html" target="new">recent informal survey</a> of MTurk workers). However, you also might recall from <a href="http://blog.crowdflower.com/2009/12/ask-a-stupid-question/" target="new">one</a> or <a href="http://blog.crowdflower.com/2010/03/ask-a-stupid-question-part-ii-forced-choice-vs-checkboxes/" target="new">two</a> of my previous posts that I tend not to take survey results at face value.</p>
<p>Judd&#8217;s &#8220;list experiment&#8221; presents the subjects of a study with a list of several motivations and asks them to provide a count of the number of items in the list they agree with (rather than posing yes/no questions or checkboxes).</p>
<p>Here&#8217;s what that looked like once Judd had it set up in CrowdFlower:</p>
<div id="attachment_935" class="wp-caption aligncenter" style="width: 884px"><a rel="attachment wp-att-935" href="http://blog.crowdflower.com/2010/08/for-love-or-for-money-a-list-experiment-on-the-motivations-behind-crowdsourcing-work/list_exper_screenshot/"><img class="size-full wp-image-935 " title="List_exper_screenshot" src="http://blog.crowdflower.com/wp-content/uploads/2010/07/List_exper_screenshot.png" alt="" width="874" height="262" /></a><p class="wp-caption-text">A screenshot from one version of our list experiment</p></div>
<p>We presented experimental treatment groups with four other permutations of the same list — each one missing one of the items — and aggregated the results across every group. This allowed us to estimate the proportion of respondents choosing each item in the list.</p>
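<p>One common way to turn these counts into per-item estimates is a difference in means: the average count reported by the group that saw the full list, minus the average count reported by the group whose list omitted a given item. The sketch below assumes that estimator (the exact aggregation is not spelled out here), and the item names and counts are made up for illustration; they are not the study&#8217;s data.</p>

```python
from statistics import mean


def list_experiment_estimates(full_counts, reduced_counts):
    """Estimate the share of respondents agreeing with each item.

    full_counts: counts reported by the group shown the full list.
    reduced_counts: maps an item name to the counts reported by the group
    whose list omitted that item.
    """
    full_mean = mean(full_counts)
    return {
        item: full_mean - mean(counts)
        for item, counts in reduced_counts.items()
    }


# Made-up counts for illustration only.
full = [3, 2, 4, 3, 3, 2, 4, 3]            # mean 3.0
reduced = {
    "money": [2, 2, 3, 2, 3, 2, 3, 3],     # mean 2.5
    "fun": [3, 2, 3, 3, 3, 2, 3, 3],       # mean 2.75
}
estimates = list_experiment_estimates(full, reduced)
print(estimates)
```

<p>With these toy numbers, dropping the &#8220;money&#8221; item lowers the average count by 0.5, so the estimated share agreeing with it is 50%. No individual respondent ever states which items they chose.</p>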
<p>The advantage of the list experiment over the traditional survey format is that it doesn&#8217;t require anybody to explicitly say, &#8220;I crowdsource because it gives me a sense of purpose.” Indeed, it perfectly preserves the anonymity of individual user preferences, since the results that we generate are estimates based on summaries of behavior across the different treatment groups. The questions are less obtrusive and there&#8217;s no pressure to hide your true sentiments or conform to the expectations of others. List experiments are thus amazing tools to examine preferences that may be controversial or otherwise influenced by social pressures in some way.</p>
<p>Judd and I designed a pilot experiment with the list above and administered it to MTurk workers through CrowdFlower. For the sake of comparison, we also included a control condition that asked Turkers the same questions in traditional, agreement-style survey form. To simplify things, we limited the responses to US workers only.</p>
<p>Comparing the results from the survey condition and the list experiment revealed some mind-blowing differences:</p>
<p><a rel="attachment wp-att-936" href="http://blog.crowdflower.com/2010/08/for-love-or-for-money-a-list-experiment-on-the-motivations-behind-crowdsourcing-work/list_exper-comparisonresults/"><img class="aligncenter size-full wp-image-936" title="List_exper-ComparisonResults" src="http://blog.crowdflower.com/wp-content/uploads/2010/07/List_exper-ComparisonResults.png" alt="" width="672" height="672" /></a></p>
<p>Note the discrepancy between some of the paired bars. Whereas 97% of the Turkers in the control group agreed with the statement &#8220;I am motivated to do HITs on Mechanical Turk to make extra money,&#8221; just 60% of the Turkers in the list experiment condition expressed the same preference.</p>
<p>Similarly, check out the difference between the agreement-style questions and list experiment results in the &#8220;for fun&#8221; category. Again, agreement statements elicit over-reporting when compared with the list experiment (although this time to a less extreme degree).</p>
<p>Our preliminary conclusions from this pilot study? The ideas of crowdsourcing for money and crowdsourcing for fun sound better than they actually are.</p>
<p>Another, slightly more science-y way to put this is that the workers in our study over-report the extent to which they are motivated by money and fun in response to agreement statements versus a list experiment, suggesting that they perceive these two factors to be socially desirable.</p>
<p>Understanding the cause of this <a href="http://en.wikipedia.org/wiki/Social_desirability_bias" target="new">social desirability bias</a> as well as its implications for crowdsourcing across different environments will require further research. In other contexts, social desirability bias (a.k.a. <a href="http://www.fivethirtyeight.com/2010/07/broadus-effect-social-desirability-bias.html" target="new">&#8220;the Broadus effect&#8221;</a>, if you read the amazing Nate Silver) has played a role in everything from elections to educational attainment. There&#8217;s no reason to believe it doesn&#8217;t affect the way people work and participate in various online environments as well.</p>
<p>Perhaps most interesting of all, our findings here further complicate the growing debate over how paid crowdsourcing ought to be <a href="http://cyber.law.harvard.edu/events/2010/02/zittrain" target="new">understood</a> and <a href="http://blog.crowdflower.com/2010/06/regulating-distributed-work-part-three-why-its-a-good-idea/" target="new">(potentially) regulated</a>. If a substantial proportion of workers aren&#8217;t actually on MTurk for the money, does that support the claim that we should regulate crowdsourcing along the same lines that we regulate other post-industrial sectors?</p>
<p>These are big questions that we should continue to probe through future studies and discussion. In the meantime, Judd and I re-ran our list experiment with a few minor adjustments and a much bigger sample. We&#8217;re in the process of writing up this larger version of the study for a conference submission and will post the full paper here as soon as we can.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2010/08/for-love-or-for-money-a-list-experiment-on-the-motivations-behind-crowdsourcing-work/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Breaking Monotony with Meaning: Motivation in Crowdsourcing Markets</title>
		<link>http://blog.crowdflower.com/2010/05/breaking-monotony-with-meaning-motivation-in-crowdsourcing-markets/</link>
		<comments>http://blog.crowdflower.com/2010/05/breaking-monotony-with-meaning-motivation-in-crowdsourcing-markets/#comments</comments>
		<pubDate>Sun, 23 May 2010 22:06:54 +0000</pubDate>
		<dc:creator>Lukas Biewald</dc:creator>
				<category><![CDATA[Economics]]></category>
		<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Motivation]]></category>
		<category><![CDATA[motivation]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=531</guid>
		<description><![CDATA[This is a guest post written by my friend Dana Chandler on how the context of a task motivates the person working on it.  He has a longer academic paper on the topic you can find at the bottom of this post.  It once again shows how traditional economic incentives can&#8217;t fully explain workers&#8217; behaviors [...]]]></description>
			<content:encoded><![CDATA[<p>This is a guest post written by my friend Dana Chandler on how the context of a task motivates the person working on it.  He has a longer academic paper on the topic you can find at the bottom of this post.  It once again shows how traditional economic incentives can&#8217;t fully explain workers&#8217; behaviors on Mechanical Turk.</p>
<p><img src="http://assets.doloreslabs.com/blog/dana_chandler.jpg" alt="" /></p>
<p>Imagine for a moment that you were a turker from either the US or India, looking at the above image. You are given the task of clicking on the blue circular objects with red borders. What you see is only a fraction of the full image. Each image has 90 blue objects to identify. If you’re as good as the average worker, you’ll complete your first image in a little over five minutes and you’ll earn 10 cents (for an hourly wage of about $1.20).</p>
<p><span id="more-531"></span></p>
<p>After your first image, you can either quit and take your 10 cents, or identify points on another image. Over the next four hours, you’ll have the chance to label as many images as you want. But there’s a catch—you’ll only be paid 9 cents for the second image, 8 cents for the third, and so on, all the way down to 2 cents. This will lower the hourly wage even more.</p>
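<p>The arithmetic behind this pay scheme can be sketched as follows. The function names are illustrative, the five-minute labeling time is the rounded figure from above, and the schedule only covers the first nine images because what happens after the 2-cent image is not stated.</p>

```python
def pay_schedule_cents(first=10, floor=2):
    """Per-image pay in cents: 10, 9, 8, ... down to the 2-cent floor."""
    return list(range(first, floor - 1, -1))


def hourly_wage_dollars(cents_per_image, minutes_per_image):
    """Convert a per-image payment and labeling time into dollars per hour."""
    return cents_per_image * 60 / minutes_per_image / 100


schedule = pay_schedule_cents()
print(schedule)                      # [10, 9, 8, 7, 6, 5, 4, 3, 2]
print(sum(schedule))                 # 54 cents for the first nine images
print(hourly_wage_dollars(10, 5))    # 1.2
print(hourly_wage_dollars(2, 5))     # 0.24
```

<p>At a constant five minutes per image, the implied hourly wage falls from $1.20 on the first image to $0.24 on the ninth.</p>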
<p>Before you even qualify for the task, you&#8217;ll have to spend five minutes watching a training video and passing a quiz. During the video, half of you will be given only basic work instructions on how to identify “objects of interest.” The other half will be given both instructions and cues of meaning: recognition for your contribution and an explanation of your task&#8217;s purpose<sup>1</sup>. The reason given here? To help researchers identify cancerous tumor cells.</p>
<p>We posted these HITs on MTurk in January, 2010. Almost 300 people from the U.S. and India accepted the task, becoming unknowing participants in our experiment examining MTurk worker motivation. It is commonly believed (and other researchers have verified with demographic surveys) that Indian workers are more motivated by pecuniary concerns and that US turkers are primarily doing tasks for leisure or other non-pecuniary motives. Is this true?</p>
<p>In both countries, half of the turkers in the experiment were randomly assigned to label nondescript &#8220;objects of interest&#8221; without being given any context or greater purpose &#8212; they were our zero-context group. The other half, our meaningful group, were told they were helping researchers identify cancerous tumor cells. Which group of turkers do you think worked harder? You might be surprised.</p>
<p>Our experiment therefore compared two groups, one with and one without a clear wage motivation, to see whether they responded differently to meaningfulness in their tasks.</p>
<p><strong>Results</strong></p>
<p>We measured three metrics: &#8220;showing up&#8221;, the quantity of work, and the quality of that work. The first two metrics are straightforward. Showing up meant that you sat through our training video, passed our qualification test and helped label at least one image. Quantity of work was simply the number of images labeled.</p>
<p>We repeatedly told both groups of turkers that they needed to click on all points and as closely as possible to each point. Work quality was determined by the fraction of cells that a person clicked on (the recall) and the average distance between the “true center” of each cell and where the user clicked (the centrality).</p>
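<p>A minimal sketch of how these two quality measures could be computed for one image is shown below. It is not the scoring code used in the study: the matching rule (each true center is paired with its nearest click) and the 10-pixel cutoff <code>max_dist</code> are assumptions made for illustration.</p>

```python
import math


def score_clicks(true_centers, clicks, max_dist=10.0):
    """Return (recall, centrality) for one labeled image.

    Each true center is matched to its nearest click and counts as found
    when that click lies within max_dist pixels. max_dist is an assumed
    threshold, not a value from the study. true_centers must be non-empty.
    """
    distances = []
    for cx, cy in true_centers:
        nearest = min(
            (math.hypot(cx - x, cy - y) for x, y in clicks),
            default=math.inf,
        )
        if max_dist >= nearest:
            distances.append(nearest)
    recall = len(distances) / len(true_centers)
    # Centrality is undefined when no cell was found.
    centrality = sum(distances) / len(distances) if distances else None
    return recall, centrality


centers = [(0, 0), (50, 50), (100, 0), (200, 200)]
clicks = [(3, 4), (50, 50), (100, 6)]
print(score_clicks(centers, clicks))
```

<p>In this toy example three of the four cells are found (recall 0.75), at distances of 5, 0, and 6 pixels from their true centers. A single click can match more than one center under this simple rule, which a stricter one-to-one matching would avoid.</p>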
<p>Our most interesting finding was the extent to which a meaningful task (and giving recognition) motivated US workers, but not Indian workers, to complete a task. As any requester knows, attrition on MTurk is a real problem. We found that adding cues of meaning could motivate turkers to undergo training and label at least one image. In the US, 92% of the cues-of-meaning group showed up (that is, sat through our training video, took our quiz, and labeled an image), compared with only 83% of the zero-context group (see the figure, which also shows standard errors). In India, there was no difference between the groups and both groups had a 66% completion rate (attrition being higher due to possible language barriers, slow connection speeds, hardware issues, etc.).</p>
<p>However, once a person did some work, both treatment and control groups did a similar quantity of work: The cues-of-meaning group labeled 6.0 images and the zero-context group labeled 5.7 images. This difference was not statistically significant, so it suggests that once you get turkers to work on a task, they are motivated to label just as many images irrespective of the task’s meaningfulness. Notably, of the people who worked, Indians worked longer and labeled an average of 7.3 images vs. 5.2 in the US.</p>
<p>Surprisingly, all workers did an equally good job identifying points whether they had zero-context or whether they thought they were identifying tumor cells. Quality, whether measured by the fraction of points identified (the recall) or the average pixel distance (the centrality), did not differ significantly between the two groups.</p>
<p>This finding has important implications for those who employ labor in crowdsourcing markets. Companies and intermediaries should develop an understanding of what motivates the people who work on tasks. Employers must think beyond monetary incentives and consider how they can reward workers through non-monetary incentives such as by changing how workers perceive their task. Alienated workers are less likely to do work if they don&#8217;t know the context of the work they are doing and employers may find they can get more work done for the same wages simply by telling turkers why they are working.</p>
<p><img src="http://assets.doloreslabs.com/blog/dana_chandler2.jpg" alt="" /></p>
<p>For more details of this study, please see our full academic paper at: <a href="http://danachandler.com/research" target="_blank"><span style="text-decoration: underline;">http://danachandler.com/index.php/research</span></a>. We welcome any comments and feedback.</p>
<p><span style="text-decoration: underline;">About the authors:<br />
</span>Dana Chandler is a researcher at the University of Chicago’s Becker Center where he works with Steven Levitt, author of Freakonomics. He previously worked as a management consultant at the Boston Consulting Group and at Aureos Capital, a Colombian private equity company. He will begin his Ph.D. at MIT in the Fall.  Dana’s research interests include digital labor markets, development economics, and randomized experiments in companies. email: dchandler {at} uchicago {dot} edu</p>
<p>Adam Kapelner is currently earning his Ph.D. in Statistics at Wharton. Adam is the founder of  <a href="http://dictionarysquared.com" target="_blank">dictionarysquared.com</a> and the inventor of its vocabulary-learning  technology. While working as an undergraduate researcher at Stanford University, he helped engineer the open-source software, <a href="http://gemident.com" target="_blank">www.gemIdent.com</a>, that enables researchers worldwide to locate cells in microscopic images. GemIdent was recently extended to make use of MTurk for outsourcing of medical image identification. The extension, called <a href="http://distributeeyes.com" target="_blank">www.distributeeyes.com</a>, was  adapted to serve as the platform for this experiment. email: kapelner  {at} wharton {dot} upenn {dot} edu</p>
<p><span style="text-decoration: underline;">Acknowledgments:</span> We thank Professor Susan Holmes of Stanford University for allowing us to adopt DistributeEyes (funded under NIH grant #R01GM086884-02) for use in this study. We would also like to thank Panos Ipeirotis for kindly providing us with demographic and market data that we cite in our study. Lawrence Brown, Patrick DeJarnette, John Horton, Emir Kamenica, Steven Levitt, Susanne Neckermann, Jesse Shapiro, Jorg Spenkuch, Jan Stoop, Chad Syverson, Mike Thomas, Abraham Wyner, and seminar participants at the University of Chicago provided especially helpful comments.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2010/05/breaking-monotony-with-meaning-motivation-in-crowdsourcing-markets/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Amazon Mechanical Turk Survey</title>
		<link>http://blog.crowdflower.com/2010/05/amazon-mechanical-turk-survey/</link>
		<comments>http://blog.crowdflower.com/2010/05/amazon-mechanical-turk-survey/#comments</comments>
		<pubDate>Wed, 12 May 2010 23:32:07 +0000</pubDate>
		<dc:creator>John Le</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Experiments]]></category>
		<category><![CDATA[mechanical turk]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=267</guid>
		<description><![CDATA[The distributed distributed work meetup was this past Monday, and it would be an injustice to not have a blog post on the workers who make distributed work possible. Over the weekend we decided we would rerun and reexamine Panos Ipeirotis&#8217;s survey of turkers. Panos, by the way, has a great blog on crowdsourcing, Amazon [...]]]></description>
			<content:encoded><![CDATA[<p>The distributed distributed work meetup was this past Monday, and it would be an injustice to not have a blog post on the workers who make distributed work possible.  Over the weekend we decided we would rerun and reexamine <a href="http://behind-the-enemy-lines.blogspot.com/2010/03/new-demographics-of-mechanical-turk.html">Panos Ipeirotis&#8217;s survey of turkers</a>.  Panos, by the way, has a great <a href="http://behind-the-enemy-lines.blogspot.com">blog on crowdsourcing, Amazon Mechanical Turk, and other interesting topics</a>.  I highly recommend reading it for many Turk related experiments and studies.</p>
<p>We used mostly the same questions as Panos&#8217;s survey, which asked for:</p>
<ul>
<li>the Turker&#8217;s age (year of birth)</li>
<li>gender</li>
<li>educational level</li>
<li>income level</li>
<li>marital status</li>
<li>questions about their engagement on Turk</li>
<li>how often they Turk</li>
<li>income from Turk</li>
<li>why they Turk</li>
</ul>
<p><span id="more-267"></span></p>
<p>In contrast to Panos&#8217;s original survey, which was run over a 3-week period, we ran this survey over the weekend in a 24-hour period.  Because of this abbreviated weekend run, there will most likely be greater confounding factors as well as stronger selection bias, i.e. toward groups who work on weekends as opposed to during the week.  To help mitigate the timing of the experiment and spread surveys close to uniformly over the 24 hours, we set up a script to release only 50 surveys an hour.  Responses generally came at a steady pace and Turkers available at each hour were represented.</p>
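<p>The throttling idea can be sketched as a simple release plan. This is an illustration rather than the script we ran: the function name is made up, and the total of 1200 surveys is only an assumed upper bound (50 an hour for 24 hours).</p>

```python
def release_plan(total, per_hour=50):
    """Split `total` surveys into hourly batches of at most `per_hour`.

    Returns a list of (hour_index, batch_size) pairs, starting at hour 0.
    """
    plan = []
    remaining = total
    hour = 0
    while remaining > 0:
        batch = min(per_hour, remaining)
        plan.append((hour, batch))
        remaining -= batch
        hour += 1
    return plan


plan = release_plan(1200)   # assumed total: 50 surveys x 24 hours
print(len(plan))            # 24 hourly batches
print(plan[0], plan[-1])    # (0, 50) (23, 50)
```

<p>A scheduler such as cron would then post one batch at the top of each hour, so that no single time zone can exhaust the whole job.</p>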
<p>Since we&#8217;ve examined <a href="http://blog.crowdflower.com/2010/02/why-people-participate-on-mechanical-turk-now-as-a-mosaic-plot/">Turker motivation</a> before, albeit hardly rigorously, I will mostly focus on the rise of India as well as other location-specific considerations in this post.</p>
<p>In comparison to Panos&#8217;s survey, the greatest difference in the results was in the distribution of respondents&#8217; locations. We found that India made up 46.85% of our respondents, while the US made up 42.7%.  In contrast, Panos&#8217;s survey (run over a 3-week period in February) had <a href="http://behind-the-enemy-lines.blogspot.com/2010/03/new-demographics-of-mechanical-turk.html">46.8% of respondents from the US, and 34.0% were from India</a>.  To determine whether this difference was due to self-reporting error, we checked self-reported location against geocoded IP location and found that they matched almost exactly.  There were 31 differences out of 1016 survey responses, but these differences could potentially be attributed to ambiguity in the question (&#8220;Where are you from?&#8221;).  Ultimately these differences between self-reported location and our geocoded IP location were not statistically significant.  This is an encouraging result, suggesting Turkers are overwhelmingly honest when answering survey questions about where they are from.</p>
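<p>As a rough check on the figures above, the match rate between self-reported and geocoded locations can be computed directly. The helper function and the toy records are hypothetical; only the 31 and 1016 come from our results.</p>

```python
def agreement_rate(pairs):
    """Share of (self_reported, geocoded) country pairs that agree."""
    pairs = list(pairs)
    matches = sum(1 for reported, geocoded in pairs if reported == geocoded)
    return matches / len(pairs)


# Figures from the survey: 31 mismatches among 1016 responses.
total, mismatches = 1016, 31
print(f"{(total - mismatches) / total:.1%} of locations matched")

# Toy records (hypothetical) showing the per-response comparison.
toy = [("IN", "IN"), ("US", "US"), ("US", "CA"), ("IN", "IN")]
print(agreement_rate(toy))
```

<p>With 985 of 1016 responses agreeing, about 96.9% of self-reported locations matched the geocoded IP location.</p>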
<p><a rel="attachment wp-att-282" href="http://blog.crowdflower.com/2010/05/amazon-mechanical-turk-survey/survey_responses/"><img class="alignnone size-medium wp-image-282" title="survey_responses" src="http://blog.crowdflower.com/wp-content/uploads/2010/05/survey_responses.png" alt="" /></a></p>
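<p>To put the 31 mismatches out of 1016 responses in perspective, the sketch below computes the mismatch rate and a 95% Wilson score interval around it. The Wilson interval is my own choice for illustration and is not necessarily the significance test used for this post; the function names are likewise just for the example.</p>

```python
import math


def mismatch_rate(mismatches, total):
    """Fraction of responses where self-report and geocoded IP disagree."""
    return mismatches / total


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 is about 95%)."""
    p = successes / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


# Counts taken from the post: 31 mismatches in 1016 survey responses.
rate = mismatch_rate(31, 1016)
low, high = wilson_interval(31, 1016)
print(f"mismatch rate: {rate:.2%}")
print(f"95% Wilson interval: {low:.2%} to {high:.2%}")
```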
<p>Viewing the responses over time suggests that Turkers work during non-sleeping hours.  The graph below shows this pattern.  If we were to repeat this experiment, we&#8217;d want to run the task over a week or two instead of 24 hours.  Because of the abbreviated time period, the pattern is less evident towards the end of the job, where work rates declined across the board.</p>
<p><a rel="attachment wp-att-297" href="http://blog.crowdflower.com/2010/05/amazon-mechanical-turk-survey/survey_responses_by_time/"><img class="alignnone size-medium wp-image-297" title="survey_responses_by_time" src="http://blog.crowdflower.com/wp-content/uploads/2010/05/survey_responses_by_time.png" alt="" /></a></p>
<p>To better test the hypothesis that Turkers generally work during the non-sleeping hours of their respective countries, the graph below shows the distribution of work done for CrowdFlower on Mechanical Turk by hour of the day and by continent (specifically Asia, Europe, and the &#8220;Americas&#8221;) over the course of 6 months.  In those 6 months we collected approximately 9 million judgments.  Each point gives the share of a continent&#8217;s judgments that arrived in a given hour: if Asia has a point at 6% for 4 AM GMT, then 6% of all judgments made in Asia came in around 4 AM GMT.</p>
<p><a rel="attachment wp-att-312" href="http://blog.crowdflower.com/2010/05/amazon-mechanical-turk-survey/percent_judgments_per_country_each_hour/"><img class="alignnone size-full wp-image-312" title="percent_judgments_per_country_each_hour_1" src="http://blog.crowdflower.com/wp-content/uploads/2010/05/percent_judgments_per_country_each_hour_1.png" alt="" /></a></p>
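<p>The per-continent percentages in the graph above are a within-group normalization: each hour&#8217;s count is divided by that continent&#8217;s total. A minimal sketch of the calculation follows, assuming each judgment is available as a (continent, hour in GMT) pair; the record layout and the toy data are made up for illustration.</p>

```python
from collections import Counter, defaultdict


def hourly_share(judgments):
    """Map each continent to {hour: percent of that continent's judgments}.

    judgments is an iterable of (continent, hour_gmt) pairs, one per judgment.
    """
    counts = defaultdict(Counter)
    for continent, hour in judgments:
        counts[continent][hour] += 1
    shares = {}
    for continent, by_hour in counts.items():
        total = sum(by_hour.values())
        # Hours with no judgments get 0.0 so every continent has 24 points.
        shares[continent] = {h: 100.0 * by_hour[h] / total for h in range(24)}
    return shares


# Toy data, not the real judgment log.
sample = [("Asia", 4)] * 6 + [("Asia", 12)] * 94 + [("Europe", 9)] * 10
shares = hourly_share(sample)
print("Asia at 4 AM GMT:", shares["Asia"][4], "percent")
```

<p>Normalizing within each continent is what makes the curves comparable even though the continents contribute very different volumes of work.</p>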
<p>For each continent the peaks roughly correspond to daytime and the valleys to nighttime, which is what we&#8217;d expect.  Asia&#8217;s peak looks more like a plateau, likely due in part to the number of time zones Asia encompasses.  In my post about <a href="http://blog.crowdflower.com/2010/04/task-localization/">task localization</a>, we saw (as is intuitively obvious) that workers&#8217; locales are an important factor in assessing quality, especially for language-specific tasks.  On Mechanical Turk, to reach the right workforce for a language-specific task, it is advisable to make HITs available only at certain times, which limits the number of responses from countries where the task&#8217;s language is not natively spoken.  We have to use this roundabout method on Mechanical Turk because you cannot restrict the workforce to a set of multiple countries; you can <a href="http://docs.amazonwebservices.com/AWSMturkAPI/2008-08-02/index.html?ApiReference_QualificationRequirementDataStructureArticle.html">either restrict work to one country or restrict work to all but one country</a>.</p>
<p>We&#8217;ve already noted that India represents a sizable and rapidly growing portion of the Turk workforce.  We are particularly interested in this rate of growth and in future trends in worker locales.  The next graph compares the US to India in terms of the monthly volume of judgments completed on our jobs posted to Mechanical Turk in 2010.  The location information comes from geocoded IP addresses.</p>
<p><a rel="attachment wp-att-323" href="http://blog.crowdflower.com/2010/05/amazon-mechanical-turk-survey/proportion_of_work_from_amt/"><img class="alignnone size-full wp-image-323" title="proportion_of_work_from_amt" src="http://blog.crowdflower.com/wp-content/uploads/2010/05/proportion_of_work_from_amt.png" alt="" /></a></p>
<p>The graph above shows that, on Mechanical Turk, the proportion of our workers located in India has increased since December.  This sample was collected over a relatively short period of time, so the trend is definitely something we&#8217;ll want to keep monitoring.  Lastly, I want to emphasize that although this experiment is hardly rigorous and there are many more factors to analyze, Mechanical Turk and other vendors of work (Gambit, Samasource, LiveOps, etc.) are continuing to evolve and grow extremely rapidly, and with them the potential and possibilities for distributed work.  Next we&#8217;ll examine a survey of Gambit, then Samasource.</p>
<p>John</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2010/05/amazon-mechanical-turk-survey/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Ask A Stupid Question Part 2: Forced Choice vs. Checkboxes</title>
		<link>http://blog.crowdflower.com/2010/03/ask-a-stupid-question-part-ii-forced-choice-vs-checkboxes/</link>
		<comments>http://blog.crowdflower.com/2010/03/ask-a-stupid-question-part-ii-forced-choice-vs-checkboxes/#comments</comments>
		<pubDate>Mon, 01 Mar 2010 19:05:17 +0000</pubDate>
		<dc:creator>Aaron Shaw</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[data collection]]></category>
		<category><![CDATA[Experiments]]></category>
		<category><![CDATA[methodology]]></category>
		<category><![CDATA[question formats]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/2010/03/ask-a-stupid-question-part-ii-forced-choice-vs-checkboxes/</guid>
		<description><![CDATA[What kinds of questions produce the best results in crowdsourcing tasks and surveys? To answer that question, I bring you another geeked-out blog post in which I pit the multiple choice (or forced choice) question against its bitter arch-rival, the check-all-that-apply (or checkbox) question. Both kinds of formatting can be useful when you want people to [...]]]></description>
			<content:encoded><![CDATA[<p>What kinds of questions produce the best results in crowdsourcing tasks and surveys? To answer that question, I bring you another geeked-out blog post in which I pit the multiple choice (or forced choice) question against its bitter arch-rival, the check-all-that-apply (or checkbox) question.</p>
<p>Both kinds of formatting can be useful when you want people to identify or categorize something(s) in a list. Check-all-that-apply seems to offer the added bonus of easily fitting an entire list into a single question, thereby requiring less mental effort from respondents and (presumably) reducing response times. How do the two kinds of questions compare on answer quality, though?</p>
<p>In my <a href="http://blog.crowdflower.com/2009/12/ask-a-stupid-question">first post</a> a few weeks ago, I talked about some of the reasons why response scales matter when you&#8217;re designing multiple choice questions for a survey or data collection task. In the comments, <a href="http://blog.crowdflower.com/2009/12/ask-a-stupid-question/#comment-1853">michael</a> raised an interesting point:</p>
<blockquote><p><em>Why use a scale at all? I would make those types of questions always open ended. Anyone who takes the survey has to think about how many hours they spend online anyway. That’s the first step. The second is fitting their estimate in one of the categories. Seems like unnecessary work for the participants.</em></p></blockquote>
<p>You can check out the <a href="http://blog.crowdflower.com/2009/12/ask-a-stupid-question/#comments">rest of the thread</a> to see michael&#8217;s idea in context as well as how other people (including me) replied.</p>
<p>The discussion got me thinking more about multiple choice questions and some of the costs and benefits that they entail in comparison to other types of questions. As luck would have it, a few of the other questions that I included in my original experiment can provide additional grist for the mill.</p>
<p><span id="more-189"></span></p>
<p><strong>Question Format Smackdown!</strong></p>
<p>In order to test how each format affects responses, I asked workers in the Crowdlabor pools one (and only one) version of the following:</p>
<p><a title="checkboxes" href="http://blog.crowdflower.com/wp-content/uploads/2010/01/checkbox_screenshot.png"><img src="http://blog.crowdflower.com/wp-content/uploads/2010/01/checkbox_screenshot.png" alt="checkboxes" /></a></p>
<p><a title="forced choice" href="http://blog.crowdflower.com/wp-content/uploads/2010/01/forced_choice-screenshot.png"><img class="centered  alignnone" src="http://blog.crowdflower.com/wp-content/uploads/2010/01/forced_choice-screenshot.png" alt="forced choice" /></a></p>
<p>As you can see, the forced choice version is a little clunky because I had to split each item into its own question. Nevertheless, there&#8217;s no substantive difference between the two versions other than the answer choice format, which makes it possible to compare the results.</p>
<p>I should explain why I included several <em>extremely popular</em> websites (Google) among the answer options as well as some slightly less well-traveled, but still popular sites (Times of India, New York Times). Basically, this was to avoid having too many people who had visited all of the sites or none of them. If a lot of responses fell into either extreme, it would have been impossible to estimate the extent to which the two formats affected the outcomes.</p>
<p>As with my response scale example, the groups that saw the two versions of the question did not vary widely on potentially confounding demographic covariates such as gender or country of residence.</p>
<p>Here&#8217;s a table showing the number of positive responses per format per site:</p>
<p><a title="sites-visited-table1" href="http://blog.crowdflower.com/wp-content/uploads/2010/01/sites-visited-table1.png"><img class="centered alignnone" src="http://blog.crowdflower.com/wp-content/uploads/2010/01/sites-visited-table1.png" alt="sites-visited-table1" /></a></p>
<p>And a plot to visualize the variations per site as a percentage of total responses per question format:</p>
<p><a href="http://blog.crowdflower.com/wp-content/uploads/2010/03/sites-visited_points2.png"><img class="alignnone size-full wp-image-261" title="visited points" src="http://blog.crowdflower.com/wp-content/uploads/2010/03/sites-visited_points2.png" alt="" width="693" height="462" /></a></p>
<p>For every site in the list except one (Orkut), more respondents reported a visit under the forced choice format than under the checkbox format.</p>
<p><strong>Estimating the Effect</strong></p>
<p>In order to get a precise measurement of the effect of forced choice format vs. checkbox format, I reshaped the data into cumulative counts and compared the distributions of the total number of sites visited among people who saw the checkbox and forced choice versions respectively. Here&#8217;s the resulting table:</p>
<p><img class="centered aligncenter" src="http://blog.crowdflower.com/wp-content/uploads/2010/01/sites-visited-table2.png" alt="sites-visited-table2" /></p>
<p>A pair of density plots represents the same information in graphical form:</p>
<p><a href="http://blog.crowdflower.com/wp-content/uploads/2010/03/sites-visited_density2.png"><img class="alignnone size-full wp-image-262" title="visited density" src="http://blog.crowdflower.com/wp-content/uploads/2010/03/sites-visited_density2.png" alt="" width="693" height="1040" /></a></p>
<p>On each plot, I&#8217;ve highlighted the minimum, maximum, and mean (sparklines-style). The heavy left-leaning skew of the checkbox curve contrasts nicely with the slightly right-leaning shape of the forced choice curve.</p>
<p>From both the table and the density plots, it&#8217;s easy to see that the two question formats appear to have caused a substantial difference. The difference in means between the two distributions suggests that a respondent who saw the forced (or multiple) choice format identified, on average, one <em>additional</em> visited site that their peers who saw checkboxes did not identify.</p>
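<p>The reshaping step and the comparison of means can be sketched as follows. This is a Python illustration rather than the R code used for the post, and the record layout, function names, and toy responses are all assumptions made for the example, not the experiment&#8217;s data.</p>

```python
from statistics import mean


def sites_per_respondent(responses):
    """Collapse per-site answers into one total per respondent, by format.

    responses is a list of (question_format, {site: visited_bool}) pairs.
    """
    totals = {}
    for question_format, answers in responses:
        # True counts as 1, so the sum is the number of sites marked visited.
        totals.setdefault(question_format, []).append(sum(answers.values()))
    return totals


def difference_in_means(totals, a="forced_choice", b="checkbox"):
    """Mean sites visited under format a minus the mean under format b."""
    return mean(totals[a]) - mean(totals[b])


# Made-up responses, two per format.
responses = [
    ("checkbox", {"Google": True, "Orkut": False, "New York Times": False}),
    ("checkbox", {"Google": True, "Orkut": True, "New York Times": False}),
    ("forced_choice", {"Google": True, "Orkut": False, "New York Times": True}),
    ("forced_choice", {"Google": True, "Orkut": True, "New York Times": True}),
]
totals = sites_per_respondent(responses)
print("totals per respondent:", totals)
print("difference in means:", difference_in_means(totals))
```

<p>Because each worker saw only one version of the question, the two lists of totals come from independent groups and can be compared directly.</p>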
<p><strong>Should I hate Checkboxes?</strong></p>
<p>The demographic profile of the two groups was pretty similar, so the disparity in the results was almost certainly due to the question format. But why does the question format have such a powerful effect?</p>
<p>When they were not forced to provide a response to each answer choice, checkbox respondents either failed to notice some choices or ignored them. As with numerical response scales, this is yet another example of how mental shortcuts can compromise data quality.</p>
<p>This time around, the solution is pretty simple. All things being equal, you&#8217;re better off using forced choice formatting when you care about precise results. That said, things are never really equal, and there will always be some reason you might want to make life faster or simpler for the people answering the questions. For example, if you&#8217;re asking people to choose tags or labels for something, the precision of each response might not matter very much and checkboxes would work just fine.</p>
<p><em>I used R for all the analysis and plots. I created the first plot using Hadley Wickham&#8217;s <a href="http://had.co.nz/ggplot2">ggplot2</a> package. Contact me with requests for data or code at aaron [at] doloreslabs [dot] com and leave your questions, complaints, or suggestions below.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2010/03/ask-a-stupid-question-part-ii-forced-choice-vs-checkboxes/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

