<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The CrowdFlower Blog &#187; Brendan O&#8217;Connor</title>
	<atom:link href="http://blog.crowdflower.com/author/brendano/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.crowdflower.com</link>
	<description></description>
	<lastBuildDate>Tue, 10 Jan 2012 20:00:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>AMT is fast, cheap, and good for machine learning data</title>
		<link>http://blog.crowdflower.com/2008/09/amt-fast-cheap-good-machine-learning/</link>
		<comments>http://blog.crowdflower.com/2008/09/amt-fast-cheap-good-machine-learning/#comments</comments>
		<pubDate>Tue, 09 Sep 2008 00:10:14 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Wisdom of Small Crowds]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/2008/09/amt-fast-cheap-good-machine-learning/</guid>
		<description><![CDATA[Update 9/19: Final PDF version has been uploaded. See also the comments below for updates &#8212; our released data is already being used by others! We recently teamed up with Rion Snow, Prof. Dan Jurafsky, and Prof. Andrew Ng from the Stanford AI Lab to try using Amazon Mechanical Turk to generate data sets for [...]]]></description>
			<content:encoded><![CDATA[<p><b>Update 9/19:</b> <a href="http://blog.doloreslabs.com/wp-content/uploads/2008/09/amt_emnlp08_final.pdf">Final PDF version</a> has been uploaded.  See also the comments below for updates &#8212; our <a href="http://ai.stanford.edu/~rion/annotations/">released data</a> is already being used by others!</p>
<hr />
<p>We recently teamed up with <a href="http://ai.stanford.edu/~rion/">Rion Snow</a>, <a href="http://www.stanford.edu/~jurafsky/">Prof. Dan Jurafsky</a>, and <a href="http://ai.stanford.edu/~ang/">Prof. Andrew Ng</a>  from the <a href="http://sail.stanford.edu">Stanford AI Lab</a> to try using Amazon Mechanical Turk to generate data sets for Machine Learning research.  Many AI tasks require a large amount of training data, and to build natural language systems, researchers traditionally pay linguistic experts for millions of annotations.  Search engine companies employ hundreds or thousands of annotators for their classification, ranking, and other statistically trained systems, but their data is private and is not available for research.  AMT is a potential tool to create high quality data sets accessible to everyone.</p>
<p>We rigorously tested the quality of AMT responses for several classic human language problems, and found that the quality was as good as or better than that of the expert data that most researchers use.  We wrote a paper, <a href='http://blog.doloreslabs.com/wp-content/uploads/2008/09/amt_emnlp08_final.pdf' title='amt_emnlp08_accepted.pdf'>&#8220;Cheap and Fast &#8212; But is it Good?  Evaluating Non-Expert Annotations for Natural Language Tasks,&#8221;</a> that will be presented at an upcoming conference, <a href="http://conferences.inf.ed.ac.uk/emnlp08/">EMNLP-2008</a>.</p>
<p><b>Our findings:</b></p>
<p><b>1. Turker-generated data is good.</b>  AMT makes it easy to ask many people for judgments, so for several tasks, we looked at how well the averaged Turker judgments agree with the expert gold standard.  With more judgments per example, accuracy increases.  For comparison, on each graph the horizontal dotted line indicates the rate at which a single expert agrees with the gold standard.  Enough non-experts can match or often beat experts&#8217; reliability.</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/09/k-acc3.png' title='k-acc3.png'><img class='centered' src='http://blog.doloreslabs.com/wp-content/uploads/2008/09/k-acc3.png' alt='k-acc3.png' /></a></p>
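<p>To get a feel for why more judgments help, here is a minimal sketch of the textbook calculation for a yes/no task.  It assumes every voter is independently correct with the same probability <code>p</code>, which real Turkers are not, and the function name and the 0.7 figure are made up for illustration; this is not the analysis from the paper.</p>

```python
from math import comb

def majority_accuracy(p, k):
    """Probability that a majority of k independent voters, each correct
    with probability p, gives the right binary answer.  Ties (possible
    when k is even) are broken by a fair coin flip."""
    total = 0.0
    for correct in range(k + 1):
        # Binomial probability of exactly `correct` right answers.
        prob = comb(k, correct) * p**correct * (1 - p)**(k - correct)
        if 2 * correct > k:
            total += prob
        elif 2 * correct == k:
            total += 0.5 * prob
    return total

# Hypothetical voter accuracy of 0.7, with growing numbers of voters.
for k in (1, 3, 5, 9):
    print(k, round(majority_accuracy(0.7, k), 3))
```

<p>Under these assumptions the aggregate accuracy climbs as voters are added, which is the same qualitative shape as the curves above.</p>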
<p><b>2. Turker-generated data is cheap and fast.</b>  We can collect thousands of labels per dollar and per hour.  </p>
<p><span id="more-109"></span> </p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/09/costs.png' title='costs.png'><img class='centered' src='http://blog.doloreslabs.com/wp-content/uploads/2008/09/costs.png' alt='costs.png' /></a></p>
<p><b>3. Expert data enhances individual Turker data.</b>  First off, individual workers have differing accuracy rates:</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/09/worker-acc.png' title='worker-acc.png'><img class='centered' src='http://blog.doloreslabs.com/wp-content/uploads/2008/09/worker-acc.png' alt='worker-acc.png' /></a></p>
<p>So we implemented a statistical technique where we test their accuracy on a portion of the experts&#8217; gold standard data, then reweight votes by worker reliability.  This yields higher aggregated accuracy.  (Also see our related <a href="http://blog.doloreslabs.com/2008/06/aggregate-turker-judgments-threshold-calibration/">threshold calibration post</a>.)</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/09/goldcalib.png' title='goldcalib.png'><img class='centered' src='http://blog.doloreslabs.com/wp-content/uploads/2008/09/goldcalib.png' alt='goldcalib.png' /></a></p>
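<p>As a rough illustration of reweighting votes by reliability, the sketch below scores each worker on the gold items they answered, converts that accuracy to a log-odds weight, and sums signed weights per item.  The log-odds weighting, the smoothing constant, and all names and data here are our assumptions for the example; see the paper for the actual model.</p>

```python
from math import log

def worker_weights(gold, answers, smoothing=1.0):
    """Estimate each worker's accuracy on gold items and turn it into a
    log-odds vote weight.  `answers` maps worker -> {item: bool}."""
    weights = {}
    for worker, labels in answers.items():
        scored = [item for item in labels if item in gold]
        right = sum(labels[item] == gold[item] for item in scored)
        # Smoothing keeps the estimated accuracy away from exactly 0 or 1.
        acc = (right + smoothing) / (len(scored) + 2 * smoothing)
        weights[worker] = log(acc / (1 - acc))
    return weights

def weighted_vote(item, answers, weights):
    """Sum signed weights over workers; a positive total means YES."""
    score = 0.0
    for worker, labels in answers.items():
        if item in labels:
            score += weights[worker] if labels[item] else -weights[worker]
    return score > 0

# Hypothetical data: items "a" and "b" have gold labels, "c" does not.
gold = {"a": True, "b": False}
answers = {
    "w1": {"a": True, "b": False, "c": True},   # right on both gold items
    "w2": {"a": False, "b": True, "c": False},  # wrong on both gold items
    "w3": {"a": True, "b": True, "c": False},   # right on one of two
}
weights = worker_weights(gold, answers)
print(weighted_vote("c", answers, weights))
```

<p>In this toy data a plain majority would say NO for item "c", but the worker who is consistently wrong gets a negative weight, so their NO counts as evidence for YES.</p>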
<p><b>4. Turker data enhances NLP systems.</b>  For one of the tasks, predicting the emotions elicited by a newspaper headline, we wrote a simple machine-learned classifier and trained it on the Turker data.  It easily outperforms one trained on expert data.  (There&#8217;s a subtle effect here; see the paper for details.)</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/09/classifier-perf.png' title='classifier-perf.png'><img  class='centered' src='http://blog.doloreslabs.com/wp-content/uploads/2008/09/classifier-perf.png' alt='classifier-perf.png' /></a></p>
<p>We&#8217;ll update this blog post with a link to the final version of the paper in the coming weeks.  Many thanks to our friend <a href="http://ai.stanford.edu/~rion/">Rion</a>, who spearheaded this collaboration.  The current version of the paper is here:</p>
<ul>
<li><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/09/amt_emnlp08_final.pdf'>Rion Snow, Brendan O&#8217;Connor, Daniel Jurafsky, Andrew Y. Ng.  &#8220;Cheap and Fast &#8212; But is it Good?  Evaluating Non-Expert Annotations for Natural Language Tasks.&#8221;  EMNLP-2008.</a></li></ul>
<p>[ This article is part of a series, <a href="/topics/wisdom/">Wisdom of Small Crowds</a>, on crowdsourcing methodology. ]</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/09/amt-fast-cheap-good-machine-learning/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Fleshmap: crowdsourcing sex</title>
		<link>http://blog.crowdflower.com/2008/08/fleshmap-crowdsourcing-sex/</link>
		<comments>http://blog.crowdflower.com/2008/08/fleshmap-crowdsourcing-sex/#comments</comments>
		<pubDate>Wed, 20 Aug 2008 22:05:52 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/2008/08/fleshmap-crowdsourcing-sex/</guid>
		<description><![CDATA[We all know crowds can tell us the weight of an ox, but can they help us in bed? With artists Fernanda Viegas and Martin Wattenberg, we took pictures of the nude human body, and asked people online to rate how much they&#8217;d like to touch, or be touched at, different positions all over the [...]]]></description>
			<content:encoded><![CDATA[<p>We all know crowds can tell us the <a href="http://en.wikipedia.org/wiki/The_Wisdom_of_Crowds">weight of an ox</a>, but can they help us in bed?</p>
<p>With artists <a href="http://www.fernandaviegas.com/">Fernanda Viegas</a> and <a href="http://www.bewitched.com/">Martin Wattenberg</a>, we took pictures of the nude human body, and asked people online to rate how much they&#8217;d like to touch, or be touched at, different positions all over the body.  Here is where men most like to be touched, front and back:</p>
<p><a href='http://fleshmap.com/touch/skintoskin.html' title='man_betouched.png'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/08/man_betouched.png' class='centered' /></a></p>
<p>We got this data by <a href="http://blog.doloreslabs.com/2008/08/survey-crowdsourcing-sex/">posting a survey</a> on the Amazon service, <a href="http://www.mturk.com/">Mechanical Turk</a> &#8212; an online marketplace for simple tasks &#8212; and offering to pay a few cents for a hundred ratings.  In a few days we collected more than 30,000 responses over 700 body positions, from about 280 different people.  The above image is from Fernanda and Martin&#8217;s <a href="http://fleshmap.com/touch/skintoskin.html">Skin to Skin</a> visualization, where you can compare silhouette heatmaps of the data.  For example, from the woman&#8217;s back, here&#8217;s a comparison of where women want to be touched, versus where men (and some women) want to touch them.</p>
<p><a href='http://fleshmap.com/touch/skintoskin.html'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/08/female-back2.png' class='centered' /></a></p>
<p>Some patterns jump out &#8212; for example, the back of the neck seems to be a neglected area.  (Take note!)  But otherwise, desire to touch and be touched seem pretty well matched up.</p>
<p>What&#8217;s also interesting about this experiment is that, despite the subjectiveness of the task and large variability in the data, patterns still emerge.  To get some idea of how noisy the data is, here are rating histograms for several of the &#8220;desire to be touched&#8221; locations for women.  (You can explore all body locations with the <a href="http://fleshmap.com/touch/sorting.html">Sorting Out Desire</a> visualization.)</p>
<p><a href='http://fleshmap.com/touch/sorting.html' ><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/08/hists2.png' class='centered' /></a></p>
<p>We had from 20 to 50 responses for most positions.  While there are clear differences between these locations, there&#8217;s enough noise that a large amount of data is necessary to discern the signal.  This is exactly the same finding we have seen with other types of crowdsourcing work.  If we didn&#8217;t have access to such a large and available-on-demand pool of participants, we wouldn&#8217;t have gotten any coherent results at all.</p>
<p>Finally, we collected demographic data from the raters.  10% reported as gay, and 19% as bisexual.  Interestingly, straight men are the least likely to report wanting to be touched.  (Feel free to hypothesize why!)</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/08/picture-24.png' title='picture-24.png'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/08/picture-24.png' alt='picture-24.png' class='centered' /></a></p>
<p>We also asked for the age of the rater.  What ages have more desire?  Is there truth to the stereotype of a lecherous old man?  Our evidence is mixed.  Below are average ratings when answering &#8220;How good would it feel to touch this area?&#8221;  People in their twenties and thirties give higher ratings, but older men give higher responses than do women &#8212; even considering that men give higher responses overall.</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/08/touch-by-age.png' title='touch-by-age.png'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/08/touch-by-age.png' alt='touch-by-age.png' class='centered' /></a></p>
<p>Many thanks to Fernanda and Martin for approaching us with this fun idea.  This has definitely been the most ludicrous task we&#8217;ve ever worked on :)  Take a look at this &#8212; and other great work &#8212; at their site, <a href="http://www.fleshmap.com/">Fleshmap.com</a>.</p>
<p>-<a href="http://socialscienceplusplus.blogspot.com/">Brendan</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/08/fleshmap-crowdsourcing-sex/feed/</wfw:commentRss>
		<slash:comments>17</slash:comments>
		</item>
		<item>
		<title>Survey: crowdsourcing sex</title>
		<link>http://blog.crowdflower.com/2008/08/survey-crowdsourcing-sex/</link>
		<comments>http://blog.crowdflower.com/2008/08/survey-crowdsourcing-sex/#comments</comments>
		<pubDate>Tue, 12 Aug 2008 16:07:32 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/2008/08/survey-where-do-people-want-to-touch-and-be-touched/</guid>
		<description><![CDATA[On this blog we&#8217;ve used crowdsourcing techniques to study media bias, linguistics of color, social perceptions, information retrieval, etc. But this is thinking small. The ability to quickly and easily collect data from thousands of people on the Web should allow us to study a huge swath of human behavior. Therefore, the next obvious topic [...]]]></description>
			<content:encoded><![CDATA[<p><iframe src="http://assets.doloreslabs.com/jobs/fleshmap/splash.html" width="820" height="560" style="border:none"></iframe></p>
<p>On this blog we&#8217;ve used crowdsourcing techniques to study <a href="/2008/03/crowdsourcing-to-find-media-bias-hillary-vs-obama/">media bias</a>, <a href="/topics/colors/">linguistics of color</a>, <a href="/topics/faces/">social perceptions</a>, <a href="/2008/04/search-engine-relevance-an-empirical-test/">information retrieval</a>, etc.</p>
<p>But this is thinking small.  The ability to quickly and easily collect data from thousands of people on the Web should allow us to study a <i>huge</i> swath of human behavior.  Therefore, the next obvious topic is&#8230; human sexuality :-)  Here are two important questions many of us would like to know the answers to:</p>
<ul>
<li>Where do women enjoy being touched?</li>
<li>Where do men enjoy being touched?</li>
</ul>
<p>The Internet <a href="http://answers.yahoo.com/question/index?qid=20071231030144AApBGoe">has</a> <a href="http://www.answerbag.com/q_view/470432">many</a> <a href="http://answers.yahoo.com/question/index?qid=20061114190237AAXpyGx">opinions</a> <a href="http://answers.yahoo.com/question/index?qid=1006042612200">on</a> <a href="http://answers.yahoo.com/question/index?qid=20060712081649AAkkq8S">these</a> <a href="http://www.boingbook.com/journal/2008/1/3/how-to-touch-a-womans-breasts.html">questions</a>.  But wouldn&#8217;t it be nice to have a concise statistical-visual answer?</p>
<p>We&#8217;re on the job!  With visualization artists <a href="http://www.fernandaviegas.com/">Fernanda Viegas</a> and <a href="http://www.bewitched.com/">Martin Wattenberg</a>, we&#8217;ve pinpointed thousands of locations on a photograph of a naked body and asked respondents how much they like to touch, or would like to be touched, at that spot. (Fernanda and Martin are working on an art project on this theme, to be unveiled next week.)</p>
<p>Are there places that men want to touch but where women do not want to be touched?  Vice versa?  Soon the world will know.</p>
<p>What do you think the results will look like?  We will be publishing the results next Monday, so follow this blog for updates!</p>
<p>-<a href="http://anyall.org">Brendan</a></p>
<p><img style="display:none" src="http://assets.doloreslabs.com/jobs/fleshmap/splash.jpg"></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/08/survey-crowdsourcing-sex/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Wisdom of small crowds, part 3: another worker visualization</title>
		<link>http://blog.crowdflower.com/2008/08/wisdom-of-small-crowds-part-3-another-worker-visualization/</link>
		<comments>http://blog.crowdflower.com/2008/08/wisdom-of-small-crowds-part-3-another-worker-visualization/#comments</comments>
		<pubDate>Thu, 07 Aug 2008 23:55:32 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Wisdom of Small Crowds]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/2008/08/wisdom-of-small-crowds-part-3-another-worker-visualization/</guid>
		<description><![CDATA[This is a follow-up to the previous post on individual workloads and rates. Here are the submission times and durations for every worker on the same graph. Each worker is one horizontal line. An assignment is started at a dot, and its duration is for the line segment extending to the right. The particular data [...]]]></description>
			<content:encoded><![CDATA[<p>This is a follow-up to the previous post on <a href="/?p=73">individual workloads and rates</a>.  Here are the submission times and durations for every worker on the same graph.  Each worker is one horizontal line.  Each assignment starts at a dot, and its duration is shown by the line segment extending to the right.</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/08/submission-durations-wide1.png' title='submission-durations-wide1.png'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/08/submission-durations-wide1.png' alt='submission-durations-wide1.png' /></a></p>
<p>The particular data set isn&#8217;t the same as in the previous post, but was for a similar task and exhibits a similar structure.  Worker rates differ substantially.  Some workers do a few HITs, but others work on as many as are available.  Some work rapidly with breaks (19, 36).  Some assignment durations are as long as 5-10 minutes (13, 37).  Some work very intermittently (29).</p>
<p>This view makes the parallelism of AMT apparent.  At any vertical timeslice you can see how many workers are active at that time.  The entire job ends on the right side when the available HITs run out.</p>
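<p>Reading off a vertical timeslice is easy to do in code as well.  Here is a small sketch, assuming each assignment is stored as a (worker, start, duration) tuple in seconds; that layout and the sample values are hypothetical, not AMT&#8217;s download format.</p>

```python
def active_workers(assignments, t):
    """Count distinct workers with an assignment in progress at time t.
    `assignments` is a list of (worker_id, start, duration) tuples."""
    return len({w for w, start, dur in assignments if start <= t < start + dur})

# Hypothetical assignments: times are seconds since the job was posted.
assignments = [
    ("A", 0, 30),
    ("A", 40, 20),
    ("B", 10, 60),
    ("C", 100, 15),
]
for t in (45, 80, 100):
    print(t, active_workers(assignments, t))
```

<p>Sweeping <code>t</code> across the whole job gives a parallelism curve to go with the picture.</p>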
<p>[ This article is part of a series, <a href="/topics/wisdom">Wisdom of Small Crowds</a>, on crowdsourcing methodology. ]</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/08/wisdom-of-small-crowds-part-3-another-worker-visualization/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Wisdom of small crowds, part 2: individual workloads and rates</title>
		<link>http://blog.crowdflower.com/2008/08/wisdom-of-small-crowds-part-2-individual-workloads-and-rates/</link>
		<comments>http://blog.crowdflower.com/2008/08/wisdom-of-small-crowds-part-2-individual-workloads-and-rates/#comments</comments>
		<pubDate>Tue, 05 Aug 2008 18:00:10 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Wisdom of Small Crowds]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/2008/08/wisdom-of-small-crowds-part-2-individual-workloads-and-rates/</guid>
		<description><![CDATA[[ Update: see also another visualization of this. ] AMT&#8217;s great new interface makes it easy to download completion times for individual worker assignments. Therefore, it&#8217;s easy to visualize :) For a recent small job we did (250 HIT&#8217;s, 5 workers per HIT), here&#8217;s a graph of completion times per worker, over the entire 15 [...]]]></description>
			<content:encoded><![CDATA[<p>[ Update: see also <a href="http://blog.doloreslabs.com/2008/08/wisdom-of-small-crowds-part-3-another-worker-visualization/">another visualization of this</a>. ]</p>
<p>AMT&#8217;s great <a href="http://venturebeat.com/2008/07/30/outsourcing-gets-easier-with-new-features-on-amazons-mechanical-turk/">new interface</a> makes it easy to download completion times for individual worker assignments.  Therefore, it&#8217;s easy to visualize :)  For a recent small job we did (250 HITs, 5 workers per HIT), here&#8217;s a graph of completion times per worker, over the entire 15 minute duration of the job.  Each assignment is a single point, graphed by when it was done versus how long it took.</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/08/completion-times.png' title='completion-times.png'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/08/completion-times.png' alt='completion-times.png' /></a></p>
<p><span id="more-73"></span></p>
<p>Most workers come in, do a string of HITs, then leave.  Some do all of the HITs available.  There seem to be two distinct work modes.  Most people do lots of HITs in rapid succession.  But several of them work slowly (e.g. workers 8, 13, and 18, with more horizontal space between points), either spending more time on each assignment, or perhaps leaving then coming back.</p>
<p>This graph also illustrates a common trend we see: lots of the work gets done by &#8220;tail&#8221; workers; that is, people who do only a small amount of work.  This is where crowdsourcing really shines &#8212; it&#8217;s OK if individuals give you a small number of judgments, because you can aggregate across many of them.  The total &#8220;prolificness&#8221; of each worker was lightly skewed on this task &#8212; 50% of the work was done by 8 out of 37 workers.  Usually, we see a split more like 50% of the work being done by the top 10% of workers; this one had a more even distribution probably because it was small, so enthusiastic workers didn&#8217;t have an opportunity to do a very large number of HITs.</p>
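<p>A statistic like &#8220;50% of the work was done by 8 out of 37 workers&#8221; takes only a few lines to compute.  This is a sketch with made-up counts and a hypothetical function name, not the script we actually ran.</p>

```python
def workers_for_share(counts, share=0.5):
    """Smallest number of most-prolific workers whose assignments add up
    to at least `share` of all work.  `counts` maps worker -> assignments."""
    total = sum(counts.values())
    running = 0
    # Walk workers from most to least prolific, accumulating their work.
    for n, done in enumerate(sorted(counts.values(), reverse=True), start=1):
        running += done
        if running >= share * total:
            return n
    return len(counts)

# Hypothetical per-worker assignment counts.
counts = {"w1": 40, "w2": 30, "w3": 10, "w4": 10, "w5": 5, "w6": 5}
print(workers_for_share(counts), "of", len(counts), "workers did half the work")
```

<p>Dividing the result by the number of workers gives a quick skew measure that can be compared across jobs.</p>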
<p>Another phenomenon: some workers have a downward trend in work time.  This could be learning to do the task faster, or it could be increased carelessness.  A quality analysis (along the lines of <a href="/?p=61">part 1</a>) can flesh this out.</p>
<p>The task was a fairly subjective image classification problem where positives are rare; purple points are &#8220;YES&#8221; responses.  Responding &#8220;YES&#8221; takes more time (presumably, more cognitive load) &#8212; average work times for YES vs NO responses are 22 vs 12 seconds, a difference that is significant by t-test (p &lt; .001).</p>
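<p>For anyone who wants to run the same kind of comparison, here is a minimal sketch of a two-sample t statistic using only the Python standard library.  It uses Welch&#8217;s unequal-variance form, which is our choice for the example and not necessarily the variant used above, and the work times are invented.</p>

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples
    that may have unequal variances."""
    # Standard error of the difference, from the two sample variances.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical work times in seconds for YES and NO responses.
yes_times = [25, 19, 30, 22, 18, 24]
no_times = [11, 14, 9, 13, 12, 15, 10, 12]
print(round(welch_t(yes_times, no_times), 2))
```

<p>Turning the statistic into a p-value requires the t distribution; if SciPy is available, <code>scipy.stats.ttest_ind(a, b, equal_var=False)</code> does both steps.</p>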
<p>-<a href="http://socialscienceplusplus.blogspot.com/">Brendan</a></p>
<p>p.s. The graph is due to R&#8217;s awesome <a href="http://www.statmethods.net/advgraphs/trellis.html">lattice package</a>.  It&#8217;s incredibly easy to use: not much more than <code>xyplot(WorkTimeInSeconds ~ SubmitTime | WorkerId)</code>.</p>
<p>[ This article is part of a series, <a href="/topics/wisdom/">Wisdom of Small Crowds</a>, which focuses on crowdsourcing methodology for Amazon <a href="http://www.mturk.com/">Mechanical Turk</a>-like systems. ]</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/08/wisdom-of-small-crowds-part-2-individual-workloads-and-rates/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Wisdom of small crowds, part 1: how to aggregate Turker judgments for classification (the threshold calibration trick)</title>
		<link>http://blog.crowdflower.com/2008/06/aggregate-turker-judgments-threshold-calibration/</link>
		<comments>http://blog.crowdflower.com/2008/06/aggregate-turker-judgments-threshold-calibration/#comments</comments>
		<pubDate>Mon, 16 Jun 2008 00:13:03 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Wisdom of Small Crowds]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/2008/06/turkers-as-an-ensemble-classifier-part-1-threshold-calibration/</guid>
		<description><![CDATA[[ This article is part of a series, Wisdom of Small Crowds, which focuses on crowdsourcing methodology for Amazon Mechanical Turk-like systems. ] We use Turkers to classify all sorts of data, by having several workers render judgments on each item. But what should we do when they disagree? Like any other human behavior, Turker [...]]]></description>
			<content:encoded><![CDATA[<p>[ This article is part of a series, <a href="/topics/wisdom/">Wisdom of Small Crowds</a>, which focuses on crowdsourcing methodology for Amazon <a href="http://www.mturk.com/">Mechanical Turk</a>-like systems. ]</p>
<p>We use Turkers to <a href="http://doloreslabs.com/services.html">classify all sorts of data</a>, by having several workers render judgments on each item.  But what should we do when they disagree?  Like any other human behavior, Turker judgments are noisy: sometimes there are mistakes, and sometimes the task is genuinely difficult or subjective, and there is no &#8220;right&#8221; answer.  Once we have a bunch of Turker judgments, we need to aggregate them &#8212; that is, use some sort of voting mechanism &#8212; to give as accurate a classification as possible.  It turns out that one simple trick, threshold calibration, can substantially improve accuracy, and can be tuned to the specifics of the problem.</p>
<p>Here&#8217;s an example.  A recent client of ours had a de-duping task: given a pair of similar articles, the task was to decide if they were &#8220;about the same topic&#8221; or &#8220;about different topics&#8221;.  This is just a binary classification problem; call these labels &#8220;YES&#8221; and &#8220;NO&#8221;.  To figure out how well Turkers could perform the task, we had our client provide us with a gold standard data set.  That is, for 135 examples, their experts did the task themselves and provided &#8220;gold&#8221; ground truth labels.</p>
<p>We used a very high number of workers per example (about 20).  For all 135 examples in the gold standard, the following graph plots them vertically by their &#8220;Turker confidence in YES&#8221; &#8212; that&#8217;s just the percentage of votes for &#8220;YES&#8221; among the 20 or so judgments for that particular example.  I&#8217;ve also colored each example with the experts&#8217; gold label.  You can see that this simple Turker data provides some statistical separation between the classes.</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/06/vertthresh.png'><img class='centered' src='http://blog.doloreslabs.com/wp-content/uploads/2008/06/vertthresh.png' alt='Test set separation by Turker ensemble binary classifier' /></a></p>
<p>This graph also shows how to create a classifier from Turker votes.  We have to choose a confidence threshold for our classifier&#8217;s decision: above the threshold, say &#8220;YES&#8221;, and below say &#8220;NO&#8221;.  Unfortunately, Turkers aren&#8217;t perfect at modeling the experts: anywhere we place the threshold, errors occur.  However, some thresholds are better than others.  The threshold with the best accuracy is at 73% confidence &#8212; that is, a 73% super-majority voting rule &#8212; and it classifies instances correctly 90% of the time.  Furthermore, we can tune for different types of errors.  If we are particularly concerned with avoiding false positive errors, we can set a higher, more conservative threshold; or, if we want to find as many &#8220;YES&#8221; instances as possible, we can set a lower, more liberal threshold.</p>
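<p>The calibration step described above can be sketched in a few lines of Python.  This is a minimal illustration under stated assumptions, not the code we ran: the function names and the toy data are made up, and the rule says &#8220;YES&#8221; when confidence is at or above the threshold.</p>

```python
def yes_confidence(votes):
    """Fraction of 'YES' votes among one example's Turker judgments."""
    return sum(1 for v in votes if v == "YES") / len(votes)


def accuracy_at(threshold, confidences, gold):
    """Accuracy of the rule: say YES when confidence >= threshold."""
    correct = 0
    for conf, label in zip(confidences, gold):
        predicted = "YES" if conf >= threshold else "NO"
        correct += predicted == label
    return correct / len(gold)


def best_threshold(confidences, gold):
    """Try each observed confidence as a cut point; keep the most accurate."""
    # 1.01 is a sentinel cut point that means "always say NO".
    candidates = sorted(set(confidences)) + [1.01]
    return max(candidates, key=lambda t: accuracy_at(t, confidences, gold))


# Toy data (hypothetical, not the client's gold standard).
confidences = [0.95, 0.80, 0.75, 0.60, 0.55, 0.30, 0.10]
gold = ["YES", "YES", "YES", "NO", "NO", "NO", "NO"]

t = best_threshold(confidences, gold)
print("best threshold:", t, "accuracy:", accuracy_at(t, confidences, gold))
print("simple majority accuracy:", accuracy_at(0.5, confidences, gold))
```

<p>On this toy data the calibrated threshold beats a simple majority rule, because the toy Turkers, like the real ones, say &#8220;YES&#8221; more readily than the experts do.  In practice the threshold should be chosen on one set of gold examples and checked on another, since a threshold tuned and evaluated on the same examples will look better than it is.</p>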
<p>Here&#8217;s another chart that more carefully details the tradeoffs between true and false positives vs. true and false negatives.  For a particular decision threshold, it shows how it divides up the instances into the confusion matrix&#8217;s 4 categories of correct and incorrect decisions.</p>
<p><img class='centered' src='http://blog.doloreslabs.com/wp-content/uploads/2008/06/confusionbars.png' alt='Classifier performance on gold standard at different thresholds' /></p>
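<p>The four categories in the chart can be counted directly from the confidences and gold labels.  The following is a rough sketch with hypothetical names and toy data, again assuming the rule says &#8220;YES&#8221; at or above the threshold.</p>

```python
def confusion_counts(threshold, confidences, gold):
    """Count true/false positives and negatives for one decision threshold."""
    counts = {"TP": 0, "FP": 0, "TN": 0, "FN": 0}
    for conf, label in zip(confidences, gold):
        predicted_yes = conf >= threshold
        actual_yes = label == "YES"
        if predicted_yes and actual_yes:
            counts["TP"] += 1
        elif predicted_yes:
            counts["FP"] += 1
        elif actual_yes:
            counts["FN"] += 1
        else:
            counts["TN"] += 1
    return counts


# Toy data (hypothetical).
confidences = [0.9, 0.7, 0.6, 0.4, 0.2]
gold = ["YES", "NO", "YES", "YES", "NO"]

liberal = confusion_counts(0.5, confidences, gold)
conservative = confusion_counts(0.8, confidences, gold)
print("threshold 0.5:", liberal)
print("threshold 0.8:", conservative)
```

<p>Raising the threshold removes the false positive in this toy example, but at the price of an extra false negative, which is exactly the tradeoff the chart shows.</p>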
<p>A final note on why threshold calibration is important: For this task, the Turkers were considerably more liberal than the experts at deciding what a &#8220;YES&#8221; example was &#8212; experts marked only 36% of examples as &#8220;YES&#8221;, whereas a simple Turker majority voting rule marks 57% that way.  This is because the experts understood the full implications of the decision, which were substantial &#8212; various entries in their database and website would be merged, and users would be confused if they were exposed to a bad merge.  False positives had a very high cost.  The prompt for Turkers, by contrast, was fairly vague.  (In our experience, we generally find that good task design is a huge factor in getting better Turker accuracy.)  However, since Turker decisions noisily correlate with the experts, moving the decision threshold can help accuracy.  Here&#8217;s the threshold vs. accuracy graph:</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/06/thresh-acc.png' title='thresh-acc.png'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/06/thresh-acc.png' alt='thresh-acc.png' class='centered' /></a></p>
<p>Statistical analysis of Turker data can substantially improve accuracy, even with something as simple as choosing the best decision threshold.  This blog post only scratched the surface; there are a few more useful things to consider.  Stay tuned for Part 2 and hopefully many more!</p>
<p>A few more notes on Turker voting and threshold calibration:</p>
<p><span id="more-61"></span></p>
<p>An interesting question is the upper bound of possible performance on the task.  A good experiment to try is to have two experts independently perform a task, and check their agreement rate.  We should be satisfied if Turkers can match experts as reliably as experts match each other.  That is, for this task, if experts agree no more than 90% of the time, then Turkers perform the task as well as experts.  (We didn&#8217;t have this particular experiment done in this case, but I&#8217;d be very curious to see the results!)  In general, agreement rates can help indicate the difficulty of a task.  If expert agreement rates are low, it can be argued that the task is not very &#8220;real&#8221;.</p>
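<p>The agreement rate in that experiment is just the fraction of items on which two annotators give the same label.  A small sketch, with made-up labels for two hypothetical experts:</p>

```python
def agreement_rate(labels_a, labels_b):
    """Fraction of items on which two annotators gave the same label."""
    if len(labels_a) != len(labels_b):
        raise ValueError("both annotators must label the same items")
    matches = sum(a == b for a, b in zip(labels_a, labels_b))
    return matches / len(labels_a)


# Toy labels (hypothetical; we did not run this experiment).
expert_1 = ["YES", "NO", "NO", "YES", "NO"]
expert_2 = ["YES", "NO", "YES", "YES", "NO"]
print(agreement_rate(expert_1, expert_2))
```

<p>Raw agreement does not correct for agreement that would happen by chance, so it is only a first look at how hard a task is.</p>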
<p>The terms &#8220;true/false positives&#8221;, &#8220;true/false negatives&#8221;, &#8220;precision&#8221; and &#8220;recall&#8221; are all part of binary classifier evaluation, a mini-field of statistics and machine learning.  Any statistical classifier that outputs a confidence value or ranking among instances (Naive Bayes, logistic regression, IR ranking, etc.) can be subject to this sort of threshold analysis.  A decent place to read more is the <a href="http://en.wikipedia.org/wiki/Receiver_operating_characteristic">ROC</a> Wikipedia page.  ROC and Precision-Recall curves have long been used to show thresholding tradeoffs.  I think the above plots make it easier to interpret the basic information, but the more traditional graphs are also useful.  Here they are for this data (provided courtesy of the excellent <a href="http://rocr.bioinf.mpi-sb.mpg.de/">ROCR</a> package):</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/06/roc-pr.png' title='ROC and Precision-Recall curves'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/06/roc-pr.png' alt='ROC and Precision-Recall curves' class='centered' /></a></p>
<p>A nice overview of these topics can be found in <a href="http://nr.com/CS395T/lectures2008/17-ROCPrecisionRecall.pdf">these lecture notes</a> from <a href="http://www.nr.com/whp/">William Press</a>.</p>
<p>-<a href="http://socialscienceplusplus.blogspot.com">Brendan</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/06/aggregate-turker-judgments-threshold-calibration/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>FaceStat scales!</title>
		<link>http://blog.crowdflower.com/2008/06/facestat-scales/</link>
		<comments>http://blog.crowdflower.com/2008/06/facestat-scales/#comments</comments>
		<pubDate>Fri, 06 Jun 2008 22:03:15 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Faces]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/2008/06/facestat-scales/</guid>
		<description><![CDATA[Before last weekend, our FaceStat website was chugging away with a small but loyal userbase: But on Sunday, an insane number of people suddenly decided to flock to our site. Let&#8217;s extend the previous chart by 2 days, then a little bit of y-axis auto-scaling says it all: Turns out the giant spike was due [...]]]></description>
			<content:encoded><![CDATA[<p>Before last weekend, our <a href="http://facestat.com">FaceStat</a> website was chugging away with a small but loyal userbase:<br />
<img src='http://blog.doloreslabs.com/wp-content/uploads/2008/06/graph1.gif' class='centered' /></p>
<p>But on Sunday, an insane number of people suddenly decided to flock to our site.  Let&#8217;s extend the previous chart by 2 days, then a little bit of y-axis auto-scaling says it all:<br />
<img src='http://blog.doloreslabs.com/wp-content/uploads/2008/06/graph2.gif' class='centered' /></p>
<p>Turns out the giant spike was due to our being featured via a news article on Yahoo.com&#8217;s front page!</p>
<p>Of course, we had to frantically rearchitect the system and scale it under this deluge of traffic. You can read the blow-by-blow account of our crazy few days on <a href="http://www.lukasbiewald.com/?p=153">Lukas&#8217;s blog, here</a>.</p>
<p>The web startup community seems pretty interested in the mad scaling issues, so I&#8217;ll respond to some of the comments on Lukas&#8217;s blog below:</p>
<p><span id="more-58"></span></p>
<p>Yes, we&#8217;re pretty much using Rails.  We actually use an offshoot called <a href="http://merbivore.com/">Merb</a> &#8212; which is a bit more efficient &#8212; on top of <a href="http://code.macournoyer.com/thin/">Thin</a>.  We find that a Rails-like platform is invaluable for rapidly prototyping a new site, especially since we started FaceStat as a pure experiment with no idea whether people would like it or not, and with a very different feature set in mind compared to what it later became.  And it&#8217;s invaluable that <a href="http://www.vandev.com/">Chris</a> on our team is such a Ruby expert :).</p>
<p>However, the high-level platform really doesn&#8217;t matter compared to overall architecture: how we use the database (postgres), how much we cache (memcached/merb-cache), how we distribute load, how we deploy new systems (xen/slicehost), etc.  It hasn&#8217;t been trivial since FaceStat is write-heavy and performs fairly complex statistical calculations, and various issues remain.  But we are serving many users at nearly 100x our old load, so something must be going right &#8212; at least for now!</p>
<p>-<a href="http://socialscienceplusplus.blogspot.com">Brendan</a></p>
<p>p.s. Thank you, Google Analytics, for the above charts.  Some day when I grow up, I hope I am wise enough to create an equally brilliant data visualization tool.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/06/facestat-scales/feed/</wfw:commentRss>
		<slash:comments>18</slash:comments>
		</item>
		<item>
		<title>Color flowers, networks, photos, and even 3D</title>
		<link>http://blog.crowdflower.com/2008/04/color-flowers-networks-photos-and-even-3d/</link>
		<comments>http://blog.crowdflower.com/2008/04/color-flowers-networks-photos-and-even-3d/#comments</comments>
		<pubDate>Fri, 25 Apr 2008 09:01:41 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Colors]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/2008/04/color-flowers-networks-photos-and-even-3d/</guid>
		<description><![CDATA[Lots of people have been making great new visualizations of our color names data. Here are 4 more that folks have sent us. Chris Harrison, a Ph.D. student at CMU HCI, combined our data with results from his own previous experiment, and created beautiful flower and spiral images. Unlike my and Martin&#8217;s color wheels, hue [...]]]></description>
			<content:encoded><![CDATA[<p>Lots of people have been making great new visualizations of our <a href="/2008/03/where-does-blue-end-and-red-begin/">color names data</a>.  Here are 4 more that folks have sent us.</p>
<p><a href="http://www.chrisharrison.net/">Chris Harrison</a>, a Ph.D. student at <a href="http://hcii.cmu.edu/">CMU HCI</a>, combined our data with results from his own previous experiment, and created <a href="http://www.chrisharrison.net/projects/colorflower/index.html">beautiful flower and spiral images</a>.  Unlike <a href="/2008/03/where-does-blue-end-and-red-begin/">my</a> and <a href="/2008/03/awesome-cloud-view-of-our-color-names-data/">Martin&#8217;s</a> color wheels, hue is scaled along the radius, creating a striking effect.</p>
<p><a href="http://www.chrisharrison.net/projects/colorflower/index.html"><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/04/flower2medium.jpg' class="centered" /></a></p>
<p>Next: <a href="http://arbitrarian.wordpress.com/2008/03/31/plotting-the-colors/">network and cluster diagrams</a> from <a href="http://www.duke.edu/~dbs9/">David Sparks</a>, Ph.D. student at Duke PoliSci.  The layout below was computed from a similarity metric on color names.  (I&#8217;m unclear whether it&#8217;s on labels or colors.)  Size of node corresponds to the label&#8217;s frequency.</p>
<p><a href='http://arbitrarian.wordpress.com/2008/03/31/plotting-the-colors/' title='network.jpg'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/04/network.jpg' alt='network.jpg' class="centered" /></a></p>
<p>All of the visualizations so far have had to map three-dimensional color points into a 2D space.  But <a href="http://www.neoformix.com/Projects/portfolio/index.html">Jeff Clark</a> in Toronto went ahead and wrote a <a href="http://www.neoformix.com/2008/ColorNamesExplorer.html">3D explorer &#8212; you fly around a space of the color labels</a>.  He built it with the excellent <a href="http://processing.org/">Processing</a> framework.</p>
<p><a href='http://www.neoformix.com/2008/ColorNamesExplorer.html' title='3d'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/04/3d.png' class="centered" /></a></p>
<p>Finally, yet another tack: instead of creating a picture with all the labels, why not fit labels to a picture?  Kristina Durivage, Chris Burg, and Scott Olson did that for an undergrad CS project at <a href="http://www.winona.edu/">Winona State University</a>.  Their software takes any image and overlays color names.  An example:</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/04/example2.jpg'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/04/example2.jpg' class="centered" border="0" /></a></p>
<p>Four new visualizations in a month &#8212; whew!</p>
<p>To look at all our color posts, check out <a href="http://blog.doloreslabs.com/topics/colors/">blog.doloreslabs.com/topics/colors</a>.</p>
<p>-<a href="http://socialscienceplusplus.blogspot.com/">Brendan</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/04/color-flowers-networks-photos-and-even-3d/feed/</wfw:commentRss>
		<slash:comments>8</slash:comments>
		</item>
		<item>
		<title>Search engine relevance &#8211; an empirical test</title>
		<link>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/</link>
		<comments>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/#comments</comments>
		<pubDate>Thu, 03 Apr 2008 23:37:26 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/?p=35</guid>
		<description><![CDATA[Search engines control the information we see and use. Their key component is a ranking algorithm that tries to determine the most relevant web pages for your query. How good are these algorithms? Publicly, there&#8217;s a lot of hype, while privately, all the big engines run proprietary quality evaluation efforts. But there&#8217;s virtually no real [...]]]></description>
			<content:encoded><![CDATA[<p>Search engines control the information we see and use.  Their key component is a ranking algorithm that tries to determine the most relevant web pages for your query.  How good are these algorithms?  Publicly, there&#8217;s a lot of hype, while privately, all the big engines run proprietary quality evaluation efforts.  But there&#8217;s virtually no real data out there for the rest of us.</p>
<p>Using <a href="http://www.mturk.com/">Mechanical Turk</a>, we can evaluate engine relevance.  We tried an experiment where we took five hundred queries and ran them against the top 4 English language web search engines: <a href="http://www.ask.com/">Ask</a>, <a href="http://www.google.com/">Google</a>, <a href="http://www.live.com/">Live</a>, and <a href="http://search.yahoo.com/">Yahoo</a>.  The queries were a random sample from a real-world set of search queries.  We had annotators rate the relevance of the top five results for each engine.  Our results:</p>
<p><a href="http://blog.doloreslabs.com/wp-content/uploads/2008/04/engine_comparison1.png" title="engine_comparison1.png"><img src="http://blog.doloreslabs.com/wp-content/uploads/2008/04/engine_comparison1.png" alt="engine_comparison1.png" class="centered" /></a></p>
<p>Ask clearly performed the worst.  The other three engines were in a statistical tie.  Their ordering was Google, Yahoo, then Live, but the differences were minuscule: the top 3 engines all answer about 80% of queries effectively.</p>
<h3>What do these results mean?</h3>
<p>People often talk about Google as being the most relevant search engine, with the best algorithms and the like.  This study finds little evidence to support that.  Sure, our methods are preliminary and could be improved in any number of ways; we can probably shrink those error bars and find more statistical differences.  However, it is the case that for 500 typical queries, a rough but pretty objective measurement of search quality found that Google, Live, and Yahoo all performed about the same.</p>
<p>Note that these results don&#8217;t speak to the entire user experience.  To be able to compare between engines, we extracted only the core web results with their titles, urls, and snippets.  But a search engine also includes much more: the presentation, branding, video and image results, ads, etc.  We only tested the relevance of core web search.</p>
<p>Many more details below.</p>
<p><span id="more-35"></span></p>
<h3>How we&#8217;re measuring search relevance</h3>
<p>Evaluating search engine quality is a tricky task.  Here&#8217;s our first pass on the methodology.</p>
<p>We take a set of queries and run them against several search engines, scraping their web interfaces. We then show the query and results to human raters, asking them how relevant each result is.  It&#8217;s a blind test: they don&#8217;t know which engines the results came from.</p>
<p>Here is an example of what a rater sees:</p>
<p><a href="http://blog.doloreslabs.com/wp-content/uploads/2008/04/result_judgment_example.png" title="result_judgment_example.png"><img src="http://blog.doloreslabs.com/wp-content/uploads/2008/04/result_judgment_example.png" alt="result_judgment_example.png" class="centered" style="padding-top: 5px; padding-bottom: 5px" border="0" /></a></p>
<p>The raters all come from <a href="http://www.mturk.com/">Amazon Mechanical Turk</a>, a distributed workforce.  We submit the above query/result judgment surveys to the AMT service, and pay its users &#8211; <a href="http://turkers.proboards80.com/">&#8220;Turkers&#8221;</a> &#8211; to do the relevance ratings.  (If you want to learn more, try looking at the <a href="http://doloreslabs.com/services.html">Dolores Labs FAQ</a>.)</p>
<p><strong>What queries are being used?</strong> We took a random sample from the AOL query log data set; these are actual queries that real users typed in to a search engine.  The AOL data set is remarkable for being pretty much the only publicly available, real-world data on web search behavior.  Of course, it&#8217;s infamous for very valid privacy issues.  We&#8217;re only using the part of it that doesn&#8217;t involve personal information &#8211; the raw queries, without user identification.  (This <a href="http://www.nytimes.com/2006/08/23/technology/23search.html">NYT article on the issue</a> is interesting.)</p>
<p><strong>What&#8217;s being measured?</strong> For each engine, we count the number of queries that had at least one &#8220;Highly Relevant&#8221; result within the first five results the engine returned.  This is a version of the &#8220;precision at 5&#8221; metric from information retrieval.  There are, of course, <a href="http://en.wikipedia.org/wiki/Information_retrieval#Performance_measures">many other methods</a> to explore.  We wanted a metric that was simple and easy to interpret.</p>
<p><strong>How were raters&#8217; judgments used?</strong> We had three raters per result, and basically took a simple majority vote.  We didn&#8217;t attempt to model individual annotator biases.</p>
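<p>Put together, the vote and the per-query metric might look like the sketch below.  It is an illustration under assumptions rather than our actual pipeline: the function names and toy ratings are invented, only the &#8220;Highly Relevant&#8221; label comes from the survey described here (the other two labels are placeholders), and a result counts only if a strict majority of its raters chose &#8220;Highly Relevant&#8221;.</p>

```python
HIGH = "Highly Relevant"
SOME = "Relevant"        # placeholder label, not from the survey
NONE = "Not Relevant"    # placeholder label, not from the survey


def is_highly_relevant(ratings):
    """True when a strict majority of raters chose 'Highly Relevant'."""
    votes = sum(r == HIGH for r in ratings)
    return votes > len(ratings) / 2


def query_hit(results_ratings, k=5):
    """True when any of the top-k results is judged highly relevant."""
    return any(is_highly_relevant(r) for r in results_ratings[:k])


def engine_score(queries):
    """Fraction of queries with at least one highly relevant top-k result."""
    hits = [query_hit(q) for q in queries]
    return sum(hits) / len(hits)


# Toy data: one list per query, one inner list of rater labels per result.
queries = [
    [[HIGH, HIGH, NONE], [NONE, NONE, NONE]],  # hit on the first result
    [[HIGH, NONE, NONE], [SOME, SOME, HIGH]],  # no majority for HIGH
    [[NONE, NONE, NONE], [SOME, HIGH, HIGH]],  # hit on the second result
    [[SOME, SOME, SOME]],                      # no hit
]
print(engine_score(queries))
```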
<p><strong>Are these judgments trustworthy?</strong> The ratings are certainly noisy.  And sure, the workers have little training and are (somewhat) anonymous to us.  However, the relevance judgment task is fairly subjective and therefore inherently noisy.  Further, it&#8217;s arguably better to use untrained annotators, since this more closely mimics normal search users.  And finally, we&#8217;re finding some statistically significant, systematic differences between engines on a query set, with only extremely simple analysis &#8211; so <em>something</em> must be working right.</p>
<p><strong>What&#8217;s the statistical methodology?</strong> As dead simple as possible: 95% confidence intervals on the graph, and engine comparisons via paired t-tests.  These are all on that per-query precision-at-5 metric.  We think that with larger scale experiments, more fine-grained breakdowns, survey design improvements, better analyses, etc., we can flesh out more differences.</p>
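<p>For the curious, both calculations fit in a few lines using only the Python standard library.  This is a sketch with invented per-query scores, not our analysis code; the interval uses the normal approximation (1.96 standard errors), and the function returns only the paired t statistic, which would still need to be compared against a t distribution with n - 1 degrees of freedom.</p>

```python
import math
import statistics


def mean_ci95(values):
    """Normal-approximation 95% confidence interval for the mean."""
    mean = statistics.mean(values)
    se = statistics.stdev(values) / math.sqrt(len(values))
    return mean - 1.96 * se, mean + 1.96 * se


def paired_t(a, b):
    """Paired t statistic for per-query scores from two engines.

    Assumes the per-query differences are not all identical,
    otherwise the standard error is zero.
    """
    diffs = [x - y for x, y in zip(a, b)]
    se = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / se


# Toy per-query scores (1 = query had a highly relevant result, 0 = not).
engine_a = [1, 1, 0, 1, 1, 0, 1, 1]
engine_b = [1, 0, 0, 1, 0, 0, 1, 1]

low, high = mean_ci95(engine_a)
t_stat = paired_t(engine_a, engine_b)
print("95% CI for engine A:", (low, high))
print("paired t statistic:", t_stat)
```

<p>With only a handful of queries, as in this toy example, the normal approximation is crude and the interval can run past 1; with 500 queries it is much better behaved.</p>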
<p><strong>What&#8217;s the &#8220;meta-engine upper bound&#8221;?</strong> That&#8217;s just how many queries had a &#8220;Highly Relevant&#8221; result on at least one engine.  So hypothetically, if you were to combine all the engines and select the best results for the top, it would perform at this upper bound.  This bound is overly high for a number of reasons (e.g. it&#8217;s artificially inflated by judgment noise and it assumes smart re-ranking); but it gives some idea how much the search engines could still improve.</p>
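<p>As a sketch of that calculation, assuming we already have a per-query hit flag for each engine (the engine names and flags below are invented):</p>

```python
def engine_rate(hits):
    """Fraction of queries on which one engine had a highly relevant result."""
    return sum(hits) / len(hits)


def meta_engine_upper_bound(hits_by_engine):
    """Fraction of queries on which at least one engine had a hit."""
    per_query = zip(*hits_by_engine.values())
    any_hit = [any(flags) for flags in per_query]
    return sum(any_hit) / len(any_hit)


# Toy data: each list has one flag per query, in the same query order.
hits_by_engine = {
    "engine_a": [True, False, True, False],
    "engine_b": [True, True, False, False],
    "engine_c": [False, False, True, False],
}
bound = meta_engine_upper_bound(hits_by_engine)
print(bound)
```

<p>The bound can never be lower than the best single engine, since any query one engine answers well also counts for the combined meta-engine.</p>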
<p>Anyway, we&#8217;re thinking of doing more work along these lines if people are interested.  There are certainly big improvements that could be made; we&#8217;d love any feedback you have.</p>
<p>-<a href="http://socialscienceplusplus.blogspot.com/">Brendan</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/04/search-engine-relevance-an-empirical-test/feed/</wfw:commentRss>
		<slash:comments>15</slash:comments>
		</item>
		<item>
		<title>Crowdsourcing to find media bias: Hillary vs. Obama</title>
		<link>http://blog.crowdflower.com/2008/03/crowdsourcing-to-find-media-bias-hillary-vs-obama/</link>
		<comments>http://blog.crowdflower.com/2008/03/crowdsourcing-to-find-media-bias-hillary-vs-obama/#comments</comments>
		<pubDate>Thu, 27 Mar 2008 03:41:52 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Media]]></category>
		<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/?p=21</guid>
		<description><![CDATA[As anyone who follows political races knows, different sources can report the same event in very different ways. We took nearly six thousand recent articles over the past month about Clinton and Obama and sent them to Mechanical Turk to be classified as favorable or unfavorable for the respective candidates. We scraped the articles from [...]]]></description>
			<content:encoded><![CDATA[<p>As anyone who follows political races knows, different sources can report the same event in very different ways.  We took nearly six thousand recent articles over the past month about Clinton and Obama and sent them to Mechanical Turk to be classified as favorable or unfavorable for the respective candidates.  We scraped the articles from <a href="http://news.google.com/">Google News</a> restricted to several sources, and threw in front page headlines from <a href="http://digg.com/news">Digg</a>.</p>
<p>Here is the graph for favorability scores, aggregated by source.  We found that Digg was far and away the most favorable for Obama.</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/03/obama-hillary-bysource3.png' title='obama-hillary-bysource3.png'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/03/obama-hillary-bysource3.png' alt='obama-hillary-bysource3.png' class='centered' /></a></p>
<p>
The next graph tracks overall news favorability by date.  To provide some context, we compared it with the change in Obama stock on the <a href="http://intrade.com/">Intrade</a> prediction market.</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/03/obama-hillary-overtime.png' title='obama-hillary-overtime2.png'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/03/obama-hillary-overtime2.png' alt='obama-hillary-overtime2.png' class='centered' /></a></p>
<p>More details after the jump:</p>
<p><span id="more-21"></span></p>
<p>We created our data set by doing two separate searches, one for &#8220;Barack Obama&#8221; and one for &#8220;Hillary Clinton&#8221;.  This did a pretty good job of ensuring that each result from Google News or Digg&#8217;s search facility was actually about the given candidate.  For each article, we showed the headline, search result snippet, and link to several Turkers.  They reported whether it was positive, neutral, or negative toward the candidate.</p>
<p>The favorability metric was created by averaging the ratings across articles.  Pro-Obama and anti-Hillary articles were both worth 1 point; anti-Obama and pro-Hillary both worth -1, and neutrals 0.</p>
<p>Therefore, if all articles are either positive towards Obama or negative towards Hillary, the rating is +100%; and vice-versa for -100%.</p>
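<p>The scoring rule can be written down as a short sketch.  It assumes each article has already been reduced to a single rating (how the several Turker ratings per article were combined is not shown here), and the function names and toy articles are made up.</p>

```python
def article_score(candidate, rating):
    """+1 for pro-Obama or anti-Hillary, -1 for the reverse, 0 for neutral."""
    sign = {"positive": 1, "neutral": 0, "negative": -1}[rating]
    return sign if candidate == "Obama" else -sign


def favorability(articles):
    """Average score over (candidate, rating) pairs; ranges from -1 to +1."""
    scores = [article_score(c, r) for c, r in articles]
    return sum(scores) / len(scores)


# Toy articles (hypothetical).
articles = [
    ("Obama", "positive"),
    ("Hillary", "negative"),
    ("Obama", "neutral"),
    ("Hillary", "positive"),
]
print(favorability(articles))
```

<p>Multiplying the result by 100 gives the percentage scale used on the graphs.</p>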
<p>The data is very noisy.  The question of favorability is extremely tricky: it includes a combination of expectations, sentiment, and the objective events a newspaper chooses to report.  All of these are hard to reliably assess or even define.  (And whether anything you measure constitutes &#8220;media bias&#8221; is another complicated question!)</p>
<p>Despite all this philosophical intractability, the data must be showing something real, because we have a statistically sound result: the difference between Digg and the others was statistically significant (<a href="http://en.wikipedia.org/wiki/Student%27s_t-test">t-test</a>, p&lt;.001).  The differences within the mainstream media were not statistically significant.</p>
<p>-<a href="http://socialscienceplusplus.blogspot.com/">Brendan</a>, <a href="http://vandev.com/">Chris</a>, <a href="http://www.lukasbiewald.com/">Lukas</a>, <a href="http://mike-love.net/">Mike</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/03/crowdsourcing-to-find-media-bias-hillary-vs-obama/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Awesome cloud view of our color names data</title>
		<link>http://blog.crowdflower.com/2008/03/awesome-cloud-view-of-our-color-names-data/</link>
		<comments>http://blog.crowdflower.com/2008/03/awesome-cloud-view-of-our-color-names-data/#comments</comments>
		<pubDate>Thu, 20 Mar 2008 22:04:39 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Colors]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/?p=18</guid>
		<description><![CDATA[Martin Wattenberg at IBM Research took our color names data and made a cool new cloud view: Instead of plotting each individual color name like in the original, he grouped together identical names, took an average position, and sized the word by frequency. That&#8217;s why the more common names like &#8220;red&#8221; and &#8220;green&#8221; are large. [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://www.bewitched.com">Martin Wattenberg</a> at <a href="http://www.research.ibm.com/visual/">IBM Research</a> took our <a href="/?p=17">color names data</a> and made a cool new cloud view:</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/03/colorcloud3.png' title='Cloud view of the color names from Martin Wattenberg'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/03/colorcloud3.png' border='0' alt='Cloud view of the color names from Martin Wattenberg' class='centered' /></a></p>
<p>Instead of plotting each individual color name like in <a href="/?p=11">the original</a>, he grouped together identical names, took an average position, and sized the word by frequency.  That&#8217;s why the more common names like &#8220;red&#8221; and &#8220;green&#8221; are large.  This really helps readability (and, I&#8217;ll admit, the black background works a bit better :))</p>
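<p>The grouping step described above can be sketched roughly as follows. This is not Martin's code: the function names, the sample points, and the square-root sizing rule are all assumptions chosen for illustration.</p>

```python
from collections import defaultdict

def build_cloud(points):
    """Group (name, x, y) records by identical name.

    Returns a dict mapping name -> (mean_x, mean_y, count).
    """
    sums = defaultdict(lambda: [0.0, 0.0, 0])
    for name, x, y in points:
        entry = sums[name]
        entry[0] += x
        entry[1] += y
        entry[2] += 1
    return {name: (sx / n, sy / n, n) for name, (sx, sy, n) in sums.items()}

def font_size(count, base=10.0, scale=4.0):
    # Hypothetical sizing rule: grow with the square root of the frequency
    return base + scale * count ** 0.5

# Made-up plot positions for a handful of labels
points = [("red", 1.0, 0.0), ("red", 0.8, 0.2), ("green", -0.5, 0.9), ("red", 0.9, 0.1)]
cloud = build_cloud(points)
```

<p>Each name is then drawn once at its average position, with <code>font_size(count)</code> deciding how big it appears.</p>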
<p>Thanks to Martin for sending this on!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/03/awesome-cloud-view-of-our-color-names-data/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Our color names data set is online</title>
		<link>http://blog.crowdflower.com/2008/03/our-color-names-data-set-is-online/</link>
		<comments>http://blog.crowdflower.com/2008/03/our-color-names-data-set-is-online/#comments</comments>
		<pubDate>Tue, 18 Mar 2008 18:30:00 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Colors]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/?p=17</guid>
		<description><![CDATA[I just packaged and released the data set for our color names experiment. It has 10,000 color/label pairs. This is the download link. Read on for more details: I tried to generate the color patches in a way to get interesting colors. This of course is incredibly subjective. My main concern was to eliminate muddy [...]]]></description>
			<content:encoded><![CDATA[<p>I just packaged and released the data set for our <a href="/?p=11">color names experiment</a>.  It has 10,000 color/label pairs.</p>
<p><a href="http://assets.doloreslabs.com/jobs/colors/doloreslabs-color-names-v1.zip">This is the download link</a>.  Read on for more details:</p>
<p><span id="more-17"></span></p>
<p>I tried to generate the color patches in a way that yields interesting colors.  This of course is incredibly subjective.  My main concern was to eliminate muddy dark grays, which are very common when uniformly sampling over standard <a href="http://en.wikipedia.org/wiki/RGB">RGB</a> values.  (Perhaps I went too far &#8212; see the big donut hole in the color wheel plots.)  So the color patches were sampled from <a href="http://en.wikipedia.org/wiki/HSV_color_space">HSV</a> with uniform sampling over hue, but saturation and value biased high (normal distribution).  The exact code and parameters for this are included in the download.</p>
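<p>As a rough illustration of that sampling scheme (the real code and parameters are the ones in the download, not these), here is a Python sketch. The means and standard deviations of 0.8 and 0.2, the clamping to the valid range, and the name <code>sample_color</code> are all assumptions.</p>

```python
import colorsys
import random

def clamp01(v):
    # Keep a value inside the valid [0, 1] range for HSV components
    return max(0.0, min(1.0, v))

def sample_color(rng, sat_mean=0.8, sat_sd=0.2, val_mean=0.8, val_sd=0.2):
    """Sample one color: uniform hue, saturation and value biased high."""
    h = rng.random()                           # uniform hue in [0, 1)
    s = clamp01(rng.gauss(sat_mean, sat_sd))   # normal draw, assumed parameters
    v = clamp01(rng.gauss(val_mean, val_sd))
    r, g, b = colorsys.hsv_to_rgb(h, s, v)
    return "#{:02X}{:02X}{:02X}".format(round(r * 255), round(g * 255), round(b * 255))

rng = random.Random(0)  # fixed seed so the patches are reproducible
patches = [sample_color(rng) for _ in range(10)]
```

<p>Pushing the means closer to 1 or shrinking the standard deviations makes the hole in the middle of the color wheel bigger, since fewer dark or washed-out colors get drawn.</p>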
<p>The plots in the <a href="/?p=11">post</a> and the <a href="http://assets.doloreslabs.com/jobs/colors/explorer/">explorer</a> look like a color wheel with hue as the angle.  But actually they&#8217;re from running <a href="http://en.wikipedia.org/wiki/Principal_components_analysis">PCA</a> over the RGB values, using the first two principal components as x and y.  This was a very arbitrary decision, but seemed to make a nice visual effect.  There are many other reasonable ways to plot the data.</p>
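<p>To make the PCA step concrete, here is a small standard-library Python sketch that projects RGB triples onto their first two principal components. It is not the R code from the download: it assumes mean-centering without rescaling, finds the components by power iteration with deflation, and every function name in it is made up. The five sample colors are just for demonstration.</p>

```python
import math

def unit(v):
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def matvec(m, v):
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]

def top_eigenvector(m, iters=500):
    """Power iteration: approximate the dominant eigenvector of a 3x3 matrix."""
    v = unit([1.0, 0.5, 0.25])  # arbitrary starting direction
    for _ in range(iters):
        w = matvec(m, v)
        if not any(w):
            break
        v = unit(w)
    return v

def pca_2d(rgb_rows):
    """Return (x, y) coordinates along the first two principal components."""
    n = len(rgb_rows)
    means = [sum(r[j] for r in rgb_rows) / n for j in range(3)]
    centered = [[r[j] - means[j] for j in range(3)] for r in rgb_rows]
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(3)]
           for i in range(3)]
    pc1 = top_eigenvector(cov)
    lam = sum(a * b for a, b in zip(pc1, matvec(cov, pc1)))  # its eigenvalue
    # Deflation: remove the first component, then find the next one
    deflated = [[cov[i][j] - lam * pc1[i] * pc1[j] for j in range(3)]
                for i in range(3)]
    pc2 = top_eigenvector(deflated)
    return [(sum(a * b for a, b in zip(r, pc1)),
             sum(a * b for a, b in zip(r, pc2))) for r in centered]

colors = [(186, 164, 74), (115, 237, 0), (24, 90, 122), (92, 12, 123), (49, 227, 220)]
coords = pca_2d(colors)
```

<p>The sign of each component is arbitrary, so a plot made this way may come out mirrored relative to ours.</p>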
<p>The data includes anonymized identifiers for the workers.  (The Mechanical Turk service makes all workers anonymous, but we anonymized yet again for releasing the data set.)  You can see that certain workers did a large number of annotations.  We have no demographic information for this one, sorry.</p>
<p>The files are:</p>
<ul>
<li><i>data.csv</i>, which contains the color/label pairs, along with RGB and HSV representations of each color.</li>
<li><i>R.R</i>, which has some routines that were used to generate and plot the data.  It has examples of how to read and use the data, if you like to use <a href="http://www.r-project.org/">R</a>.</li>
<li><i>html.rb</i>, which with write_html() creates the <a href="http://assets.doloreslabs.com/jobs/colors/explorer/">explorer</a>.</li>
<li><i>sample-hit.html</i>, one of the web forms used for data collection.  There were 1000 forms with 10 colors each.  For each single form (&#8220;HIT&#8221;), exactly one annotator filled it out.  Individual annotators sometimes did multiple forms if they wanted to.</li>
</ul>
<p>Let us know if this is useful, if you have any questions, or find something wrong with the download &#8212; either email or leave a comment here.  And if you do anything cool with this data, we&#8217;d really love to hear about it.</p>
<p>-<a href="http://socialscienceplusplus.blogspot.com/">Brendan</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/03/our-color-names-data-set-is-online/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Where does &#8220;Blue&#8221; end and &#8220;Red&#8221; begin?</title>
		<link>http://blog.crowdflower.com/2008/03/where-does-blue-end-and-red-begin/</link>
		<comments>http://blog.crowdflower.com/2008/03/where-does-blue-end-and-red-begin/#comments</comments>
		<pubDate>Mon, 17 Mar 2008 22:49:42 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Colors]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/?p=11</guid>
		<description><![CDATA[What would you call these colors? We showed thousands of random colors like this to people on Mechanical Turk and asked what they would call them. Here&#8217;s what they said: The above picture contains about 1,300 colors and the names for them that Turkers gave.  Each is printed in its color and positioned on a [...]]]></description>
			<content:encoded><![CDATA[<p>What would you call these colors?</p>
<table border="0" style="margin: 0 auto">
<tr>
<td style="background: #BAA44A; width: 60px; height: 40px"></td>
<td style="background: #73ED00; width: 60px; height: 40px"></td>
<td style="background: #8D891A;  width: 60px; height: 40px"></td>
<td style="background: #547F00;  width: 60px; height: 40px"></td>
<td style="background: #185A7A;  width: 60px; height: 40px"></td>
<td style="background: #5C0C7B;  width: 60px; height: 40px"></td>
<td style="background: #31E3DC;  width: 60px; height: 40px"></td>
<td style="background: #554F00;  width: 60px; height: 40px"></td>
<td style="background: #9F8932;  width: 60px; height: 40px"></td>
<td style="background: #837E1C;  width: 60px; height: 40px"></td>
</tr>
</table>
<p>We showed thousands of random colors like this to people on Mechanical Turk and <a href="http://s3.amazonaws.com/lab20/ae18a3141e0cd78369c043fda22c76bdd354ff48.html">asked</a> what they would call them.  Here&#8217;s what they said:</p>
<p><img src="http://blog.doloreslabs.com/wp-content/uploads/2008/03/label-wheel2.gif" alt="label-wheel2.gif" class='centered' /></p>
<p>The above picture contains about 1,300 colors and the names for them that Turkers gave.  Each is printed in its color and positioned on a color wheel.  Just looking around, there sure seem to be different regions for different names.  But there are also rich sets of modifiers (&#8220;light&#8221;, &#8220;dark&#8221;, &#8220;sea&#8221;), multiword names (&#8220;army green&#8221;), and fun obscure ones (&#8220;cerulean&#8221;). To help look at all this, we also made a <strong><a href="http://assets.doloreslabs.com/jobs/colors/explorer/">color label explorer</a></strong>, so you can search for different terms and see different parts of the space.  If the link doesn&#8217;t work for you, here are a few examples:</p>
<table border="1">
<tr>
<td><img src="http://blog.doloreslabs.com/wp-content/uploads/2008/03/explorer-screenshot-full.gif" alt="explorer-screenshot-full.gif" /></td>
<td><img src="http://blog.doloreslabs.com/wp-content/uploads/2008/03/explorer-screenshot-pink.gif" alt="explorer-screenshot-pink.gif" /></td>
</tr>
<tr>
<td><img src="http://blog.doloreslabs.com/wp-content/uploads/2008/03/explorer-screenshot-dark.gif" alt="explorer-screenshot-dark.gif" /></td>
<td><img src="http://blog.doloreslabs.com/wp-content/uploads/2008/03/explorer-screenshot-baby.gif" alt="explorer-screenshot-baby.gif" /></td>
</tr>
</table>
<p>This study is basically the same design as the famous <a href="http://www.icsi.berkeley.edu/wcs/">World Color Survey</a>, where anthropologists showed color patches to speakers of many different languages and asked for names, to <a href="http://books.google.com/books?id=gN0UaSUTbnUC&#038;pg=PA207&#038;lpg=PA207&#038;dq=berlin+kay+nativism&#038;source=web&#038;ots=3kqZqazDF_&#038;sig=PV8hnv9JAJn28OayFeIN5RAHzB0&#038;hl=en#PPA207,M1">test the universality of language</a>.  Of course, we have <a href="http://doloreslabs.com/services.html">mostly native English speakers</a>.  However, we can get much more data.  (The above picture and links use only a small percentage of all the colors and names we collected.)   There&#8217;s tons more that can be done.  Want to make a better visualizer?  Or run a statistical analysis of how colors map to name terms?   Let us know and we should be able to get this data set online.</p>
<p>UPDATE 3/18: <a href="http://blog.doloreslabs.com/?p=17">I posted the data set</a>.</p>
<p>-<a href="http://socialscienceplusplus.blogspot.com/">Brendan</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/03/where-does-blue-end-and-red-begin/feed/</wfw:commentRss>
		<slash:comments>94</slash:comments>
		</item>
		<item>
		<title>Less white people, more football: Sports Illustrated covers since 1954</title>
		<link>http://blog.crowdflower.com/2008/03/sports-and-race-on-sports-illustrated-magazine-covers/</link>
		<comments>http://blog.crowdflower.com/2008/03/sports-and-race-on-sports-illustrated-magazine-covers/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 17:36:33 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Media]]></category>
		<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/?p=10</guid>
		<description><![CDATA[Human annotators are great at providing basic information about images. We were wondering if we could find something interesting about magazine covers. Stumbling upon 2800 Sports Illustrated cover images going back to 1954, we sent them to Mechanical Turk, asking people to identify the race and gender of the person featured (if any), and what [...]]]></description>
			<content:encoded><![CDATA[<p>Human annotators are great at providing basic information about images.  We were wondering if we could find something interesting about magazine covers.  Stumbling upon <a href="http://www.coverbrowser.com/covers/sports-illustrated">2800 Sports Illustrated cover images going back to 1954</a>, we sent <a href="http://assets.doloreslabs.com/jobs/si_sample.html" target="_blank">them</a> to Mechanical Turk, asking people to identify the race and gender of the person featured (if any), and what sport was depicted. There are lots of interesting things in this data; this post touches on just the few we’ve had time to whip together graphs for.</p>
<p>Here is a historical graph of how often people of different races appear on the cover of Sports Illustrated.  The story is simple and striking:</p>
<p style="text-align: center"><a href="http://blog.doloreslabs.com/wp-content/uploads/2008/03/race.png" title="race.png"><img src="http://blog.doloreslabs.com/wp-content/uploads/2008/03/race.png" class='centered' /></a></p>
<p>Next: which sports get featured on the cover?  Here’s a chart for several sports over that same time.</p>
<p align="center"><a href="http://blog.doloreslabs.com/wp-content/uploads/2008/03/sports.png" title="sports.png"><img src="http://blog.doloreslabs.com/wp-content/uploads/2008/03/sports.png" class='centered' /></a></p>
<p>It might be possible to find links between the careers of famous athletes and the rises and falls in their sports’ popularity; for example, boxing peaks in the 70’s (Muhammad Ali?), basketball peaks in the 90’s (Michael Jordan?) and golf bounces back in the 90’s after a long decline (Tiger Woods?).</p>
<p>Many other sports appear in the data, too; for this chart, we made sure to pick the three most common, and a few other particularly interesting ones.  Percentages don’t add up to 100% because we didn&#8217;t plot all the other sports, including things like horse racing which used to be much more popular.  If you’re really curious, <a href="http://blog.doloreslabs.com/wp-content/uploads/2008/03/sports21.png">here’s the full chart of all sports we asked about</a>, including many of the smaller ones.</p>
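<p>Here is a small sketch of why the plotted percentages fall short of 100%. It assumes covers are binned by decade, which may not match how our charts were binned, and the function name <code>sport_shares</code> and the tiny list of covers are invented for illustration.</p>

```python
from collections import Counter, defaultdict

def sport_shares(covers, plotted):
    """covers: iterable of (year, sport) labels.

    Returns decade -> {sport: percent of that decade's covers} for the
    plotted sports only.
    """
    by_decade = defaultdict(Counter)
    for year, sport in covers:
        by_decade[year // 10 * 10][sport] += 1
    shares = {}
    for decade, counts in by_decade.items():
        total = sum(counts.values())  # denominator includes sports not plotted
        shares[decade] = {s: 100.0 * counts[s] / total for s in plotted}
    return shares

# Made-up labels, not the real annotations
covers = [(1955, "baseball"), (1956, "horse racing"), (1958, "football"), (1959, "baseball"),
          (1994, "basketball"), (1995, "football"), (1997, "golf"), (1998, "football")]
shares = sport_shares(covers, ["baseball", "football", "basketball"])
```

<p>In this toy data the 1950s shares sum to 75%, because the horse racing cover counts toward the total but is not one of the plotted sports.</p>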
<p>-<a href="http://socialscienceplusplus.blogspot.com">Brendan</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/03/sports-and-race-on-sports-illustrated-magazine-covers/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
	</channel>
</rss>

