<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The CrowdFlower Blog &#187; Media</title>
	<atom:link href="http://blog.crowdflower.com/topics/media/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.crowdflower.com</link>
	<description></description>
	<lastBuildDate>Tue, 10 Jan 2012 20:00:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Did you say &#8220;Great!&#8221;, or &#8220;Oh Great!&#8221;?</title>
		<link>http://blog.crowdflower.com/2011/11/crowdsourcing-sentiment-analysis-herman-cain/</link>
		<comments>http://blog.crowdflower.com/2011/11/crowdsourcing-sentiment-analysis-herman-cain/#comments</comments>
		<pubDate>Mon, 14 Nov 2011 19:24:39 +0000</pubDate>
		<dc:creator>Jodie Ellis</dc:creator>
				<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Media]]></category>
		<category><![CDATA[automated sentiment analysis]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[crowdsorcing]]></category>
		<category><![CDATA[crowdsource]]></category>
		<category><![CDATA[crowdsourced]]></category>
		<category><![CDATA[herman cain]]></category>
		<category><![CDATA[sentiment analysis]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=4240</guid>
		<description><![CDATA[Writing a blog post here at CrowdFlower is usually left to the experts. So with that, let me begin by making the disclaimer that I am neither a political analyst nor a data scientist. But I do have a personal fervor for politics and access to some impressive tools, thanks to my job [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignright" style="width: 281px"><a href="http://www.empowernewsmag.com/userfiles/cain.jpg"><img class=" " title="Cain" src="http://www.empowernewsmag.com/userfiles/cain.jpg" alt="" width="271" height="361" /></a><p class="wp-caption-text">Image by Sarah Butrymowicz</p></div>
<p>Writing a blog post here at CrowdFlower is usually left to the experts. So let me begin with a disclaimer: I am neither a political analyst nor a data scientist. But I do have a personal fervor for politics and access to some impressive tools, thanks to my job here at <a title="the leader in enterprise crowdsourcing" href="http://crowdflower.com/" target="_blank">CrowdFlower</a>.</p>
<p>For those who aren&#8217;t familiar with CrowdFlower, we specialize in tapping human contributors worldwide to do massive amounts of simple, repetitive tasks (especially tasks that are hard for computers to do by themselves). Here&#8217;s a <a title="How It Works!" href="http://vimeo.com/26878855">quick how-it-works animation</a>.</p>
<p>I had been reading some old blog posts on the CrowdFlower blog when I came across an interesting <a title="crowdsourcing media bias" href="http://blog.crowdflower.com/2008/03/crowdsourcing-to-find-media-bias-hillary-vs-obama/" target="_blank">2008 post on election media bias</a>.</p>
<p>I decided that this could be a great opportunity to revisit sentiment analysis, and specifically set out to see whether comparing automated sentiment detection tools with human assessments would yield any blog-worthy findings.</p>
<p>To see how far the automated sentiment tools have come, I began by using an enterprise-grade social media monitoring tool that provides sentiment analysis.</p>
<p>I ran a few quick monitoring searches of my own to see how the current Republican Primary election was tracking — it seemed a topical place that would be chock full of good commentary.</p>
<p>The instant access to well-organized data from blogs, news sources, and a variety of social media sources was outstanding.</p>
<p>However, I was surprised to find that for each search I conducted, <strong>the automated sentiment detection tool consistently returned an overwhelming proportion of &#8220;Neutral&#8221; ratings (frequently exceeding 90%)</strong>. This seemed funny to me, given the typically emotive nature of politics.<span id="more-4240"></span></p>
<p><strong>It&#8217;s important to note that this particular tool uses a default value of &#8220;Neutral&#8221; for any post it cannot interpret.</strong></p>
<p>A particularly interesting subset of the data was several thousand tweets about Herman Cain immediately following the news of alleged sexual harassment by Cain during his time as leader of the National Restaurant Association. Surely this would yield some sentiment-rich commentary that even machines couldn&#8217;t resist tagging.</p>
<p>For the posts about Herman Cain on Oct 31st, here is what the machine detected on just under 3,000 posts:</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/wp-content/uploads/2011/11/cainauto.png"><img class="aligncenter size-full wp-image-4629" title="cainauto" src="http://blog.crowdflower.com/wp-content/uploads/2011/11/cainauto.png" alt="" width="513" height="306" /></a></p>
<p>Naturally, I took to the CrowdFlower platform and ran the same data through a simple sentiment analysis workflow. With the help of our team of crowdsourcing gurus, I used some simple but effective best practices to control for quality (you can get a good overview <a title="crowdsourcing quality control" href="http://blog.crowdflower.com/2011/10/stopworrying/" target="_blank">here</a>). Here is what the CrowdFlower contributors detected:</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/wp-content/uploads/2011/11/caincf.png"><img class="aligncenter size-full wp-image-4630" title="caincf" src="http://blog.crowdflower.com/wp-content/uploads/2011/11/caincf.png" alt="" width="527" height="324" /></a></p>
<p>Here are just a couple of posts marked &#8220;Neutral&#8221; by the machine and &#8220;Negative&#8221; and &#8220;Positive&#8221;, respectively, by CrowdFlower contributors:</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/wp-content/uploads/2011/11/caintweet1.png"><img class="aligncenter size-full wp-image-4631" title="caintweet1" src="http://blog.crowdflower.com/wp-content/uploads/2011/11/caintweet1.png" alt="" width="505" height="204" /></a><a href="http://blog.crowdflower.com/wp-content/uploads/2011/11/caintweet2.png"><img class="aligncenter size-full wp-image-4632" title="caintweet2" src="http://blog.crowdflower.com/wp-content/uploads/2011/11/caintweet2.png" alt="" width="478" height="142" /></a></p>
<h2>Takeaways</h2>
<p>A spot check of the results on the automated set confirmed that when the machine actually tagged a post as positive or negative, it was usually very accurate (good precision).</p>
<p>However, <strong>the large amount of data that the machine was unable to make a determination on suggests that the pervasive problem of &#8216;recall&#8217; is still the big challenge with automated sentiment detection.</strong></p>
<p>This graph illustrates the recall difference a bit more clearly. The need for human analysis when dealing with the subtleties of language could not be more apparent.</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/wp-content/uploads/2011/11/caintweets.png"><img class="aligncenter size-full wp-image-4633" title="caintweets" src="http://blog.crowdflower.com/wp-content/uploads/2011/11/caintweets.png" alt="" width="650" height="254" /></a></p>
<p><strong>Automated Tool</strong>: Good precision. Poor recall.</p>
<p><strong>CrowdFlower Tool</strong>: Good precision. Good recall.</p>
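<p>To make the precision and recall distinction concrete, here is a minimal Python sketch. The ten labels are hypothetical (they are not the Herman Cain data), the function name <code>precision_recall</code> is invented for illustration, and treating the human labels as ground truth is an assumption.</p>

```python
def precision_recall(predicted, actual, label):
    """Precision and recall for one label, given parallel lists of labels."""
    true_pos = sum(1 for p, a in zip(predicted, actual) if p == label and a == label)
    predicted_pos = sum(1 for p in predicted if p == label)
    actual_pos = sum(1 for a in actual if a == label)
    precision = true_pos / predicted_pos if predicted_pos else 0.0
    recall = true_pos / actual_pos if actual_pos else 0.0
    return precision, recall

# Hypothetical labels for ten posts; human labels are assumed to be the truth.
human = ["negative"] * 6 + ["positive"] * 2 + ["neutral"] * 2
# The tool tags two posts and falls back to "neutral" for everything else.
machine = ["negative", "negative"] + ["neutral"] * 8

precision, recall = precision_recall(machine, human, "negative")
print(f"precision={precision:.2f} recall={recall:.2f}")  # precision=1.00 recall=0.33
```

<p>In this toy case every post the tool tags as negative really is negative, so precision is perfect, but it finds only two of the six negative posts, so recall is poor. A default of &#8220;Neutral&#8221; for uninterpretable posts produces exactly this pattern.</p>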
<h2>Sentiment Analysis is Insightful AND Entertaining</h2>
<p>In addition to the Herman Cain Twitter data, I looked at headlines, blogs, and a broad swath of social media commentary on all the candidates. The conclusion I can draw from my effort is that sentiment detection is indeed still a very challenging problem to solve through automation.</p>
<p>This is consistent with what I see here at CrowdFlower daily — in today&#8217;s data-wealthy world, there are countless tasks that require human attention (good to know if my blogging career never gets off the ground).</p>
<p>Hopefully I&#8217;ll get the chance to continue exploring the sentiment about topical news as it breaks, and I look forward to sharing future findings.</p>
<p>Have experience monitoring sentiment? Let us know if this is consistent with what you&#8217;ve seen. Leave a comment.</p>
<p style="text-align: center;">***</p>
<p>To find out more about how CrowdFlower technology is used for sentiment analysis and a wide range of other human-powered projects, visit the <a title="enterprise crowdsourcing products" href="http://crowdflower.com/products" target="_blank">CrowdFlower products page</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/11/crowdsourcing-sentiment-analysis-herman-cain/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oscar Fever: The Sequel!</title>
		<link>http://blog.crowdflower.com/2011/03/oscar-fever-the-sequel/</link>
		<comments>http://blog.crowdflower.com/2011/03/oscar-fever-the-sequel/#comments</comments>
		<pubDate>Fri, 04 Mar 2011 22:51:03 +0000</pubDate>
		<dc:creator>Patrick Philips and Joseph Childress</dc:creator>
				<category><![CDATA[Art]]></category>
		<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Media]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Wisdom of Small Crowds]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=2190</guid>
		<description><![CDATA[The votes are in from our Oscar crowdsourcing experiment, and the crowd successfully picked the winners of 14 of the academy awards. For reference, Roger Ebert got 15 predictions correct so we&#8217;d have to conclude that the crowd performed reasonably well at predicting the winners of this glorified popularity contest. One fascinating thing about aggregating responses [...]]]></description>
			<content:encoded><![CDATA[<p>The votes are in from <a href="http://blog.crowdflower.com/2011/02/oscar-fever/" target="_blank">our Oscar crowdsourcing experiment</a>, and the crowd successfully picked the winners of 14 of the Academy Awards. For reference, <a href="http://rogerebert.suntimes.com/apps/pbcs.dll/article?AID=/20110210/OSCARS/110219999" target="_blank">Roger Ebert got 15 predictions correct</a>, so we&#8217;d have to conclude that the crowd performed reasonably well at predicting the winners of this glorified popularity contest.</p>
<div id="attachment_2191" class="wp-caption aligncenter" style="width: 808px"><a rel="attachment wp-att-2191" href="http://blog.crowdflower.com/2011/03/oscar-fever-the-sequel/actual_results/"><img class="size-full wp-image-2191" title="actual_results" src="http://blog.crowdflower.com/wp-content/uploads/2011/03/actual_results.jpg" alt="movie picks" width="798" height="414" /></a><p class="wp-caption-text">Predicted and Actual Winners of the 2011 Academy Awards</p></div>
<p><span id="more-2190"></span><br />
One fascinating thing about aggregating responses is that the crowd as a whole will often outperform the average worker. In this case, among the 500 people we polled, the majority of respondents picked fewer than 10 awards correctly (mean of 9.6 and median of 9). And yet, by aggregating all the responses, such that the nominee with the most &#8220;votes&#8221; is predicted to win, the crowd as a whole correctly picked 14 awards. While the &#8220;wisdom of crowds&#8221; doesn&#8217;t come as much of a surprise, it&#8217;s always reassuring to see it confirmed in new applications.</p>
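<p>The aggregation described above (the nominee with the most votes is the crowd&#8217;s pick) can be sketched in a few lines of Python. The ballots, category names, and helper names below are hypothetical toy data, not the survey results, and ties are broken arbitrarily by <code>Counter.most_common</code>.</p>

```python
from collections import Counter

def crowd_pick(ballots, category):
    """Return the nominee with the most votes in one category."""
    votes = Counter(ballot[category] for ballot in ballots)
    return votes.most_common(1)[0][0]

def score(picks, winners):
    """Count how many categories in `picks` match the actual winners."""
    return sum(1 for category, winner in winners.items() if picks.get(category) == winner)

# Hypothetical toy data: three categories, three respondents.
winners = {"picture": "A", "director": "B", "actor": "C"}
ballots = [
    {"picture": "A", "director": "B", "actor": "X"},
    {"picture": "A", "director": "Y", "actor": "C"},
    {"picture": "Z", "director": "B", "actor": "C"},
]

individual_scores = [score(ballot, winners) for ballot in ballots]
crowd = {category: crowd_pick(ballots, category) for category in winners}
print(individual_scores)      # [2, 2, 2]
print(score(crowd, winners))  # 3
```

<p>Each toy respondent misses a different category, so the errors cancel out in the majority vote. That is the same effect that let the aggregate reach 14 correct picks while the typical respondent got about 9.</p>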
<p><a rel="attachment wp-att-2199" href="http://blog.crowdflower.com/2011/03/oscar-fever-the-sequel/correct_histogram1/"><img class="aligncenter size-full wp-image-2199" title="correct_histogram1" src="http://blog.crowdflower.com/wp-content/uploads/2011/03/correct_histogram1.jpg" alt="" width="614" height="445" /></a></p>
<p>As we noted in <a href="http://blog.crowdflower.com/2011/02/oscar-fever/">our earlier post</a>, though, the more interesting question was whether workers who indicated higher confidence in their responses would outperform workers with lower confidence. Looking at the results, however, we saw no significant correlation between a worker&#8217;s predicted accuracy and actual performance.</p>
<div id="attachment_2222" class="wp-caption aligncenter" style="width: 823px"><a rel="attachment wp-att-2222" href="http://blog.crowdflower.com/2011/03/oscar-fever-the-sequel/scatts/"><img class="size-full wp-image-2222" title="scatts" src="http://blog.crowdflower.com/wp-content/uploads/2011/03/scatts.jpg" alt="" width="813" height="547" /></a><p class="wp-caption-text">&quot;Squint all you want, but there&#39;s no pattern&quot;</p></div>
<p>While it&#8217;s certainly possible that we didn&#8217;t offer enough of an incentive for workers to estimate their own accuracy, the more likely explanation is that predicting the winners of the Oscars is not something that a person can do with any degree of certainty. Confident or not, the people we polled did not see the &#8220;Inside Job&#8221; coming.</p>
<p>As a final exercise, we ran a regression on every explanatory variable we could find, including what state workers came from, what day they made their predictions, whether they made their predictions during the day or at night, how long they spent making their predictions and even their historical accuracy on other CrowdFlower tasks. The only variable with any significance turned out to be how long they spent on making their predictions, and while it was significant (at p=0.001), no model we could come up with explained more than 5 percent of the total variation in accuracy.</p>
<p>While the wisdom of crowds seems to extend to picking Oscar winners, the more interesting experiment of having workers self-select as trustworthy is ongoing. In the future, it would be worthwhile to repeat this experiment with questions that can be answered objectively and without uncertainty (solving algebra problems seems like a good candidate), to see if any correlation emerges between predicted and actual accuracy.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/03/oscar-fever-the-sequel/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Oscar Fever</title>
		<link>http://blog.crowdflower.com/2011/02/oscar-fever/</link>
		<comments>http://blog.crowdflower.com/2011/02/oscar-fever/#comments</comments>
		<pubDate>Sat, 26 Feb 2011 18:48:16 +0000</pubDate>
		<dc:creator>Patrick Philips and Joseph Childress</dc:creator>
				<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Media]]></category>
		<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=2142</guid>
		<description><![CDATA[With the most glamorous award ceremony of the year just around the corner, we asked 500 people from across the United States to help predict who the big winners are going to be. Below are their predictions, sorted in descending order of agreement. Based on these early results, Pixar looks like a lock for yet another [...]]]></description>
			<content:encoded><![CDATA[<p>With the most glamorous award ceremony of the year just around the corner, we asked 500 people from across the United States to help predict who the big winners are going to be. Below are their predictions, sorted in descending order of agreement.</p>
<div id="attachment_2185" class="wp-caption aligncenter" style="width: 676px"><a rel="attachment wp-att-2185" href="http://blog.crowdflower.com/2011/02/oscar-fever/predictions/"><img class="size-full wp-image-2185" title="predictions" src="http://blog.crowdflower.com/wp-content/uploads/2011/02/prediction_table.jpg" alt="" width="666" height="449" /></a><p class="wp-caption-text">&quot;I&#39;d like to thank all the little people...&quot;</p></div>
<p>Based on these early results, Pixar looks like a lock for yet another Best Animated Feature award and Natalie Portman had better start polishing her acceptance speech.</p>
<p><span id="more-2142"></span></p>
<p>Anyone can make anonymous predictions, so we made things interesting by rewarding workers for answering correctly. We structured the job such that everyone receives 5 cents for completing the survey, plus an additional 2 cents for each correct answer.</p>
<p>To spice things up even more, we asked workers to guess how many predictions they would answer correctly (their &#8220;Magic Number,&#8221; with a minimum of 5), with the stipulation that we will pay bonuses only to those workers who answer at least their Magic Number of predictions correctly.</p>
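<p>Here is a rough Python sketch of that pay structure under one reading of the rules: the 2-cent bonus applies to every correct answer, but is forfeited entirely if a worker falls short of their own Magic Number. That reading, and the names <code>payout_cents</code>, <code>BASE_CENTS</code>, and <code>BONUS_CENTS</code>, are assumptions made for illustration.</p>

```python
BASE_CENTS = 5        # paid to everyone who completes the survey
BONUS_CENTS = 2       # per correct prediction
MIN_MAGIC_NUMBER = 5  # lowest Magic Number a worker may choose

def payout_cents(correct, magic_number):
    """Total pay in cents under one reading of the rules in the post."""
    if MIN_MAGIC_NUMBER > magic_number:
        raise ValueError("Magic Number must be at least 5")
    # Assumption: the whole per-answer bonus is lost when the worker
    # gets fewer correct than their own Magic Number.
    bonus = BONUS_CENTS * correct if correct >= magic_number else 0
    return BASE_CENTS + bonus

print(payout_cents(correct=9, magic_number=5))   # 23
print(payout_cents(correct=9, magic_number=10))  # 5
```

<p>Under this reading, a worker with 9 correct answers earns 23 cents with a Magic Number of 5 but only the 5-cent base with a Magic Number of 10, so overestimating is costly.</p>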
<p>As you can see below, while quite a few workers played it safe by selecting the minimum number of correct responses (&#8220;5&#8221;), the most frequent choice was a Magic Number of 10. Also, more than a few brave souls thought they would get <em>every</em> prediction correct.</p>
<p><a rel="attachment wp-att-2144" href="http://blog.crowdflower.com/2011/02/oscar-fever/magic_numbers/"><img class="aligncenter size-full wp-image-2144" title="magic_numbers" src="http://blog.crowdflower.com/wp-content/uploads/2011/02/magic_numbers.jpg" alt="" width="680" height="449" /></a></p>
<p>But the big question, apart from whether Darren Aronofsky (Black Swan) can edge out David Fincher (The Social Network) for Best Director, is whether any correlation exists between worker confidence and actual performance. When workers have a cash incentive to estimate their own accuracy correctly, do the self-labeled &#8220;experts&#8221; perform any better than the &#8220;novices&#8221; at predicting Oscar winners?</p>
<p>Make some popcorn, grab your Kleenex and stay tuned. After they&#8217;ve rolled up the red carpet, we&#8217;ll come back and see who did a better job of predicting the winners.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/02/oscar-fever/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Crowdsourcing the Goldman Sachs Investigation</title>
		<link>http://blog.crowdflower.com/2010/06/crowdsourcing-the-goldman-sachs-investigation/</link>
		<comments>http://blog.crowdflower.com/2010/06/crowdsourcing-the-goldman-sachs-investigation/#comments</comments>
		<pubDate>Tue, 22 Jun 2010 23:31:51 +0000</pubDate>
		<dc:creator>Josh Eveleth</dc:creator>
				<category><![CDATA[Media]]></category>
		<category><![CDATA[Economy]]></category>
		<category><![CDATA[Goldman]]></category>
		<category><![CDATA[Government]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=693</guid>
		<description><![CDATA[When federal investigators asked Goldman Sachs for its transactions with insurance giant AIG, Goldman turned over the information — several hundred billion pages’ worth. John Carney, senior editor at CNBC.com, had an idea for sifting through the data —&#160;crowdsource it. We agree. In fact, CrowdFlower will categorize and tag the first 100,000 documents at no [...]]]></description>
			<content:encoded><![CDATA[<p>When federal investigators asked Goldman Sachs for its transactions with insurance giant AIG, Goldman turned over the information: several hundred billion pages’ worth.</p>
<p>John Carney, senior editor at CNBC.com, had an idea for sifting through the data —&nbsp;<a href="http://www.cnbc.com/id/37619147" TARGET="_blank">crowdsource it</a>. </p>
<p>We agree. In fact, CrowdFlower will categorize and tag the first 100,000 documents at no cost to the government.</p>
<p>If you’re just tuning in, the federal Financial Crisis Inquiry Commission (FCIC) subpoenaed Goldman for its AIG transactions, following accusations that Goldman cooked up a mortgage investment scheme that was rigged to fail.</p>
<p>FCIC has around 50 employees, an $8 million budget, and roughly six months to pore over the five terabytes of data. (Can you say, “Too small to succeed”?)</p>
<p><span id="more-693"></span></p>
<p>Clearly, technology presents a double-edged sword for investigators and other regulators.</p>
<p>On the one hand, companies under investigation can use technology to more efficiently bury investigators in terabytes of data (paging Goldman Sachs). On the other hand, technology provides tools for deftly sifting through the data (enter crowdsourcing).</p>
<p>Crowdsourcing public documents may be relatively new, but it’s not unprecedented. In fact, the British newspaper The Guardian has run a project that uses crowdsourcing to <a href="http://mps-expenses.guardian.co.uk/" TARGET="_blank">investigate MPs’ expenses</a>.</p>
<p>We’ll keep you posted on whether the government takes up our offer.</p>
<p>&#8211; Additional contributions by Anisha Sekar.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2010/06/crowdsourcing-the-goldman-sachs-investigation/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Crowdsourcing to find media bias: Hillary vs. Obama</title>
		<link>http://blog.crowdflower.com/2008/03/crowdsourcing-to-find-media-bias-hillary-vs-obama/</link>
		<comments>http://blog.crowdflower.com/2008/03/crowdsourcing-to-find-media-bias-hillary-vs-obama/#comments</comments>
		<pubDate>Thu, 27 Mar 2008 03:41:52 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Media]]></category>
		<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/?p=21</guid>
		<description><![CDATA[As anyone who follows political races knows, different sources can report the same event in very different ways. We took nearly six thousand recent articles over the past month about Clinton and Obama and sent them to Mechanical Turk to be classified as favorable or unfavorable for the respective candidates. We scraped the articles from [...]]]></description>
			<content:encoded><![CDATA[<p>As anyone who follows political races knows, different sources can report the same event in very different ways.  We took nearly six thousand articles from the past month about Clinton and Obama and sent them to Mechanical Turk to be classified as favorable or unfavorable for the respective candidates.  We scraped the articles from <a href="http://news.google.com/">Google News</a>, restricted to several sources, and threw in front-page headlines from <a href="http://digg.com/news">Digg</a>.</p>
<p>Here is the graph for favorability scores, aggregated by source.  We found that Digg was far and away the most favorable for Obama.</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/03/obama-hillary-bysource3.png' title='obama-hillary-bysource3.png'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/03/obama-hillary-bysource3.png' alt='obama-hillary-bysource3.png' class='centered' /></a></p>
<p>
The next graph tracks overall news favorability by date.  To provide some context, we compared it with the change in Obama stock on the <a href="http://intrade.com/">Intrade</a> prediction market.</p>
<p><a href='http://blog.doloreslabs.com/wp-content/uploads/2008/03/obama-hillary-overtime.png' title='obama-hillary-overtime2.png'><img src='http://blog.doloreslabs.com/wp-content/uploads/2008/03/obama-hillary-overtime2.png' alt='obama-hillary-overtime2.png' class='centered' /></a></p>
<p>More details after the jump:</p>
<p><span id="more-21"></span></p>
<p>We created our data set by doing two separate searches, one for &#8220;Barack Obama&#8221; and one for &#8220;Hillary Clinton&#8221;.  This did a pretty good job of ensuring that each result from Google News or Digg&#8217;s search facility was actually about the given candidate.  For each article, we showed the headline, search result snippet, and link to several Turkers.  They reported whether it was positive, neutral, or negative toward the candidate.</p>
<p>The favorability metric was created by averaging the ratings across articles.  Pro-Obama and anti-Hillary articles were both worth 1 point; anti-Obama and pro-Hillary both worth -1, and neutrals 0.</p>
<p>Therefore, if all articles are either positive towards Obama or negative towards Hillary, the rating is +100%; and vice-versa for -100%.</p>
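<p>The scoring rule above can be sketched in Python as follows. This is a sketch under assumptions, not the code used for the graphs: it assumes each article has already been reduced to a single (candidate, rating) pair, since the post does not say how the several Turker ratings per article were combined, and the sample ratings and names are hypothetical.</p>

```python
# Points per (candidate, rating) pair, following the rule described above.
POINTS = {
    ("obama", "positive"): 1,
    ("clinton", "negative"): 1,
    ("obama", "negative"): -1,
    ("clinton", "positive"): -1,
}

def favorability(ratings):
    """Mean score over (candidate, rating) pairs; neutral counts as 0."""
    if not ratings:
        raise ValueError("need at least one rating")
    total = sum(POINTS.get(pair, 0) for pair in ratings)
    return total / len(ratings)

# Hypothetical ratings for four articles from one source.
sample = [
    ("obama", "positive"),
    ("clinton", "negative"),
    ("clinton", "positive"),
    ("obama", "neutral"),
]
print(f"{favorability(sample):+.0%}")  # +25%
```

<p>Two pro-Obama points, one pro-Hillary point, and one neutral give a net score of 1 over 4 articles, or +25% on the scale used in the graphs.</p>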
<p>The data is very noisy.  The question of favorability is extremely tricky: it includes a combination of expectations, sentiment, and the objective events a newspaper chooses to report.  All of these are hard to reliably assess or even define.  (And whether anything you measure constitutes &#8220;media bias&#8221; is another complicated question!)</p>
<p>Despite all this philosophical intractability, the data must be showing something real, because we have a statistically sound result: the difference between Digg and the others was statistically significant (<a href="http://en.wikipedia.org/wiki/Student%27s_t-test">t-test</a>, p&lt;.001).  The differences within the mainstream media were not statistically significant.</p>
<p>-<a href="http://socialscienceplusplus.blogspot.com/">Brendan</a>, <a href="http://vandev.com/">Chris</a>, <a href="http://www.lukasbiewald.com/">Lukas</a>, <a href="http://mike-love.net/">Mike</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/03/crowdsourcing-to-find-media-bias-hillary-vs-obama/feed/</wfw:commentRss>
		<slash:comments>5</slash:comments>
		</item>
		<item>
		<title>Less white people, more football: Sports Illustrated covers since 1954</title>
		<link>http://blog.crowdflower.com/2008/03/sports-and-race-on-sports-illustrated-magazine-covers/</link>
		<comments>http://blog.crowdflower.com/2008/03/sports-and-race-on-sports-illustrated-magazine-covers/#comments</comments>
		<pubDate>Thu, 13 Mar 2008 17:36:33 +0000</pubDate>
		<dc:creator>Brendan O'Connor</dc:creator>
				<category><![CDATA[Media]]></category>
		<category><![CDATA[Miscellaneous]]></category>

		<guid isPermaLink="false">http://blog.doloreslabs.com/?p=10</guid>
		<description><![CDATA[Human annotators are great at providing basic information about images. We were wondering if we could find something interesting about magazine covers. Stumbling upon 2800 Sports Illustrated cover images going back to 1954, we sent them to Mechanical Turk, asking people to identify the race and gender of the person featured (if any), and what [...]]]></description>
			<content:encoded><![CDATA[<p>Human annotators are great at providing basic information about images.  We were wondering if we could find something interesting about magazine covers.  Stumbling upon <a href="http://www.coverbrowser.com/covers/sports-illustrated">2800 Sports Illustrated cover images going back to 1954</a>, we sent <a href="http://assets.doloreslabs.com/jobs/si_sample.html" target="_blank">them</a> to Mechanical Turk, asking people to identify the race and gender of the person featured (if any), and what sport was depicted. There are lots of interesting things in this data; this post touches on just the few we’ve had time to graph.</p>
<p>Here is a historical graph of how often people of different races appear on the cover of Sports Illustrated.  The story is simple and striking:</p>
<p style="text-align: center"><a href="http://blog.doloreslabs.com/wp-content/uploads/2008/03/race.png" title="race.png"><img src="http://blog.doloreslabs.com/wp-content/uploads/2008/03/race.png" class='centered' /></a></p>
<p>Next: which sports get featured on the cover?  Here’s a chart for several sports over that same time.</p>
<p align="center"><a href="http://blog.doloreslabs.com/wp-content/uploads/2008/03/sports.png" title="sports.png"><img src="http://blog.doloreslabs.com/wp-content/uploads/2008/03/sports.png" class='centered' /></a></p>
<p>It might be possible to find links between the careers of famous athletes and the rises and falls in their sports’ popularity; for example, boxing peaks in the 70’s (Muhammad Ali?), basketball peaks in the 90’s (Michael Jordan?) and golf bounces back in the 90’s after a long decline (Tiger Woods?).</p>
<p>Many other sports appear in the data, too; for this chart, we made sure to pick the three most common, and a few other particularly interesting ones.  Percentages don’t add up to 100% because we didn&#8217;t plot all the other sports, including things like horse racing which used to be much more popular.  If you’re really curious, <a href="http://blog.doloreslabs.com/wp-content/uploads/2008/03/sports21.png">here’s the full chart of all sports we asked about</a>, including many of the smaller ones.</p>
<p>-<a href="http://socialscienceplusplus.blogspot.com">Brendan</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2008/03/sports-and-race-on-sports-illustrated-magazine-covers/feed/</wfw:commentRss>
		<slash:comments>14</slash:comments>
		</item>
	</channel>
</rss>

