<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The CrowdFlower Blog</title>
	<atom:link href="http://blog.crowdflower.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.crowdflower.com</link>
	<description></description>
	<lastBuildDate>Tue, 10 Jan 2012 20:00:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>2011 Retrospective: Good Begets Good</title>
		<link>http://blog.crowdflower.com/2012/01/2011-retrospective-good-begets-good/</link>
		<comments>http://blog.crowdflower.com/2012/01/2011-retrospective-good-begets-good/#comments</comments>
		<pubDate>Tue, 10 Jan 2012 20:00:35 +0000</pubDate>
		<dc:creator>Vaughn Hester and Lukas Biewald</dc:creator>
				<category><![CDATA[Disaster Relief]]></category>
		<category><![CDATA[Health]]></category>
		<category><![CDATA[Holidays]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Al Jazeera]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[disaster relief]]></category>
		<category><![CDATA[Pakreport]]></category>
		<category><![CDATA[ushahidi]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=4640</guid>
		<description><![CDATA[One of the most exciting things about working at CrowdFlower is the ongoing discovery of the wide range of crowdsourcing applications. While our core focus is enterprise solutions, we&#8217;re also involved in a number of social innovation projects. At recent meetups and in recent blog posts, we&#8217;ve described CrowdFlower implementations that help create unprecedented social impact. Many [...]]]></description>
			<content:encoded><![CDATA[<p>One of the most exciting things about working at CrowdFlower is the ongoing discovery of the wide range of crowdsourcing applications. While our core focus is enterprise solutions, we&#8217;re also involved in a number of social innovation projects. At recent meetups and in recent <a href="http://blog.crowdflower.com/2011/11/scientific-research/">blog posts</a>, we&#8217;ve described CrowdFlower implementations that help create unprecedented social impact. Many of these projects involve processing data in support of crisis relief or <a href="http://poptech.org/popcasts/fortune_and_biewald_crowdsourcing_tb_cell_annotation">public health research</a>.<a title="" href="#_ftn1">[1]</a> What we continue to see over time is that there are truly inspirational ripple effects emerging from these efforts.</p>
<p><a href="http://blog.crowdflower.com/2012/01/2011-retrospective-good-begets-good/crowdflower-job-67330-preview/" rel="attachment wp-att-4654"><img class="alignright size-full wp-image-4654" title="CrowdFlower/Somalia Speaks Task" src="http://blog.crowdflower.com/wp-content/uploads/2012/01/CrowdFlower-Job-67330-Preview.jpg" alt="" width="401" height="432" /></a></p>
<p>&nbsp;</p>
<div>
<p>In the fall of 2011, the Nexus for ICTs, Climate Change and Development (NICCD) project at the University of Manchester released a <a href="http://www.niccd.org/casestudies.htm">series of case studies</a> on innovative uses of technology for development. <a href="http://www.niccd.org/NICCD_Disasters_Case_Study_Pakreport.pdf">Pakreport was featured</a> as a tool for reporting among flood-affected communities; but it was also highlighted for its contributions to climate change awareness and natural disaster monitoring in a country at very high risk.</p>
<p><span id="more-4640"></span></p>
<p><em>&#8220;Crowdsourcing was critical to the success of this disaster response system,&#8221; the report says. &#8220;It was integral to the data input model, which would otherwise have relied on much more limited inputs from individual relief agency workers.&#8221;</em></p>
<p>As a result of our involvement with Pakreport, our partners in Pakistan signed up to be a CrowdFlower contributor channel. Since April 2011, we have sent 313,030 microtasks to a pool of 500 Pakistani contributors from underserved communities. Our partners in Pakistan also recently launched <a href="http://pakreport.org/dowevote/">DoWeVote</a>, a map-based effort to improve future civic engagement by visualizing 2008 voter turnout data from across Pakistan. Pakreport continues to be an engine for social change, and its core structure is relatively simple to replicate in any setting or modify for similar projects.</p>
<p>After the <a href="http://poptech.org/world_rebalancing">2011 PopTech conference</a>, our partners at <a href="http://www.ushahidi.com">Ushahidi</a> reached out regarding a project to collect reports from the ground in Somalia. In conjunction with Al Jazeera, Souktel, and the African Diaspora Institute, the <a href="http://www.aljazeera.com/indepth/spotlight/somaliaconflict/somaliaspeaks.html">Somalia Speaks project</a> &#8220;seeks to echo the voices of ordinary Somalis in Somalia so they can be heard in the international media.&#8221; Text messages from the ground are collected, translated, categorized and mapped. The translated messages and maps are shared on Al Jazeera. It is a very powerful experience to read the words of a refugee or survivor on one of the largest news websites in the world; examples of mainstream media including the voices of these populations are few and far between.</p>
<div>
<div>
<div>
<p>Beyond all the do-gooder self-congratulation, however, it is important to note that these efforts create new challenges and move our colleagues and us into new ethical territory. While CrowdFlower maintains rigorous confidentiality and security measures as part of our standard enterprise engagements, the recent collaborations described here are more creative, collaborative and high-profile. As more data flows through open source software and multistep workflows involving collaboration among multiple organizations, critical questions arise as to the confidentiality of the data involved as it is shared with wider audiences. For example, how can you reconcile the flow of sensitive or personal information with the use of software that emphasizes transparency above all else? How can we be certain that, particularly in conflict situations, these workflows do not create a risk of exposure or vulnerability for the people submitting reports from the ground? Who defines the standards for privacy as you amplify voices from vulnerable groups? A key lesson learned through these projects is that confidentiality is essential when dealing with personal information, but that it can be a challenge to protect confidentiality at every step of these multistep workflows. Finally, as we see more demand for these types of crowdsourcing projects, how can we reduce the startup time and the learning curve for organizations that wish to replicate these projects?</p>
<p>These are small but important achievements. The examples from 2010 directly inspired the examples we&#8217;ve seen in 2011. The successful implementation of these projects and the evolution of the discourse surrounding these disruptive tools are the result of substantial collaborative efforts, often by teams composed entirely of volunteers located around the world. We feel incredibly privileged to work with the clients, partners, researchers, and visionaries who are redefining the ways that technology benefits society. To all of our partners and supporters, thank you for helping us discover new ways in which data can change the world. We can&#8217;t wait to see what 2012 will bring.</p>
<div>
<p><em>The entire team at CrowdFlower wishes you a joyful and peaceful 2012.</em></p>
</div>
</div>
</div>
</div>
</div>
<div>
<p>&nbsp;</p>
<hr align="left" size="1" width="33%" />
<div>
<p><a title="" href="#_ftnref1">[1]</a> In early 2010, we were part of the <a href="http://www.mission4636.org/">Mission 4636</a> collaboration to translate, categorize, and map SMS messages sent from survivors of the first earthquakes in Haiti. We repurposed the Mission 4636 workflow for another deployment of an Ushahidi instance with the <a href="http://www.pakreport.org">Pakreport</a> group in the wake of heavy flooding in Pakistan in the summer of 2010.</p>
</div>
</div>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2012/01/2011-retrospective-good-begets-good/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>10 Things I learned at CrowdConf 2011</title>
		<link>http://blog.crowdflower.com/2011/12/crowdsourcing-10-lessons-learne/</link>
		<comments>http://blog.crowdflower.com/2011/12/crowdsourcing-10-lessons-learne/#comments</comments>
		<pubDate>Wed, 07 Dec 2011 18:09:43 +0000</pubDate>
		<dc:creator>Mollie Allick</dc:creator>
				<category><![CDATA[Conference]]></category>
		<category><![CDATA[2011]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[CNET]]></category>
		<category><![CDATA[Coffee & Power]]></category>
		<category><![CDATA[crowdconf]]></category>
		<category><![CDATA[hackathon]]></category>
		<category><![CDATA[Journalists]]></category>
		<category><![CDATA[kickstarter]]></category>
		<category><![CDATA[lessons]]></category>
		<category><![CDATA[philip rosedale]]></category>
		<category><![CDATA[Quora]]></category>
		<category><![CDATA[taskrabbit]]></category>
		<category><![CDATA[Threadless]]></category>
		<category><![CDATA[ushahidi]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=4223</guid>
		<description><![CDATA[CrowdConf 2011 has come and gone. Here are the top 10 things we learned: 1. People like asking questions, people love giving answers (even if you don&#8217;t pay them). Charlie Cheever, the founder of Quora, pulled back the curtain and talked about who the Quora user is, and why the crowd loves to answer questions&#8212;for [...]]]></description>
			<content:encoded><![CDATA[<p>  <strong>CrowdConf</strong> 2011 has come and gone. Here are the top 10 things we learned:</p>
<h2><strong>1. People like asking questions, people love giving answers (even if you don&#8217;t pay them).</strong></h2>
<p> <strong>Charlie Cheever</strong>, the founder of <strong>Quora</strong>, pulled back the curtain and talked about who the Quora user is, and why the crowd loves to answer questions&mdash;for free.</p>
<p>  Quora&#8217;s success has come from prioritizing quality over quantity. This means one question and, oftentimes, a lengthy discussion leading to a single answer. Answers often take the form of an essay or a research paper because the Quora community is largely made up of college undergrads and graduate students.</p>
<p>  All of these users have an abundance of free time coupled with a very specific knowledge set, the perfect recipe for a good answer fueled by the right crowd.</p>
<h2><strong>2. You may not be able to crowdsource good AV when planning a conference, but it&#8217;s an ideal platform for finding your next best friend.</strong></h2>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/wp-content/uploads/2011/12/taskrabbit.png"><img class="aligncenter size-full wp-image-4606" title="taskrabbit" src="http://blog.crowdflower.com/wp-content/uploads/2011/12/taskrabbit.png" alt="" width="568" height="316" /></a></p>
<p>  <span id="more-4223"></span></p>
<h2><strong>3. Moby Dick Makes a Great Emojicon Story Too!</strong></h2>
<p>  <center><a href="http://blog.crowdflower.com/wp-content/uploads/2011/12/mobydick.png"><img class="aligncenter size-full wp-image-4605" title="mobydick" src="http://blog.crowdflower.com/wp-content/uploads/2011/12/mobydick.png" alt="" width="569" height="118" /></a></center></p>
<p>  <strong>Fred Benenson</strong> from Kickstarter used the crowd to translate the entirety of Moby Dick into emojicons. To take this to the next crowdsourcing level, the whole project was funded by 83 backers on Kickstarter.</p>
<h2><strong>4. Philip Rosedale crowdsourced an iPhone app for $2,485.</strong></h2>
<p>  Using his newly released crowdsourcing platform <a href="http://www.coffeeandpower.com">Coffee &amp; Power</a>, Philip developed his entire company infrastructure and platform through a globally distributed workforce. 288 contributors in 127 locations worked together to get this startup off the ground in a whole new way.</p>
<p>  The Coffee &amp; Power platform was built in 1,700 commits ranging from $6 quality checks all the way up to full source-code editing. One element of this process was developing the Hudat iPhone app. In less than a month for $2,485, the <strong>Coffee &amp; Power</strong> community got this mobile app up and running.</p>
<h2><strong>5. Tech journalists believe in the crowd.</strong></h2>
<p>  <strong>Martin Giles</strong>, the US Technology Correspondent for <em>The Economist</em> and <strong>Rafe Needleman</strong>, CNET editor, moderated panels on &#8220;Industry Champions&#8221; and &#8220;Cloud Labor&#8221; respectively.</p>
<p>  These panels took a closer look at the real life day-to-day applications of crowdsourcing, and how they are shaking up traditional industry models. TopCoder, Task Rabbit, GigWalk, Get Satisfaction, Mechanical Turk, LiveOps, uTest, and Trada were all on stage to talk about how the crowd is changing the way work, as we know it, is being done.</p>
<p>  As Rafe points out <a title="P2P marketplaces: Reach out and hire someone" href="http://news.cnet.com/8301-19882_3-20128138-250/p2p-marketplaces-reach-out-and-hire-someone/?tag=contentMain;contentBody;1n" target="_blank">in this piece</a> from October, the crowdsourcing model does not all lie in the cloud. Many of the previously mentioned companies are using the crowd to connect people for labor being done in the real world. Skilled tasks ranging from home and auto repairs to resume writing to Christmas shopping are all being done by the crowd.</p>
<h2><strong>6. A crowd is a crowd, humanitarian or paid.</strong></h2>
<p>  <center><a href="http://blog.crowdflower.com/wp-content/uploads/2011/12/humanitarian.png"><img class="aligncenter size-full wp-image-4604" title="humanitarian" src="http://blog.crowdflower.com/wp-content/uploads/2011/12/humanitarian.png" alt="" width="767" height="514" /></a></center></p>
<p>  Much of the conversation at CrowdConf was about financially motivating the crowd. This panel, dedicated to crowdsourcing humanitarian work, was an insightful shift in perspective.</p>
<p>  <strong>Jeannie Stamberger</strong> from Carnegie Mellon Silicon Valley moderated a panel with <strong>Rob Munro</strong> of Global Viral Forecasting Initiative, <strong>Leila Chirayath Janah</strong> of Samasource, <strong>Patrick Meier</strong> from Ushahidi and <strong>Vijay Pande</strong>, Associate Professor at Stanford University. These organizations are taking on issues such as tracking disease outbreaks in real time and interactive mapping for monitoring crisis information around the world.</p>
<h2><strong>7. Plinko has great use outside of &#8220;The Price is Right.&#8221;</strong></h2>
<p>  <center><a href="http://blog.crowdflower.com/wp-content/uploads/2011/12/plinko.png"><img class="aligncenter size-full wp-image-4607" title="plinko" src="http://blog.crowdflower.com/wp-content/uploads/2011/12/plinko.png" alt="" width="576" height="316" /></a></center></p>
<p>  <strong>Charles Festa</strong> talked about the success <a title="Threadless" href="http://www.threadless.com/" target="_blank">threadless.com</a> has found by tapping the design community within the crowd. Interestingly, he also went into detail about how the same company failed to get another project off the ground using the exact same platform.</p>
<p>  That project was Naked &amp; Angry, a pattern design concept that never really took off. For this project they used the shotgun approach to engage designers through popular social media channels. Looking back, Charles thinks a more precise trial and error process, much like the one used on the popular Price is Right game &#8220;Plinko&#8221;, would have been a much more effective way of communicating with the textile design community.</p>
<h2><strong>8. 12 hours of hacking and you can control the &#8220;crowd&#8221; into detecting skin cancer, predicting the weather, and cleaning up neighborhoods in India.</strong></h2>
<p>  The inaugural CrowdConf Hackathon was a huge success. Brendan Gill and Sina Khanifar were awarded the Best Hack for their Android app that pulls in barometric data from Android phones for a real time weather map called &#8230; that&#8217;s right, <a href="http://weathermappr.com/" target="_blank">WeatherMapper</a>. For their efforts, Brendan and Sina received a massive CrowdHack Khukuri knife flown in from Nepal.</p>
<p>  Tim Olson was awarded the Best Use of CloudFactory API for <strong>Clean Up India.</strong> Tim used CloudFactory to commission people in India to leave their computers to go out and clean up a street or park in their neighborhood. They had to show a before and after photo in order to complete the task and get paid.</p>
<p>  The Best Use of CrowdFlower API award went to John Le and Dave Oleson for <strong>melaKNOWma</strong>. Their hack allows users to upload images of any moles or growths they have to check for malignancy. It then sends the images to the crowd, where workers assess the mole&#8217;s asymmetry, borders, and color.</p>
<p>  Arran Bardige got the Best Use of Twilio API for <strong>Ringing Restaurants</strong>. Arran&#8217;s hack used the Twilio Robot to call local restaurants to see if they would be a good fit for his vegetarian girlfriend.</p>
<h2><strong>9. 250+ companies from more than 20 countries, ranging from single-employee startups to Fortune 50 companies came together to talk about crowdsourcing.</strong></h2>
<p>  <center><a href="http://blog.crowdflower.com/wp-content/uploads/2011/12/attendance.png"><img class="aligncenter size-full wp-image-4603" title="attendance" src="http://blog.crowdflower.com/wp-content/uploads/2011/12/attendance.png" alt="" width="722" height="449" /></a></center></p>
<h2><strong>10. The Crowd can help make next year&#8217;s CrowdConf even better.</strong></h2>
<p>CrowdFlower is already planning CrowdConf 2012 and is looking forward to the same growth, enthusiasm, and creativity that have been building over the last two years. If you have ideas or thoughts for speakers, topics, sponsors, locations, beverages, or any other content-related matter, contact <a href="mailto:mollie@crowdflower.com">mollie@crowdflower.com</a>.</p>
<p>CrowdConf is far from being the only event we participate in throughout the year. Visit our <a title="CrowdFlower Events and Announcements Sign Up Page" href="http://get.crowdflower.com/NewsEventsSignUp.html" target="_blank">Events and Announcements Sign Up Page</a> to stay in the loop about all things crowd all year long.</p>
<p><center><iframe src="http://player.vimeo.com/video/32726907?title=0&amp;byline=0&amp;portrait=0" frameborder="0" width="400" height="225"></iframe></center><center>CrowdConf 2011 Wrap-up from <a href="http://vimeo.com/user6164862">CrowdFlower</a> on <a href="http://vimeo.com">Vimeo</a>.</center></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/12/crowdsourcing-10-lessons-learne/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Did you say &#8220;Great!&#8221;, or &#8220;Oh Great!&#8221;?</title>
		<link>http://blog.crowdflower.com/2011/11/crowdsourcing-sentiment-analysis-herman-cain/</link>
		<comments>http://blog.crowdflower.com/2011/11/crowdsourcing-sentiment-analysis-herman-cain/#comments</comments>
		<pubDate>Mon, 14 Nov 2011 19:24:39 +0000</pubDate>
		<dc:creator>Jodie Ellis</dc:creator>
				<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Media]]></category>
		<category><![CDATA[automated sentiment analysis]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[crowdsorcing]]></category>
		<category><![CDATA[crowdsource]]></category>
		<category><![CDATA[crowdsourced]]></category>
		<category><![CDATA[herman cain]]></category>
		<category><![CDATA[sentiment analysis]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=4240</guid>
		<description><![CDATA[Being tapped to write a blog post here at CrowdFlower is usually left to the experts. So with that, let me begin by making the disclaimer that I am neither a political analyst nor a data scientist. But I do have a personal fervor for politics and access to some impressive tools, thanks to my job [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignright" style="width: 281px"><a href="http://www.empowernewsmag.com/userfiles/cain.jpg"><img class=" " title="Cain" src="http://www.empowernewsmag.com/userfiles/cain.jpg" alt="" width="271" height="361" /></a><p class="wp-caption-text">Image by Sarah Butrymowicz</p></div>
<p>Being tapped to write a blog post here at CrowdFlower is usually left to the experts. So with that, let me begin by making the disclaimer that I am neither a political analyst nor a data scientist. But I do have a personal fervor for politics and access to some impressive tools, thanks to my job here at <a title="the leader in enterprise crowdsourcing" href="http://crowdflower.com/" target="_blank">CrowdFlower</a>.</p>
<p>For those who aren&#8217;t familiar with CrowdFlower, we specialize in tapping human contributors worldwide to do massive amounts of simple, repetitive tasks (especially tasks that are hard for computers to do by themselves). Here&#8217;s a <a title="How It Works!" href="http://vimeo.com/26878855">quick how-it-works animation</a>.</p>
<p>I had been reading some old blog posts on the CrowdFlower blog when I came across an interesting <a title="crowdsourcing media bias" href="http://blog.crowdflower.com/2008/03/crowdsourcing-to-find-media-bias-hillary-vs-obama/" target="_blank">2008 post on election media bias</a>.</p>
<p>I decided that this could be a great opportunity to revisit sentiment analysis, and specifically set out to see whether comparing automated sentiment detection tools with human assessments could yield any blog-worthy findings.</p>
<p>To see how far the automated sentiment tools have come, I began by using an enterprise-grade social media monitoring tool that provides sentiment analysis.</p>
<p>I ran a few quick monitoring searches of my own to see how the current Republican Primary election was tracking — it seemed a topical place that would be chock full of good commentary.</p>
<p>The instant access to well-organized data from blogs, news sources, and a variety of social media sources was outstanding.</p>
<p>However, I was surprised to find that for each search I conducted, <strong>the automated sentiment detection tool consistently returned an overwhelming proportion of &#8220;Neutral&#8221; ratings (frequently exceeding 90%)</strong>. This seemed funny to me, given the typically emotive nature of politics.<span id="more-4240"></span></p>
<p><strong>It&#8217;s important to note that this particular tool uses a default value of &#8220;Neutral&#8221; for any post it cannot interpret.</strong></p>
<p>A particularly interesting subset of the data was several thousand tweets about Herman Cain immediately following the news of alleged sexual harassment by Cain during his time as leader of the National Restaurant Association. Surely this would yield some sentiment-rich commentary that even machines couldn&#8217;t resist tagging.</p>
<p>For the posts about Herman Cain on Oct 31st, here is what the machine detected on just under 3,000 posts:</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/wp-content/uploads/2011/11/cainauto.png"><img class="aligncenter size-full wp-image-4629" title="cainauto" src="http://blog.crowdflower.com/wp-content/uploads/2011/11/cainauto.png" alt="" width="513" height="306" /></a></p>
<p>Naturally, I took to the CrowdFlower platform and decided I would run the same data through a simple sentiment analysis workflow. With the help of our team of crowdsourcing gurus, I used some simple but effective best practices to control for quality (you can get a good overview <a title="crowdsourcing quality control" href="http://blog.crowdflower.com/2011/10/stopworrying/" target="_blank">here</a>). Here is what the CrowdFlower contributors detected:</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/wp-content/uploads/2011/11/caincf.png"><img class="aligncenter size-full wp-image-4630" title="caincf" src="http://blog.crowdflower.com/wp-content/uploads/2011/11/caincf.png" alt="" width="527" height="324" /></a></p>
<p>Here are just a couple of posts marked &#8220;Neutral&#8221; by the machine and &#8220;Negative&#8221; and &#8220;Positive&#8221;, respectively, by CrowdFlower contributors:</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/wp-content/uploads/2011/11/caintweet1.png"><img class="aligncenter size-full wp-image-4631" title="caintweet1" src="http://blog.crowdflower.com/wp-content/uploads/2011/11/caintweet1.png" alt="" width="505" height="204" /></a><a href="http://blog.crowdflower.com/wp-content/uploads/2011/11/caintweet2.png"><img class="aligncenter size-full wp-image-4632" title="caintweet2" src="http://blog.crowdflower.com/wp-content/uploads/2011/11/caintweet2.png" alt="" width="478" height="142" /></a></p>
<h2>Takeaways</h2>
<p>A spot check of the results on the automated set confirmed that when the machine actually tagged a post as positive or negative, it was usually very accurate (good precision).</p>
<p>However, <strong>the large amount of data that the machine was unable to make a determination on suggests that the pervasive problem of &#8216;recall&#8217; is still the big challenge with automated sentiment detection.</strong></p>
<p>This graph illustrates the recall difference a bit more clearly. The need for human analysis when dealing with the subtleties of language could not be more apparent.</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/wp-content/uploads/2011/11/caintweets.png"><img class="aligncenter size-full wp-image-4633" title="caintweets" src="http://blog.crowdflower.com/wp-content/uploads/2011/11/caintweets.png" alt="" width="650" height="254" /></a></p>
<p><strong>Automated Tool</strong>: Good precision. Poor recall.</p>
<p><strong>CrowdFlower Tool</strong>: Good precision. Good recall.</p>
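<p>To make the precision and recall distinction concrete, here is a minimal Python sketch. It assumes the human labels serve as the reference and that the automated tool falls back to "neutral" whenever it cannot interpret a post. The function name and the ten labels are made up for illustration and are not data from this experiment.</p>

```python
def precision_recall(machine, human):
    """Score the machine's non-neutral tags against human reference labels."""
    pairs = list(zip(machine, human))
    # Posts where the machine committed to a polarity.
    tagged = [(m, h) for m, h in pairs if m != "neutral"]
    # Posts that the human reference says carry a polarity.
    opinionated = [(m, h) for m, h in pairs if h != "neutral"]
    # A tag counts as correct only when it matches the human label.
    correct = sum(1 for m, h in tagged if m == h)
    precision = correct / len(tagged) if tagged else 0.0
    recall = correct / len(opinionated) if opinionated else 0.0
    return precision, recall


# Hypothetical labels for ten posts, for illustration only.
human = ["negative", "negative", "positive", "negative", "neutral",
         "positive", "negative", "negative", "neutral", "positive"]
machine = ["negative", "neutral", "neutral", "neutral", "neutral",
           "positive", "neutral", "neutral", "neutral", "neutral"]

precision, recall = precision_recall(machine, human)
print(f"precision={precision:.2f} recall={recall:.2f}")
# prints: precision=1.00 recall=0.25
```

<p>On this toy data the tool is right every time it commits to a polarity, yet it catches only a quarter of the opinionated posts, which is the pattern described above.</p>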
<h2>Sentiment Analysis is Insightful AND Entertaining</h2>
<p>In addition to the Herman Cain Twitter data, I looked at headlines, blogs, and a broad swath of social media commentary on all the candidates. The conclusion I can draw from my effort is that sentiment detection is, indeed, still a very challenging problem to solve through automation.</p>
<p>This is consistent with what I see here at CrowdFlower daily — in today&#8217;s data-wealthy world, there are countless tasks that require human attention (good to know if my blogging career never gets off the ground).</p>
<p>Hopefully I&#8217;ll get the chance to continue exploring the sentiment about topical news as it breaks, and will look forward to sharing future findings.</p>
<p>Have experience monitoring sentiment? Let us know if this is consistent with what you&#8217;ve seen. Leave a comment.</p>
<p style="text-align: center;">***</p>
<p>To find out more about how CrowdFlower technology is used for sentiment analysis and a wide range of other human powered projects, visit the <a title="enterprise crowdsourcing products" href="http://crowdflower.com/products" target="_blank">CrowdFlower products page</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/11/crowdsourcing-sentiment-analysis-herman-cain/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Crowdsourcing Scientific Research: Leveraging the Crowd for Scientific Discovery</title>
		<link>http://blog.crowdflower.com/2011/11/scientific-research/</link>
		<comments>http://blog.crowdflower.com/2011/11/scientific-research/#comments</comments>
		<pubDate>Fri, 04 Nov 2011 22:24:24 +0000</pubDate>
		<dc:creator>Dave Oleson</dc:creator>
				<category><![CDATA[Health]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[contributors]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[Harvard]]></category>
		<category><![CDATA[TB]]></category>
		<category><![CDATA[tuberculosis]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=3978</guid>
		<description><![CDATA[Lab scientists spend countless hours manually reviewing and annotating cells. What if we could give these hours back, and replace the tedious parts of science with a hands-off, fast, cheap, and scalable solution? That’s exactly what we did when we used the crowd to count neurons, an activity that computer vision can’t yet solve. Building [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_4185" class="wp-caption alignright" style="width: 381px"><a href="http://blog.crowdflower.com/2011/11/scientific-research/all-sizes-cell-counts-flickr-photo-sharing/" rel="attachment wp-att-4185"><img class="size-full wp-image-4185" src="http://blog.crowdflower.com/wp-content/uploads/2011/10/All-sizes-Cell-counts-Flickr-Photo-Sharing.jpg" alt="Cell counts | Flickr - Photo Sharing!" width="371" height="310" /></a><p class="wp-caption-text">Cell counts | Flickr - Photo Sharing!</p></div>
<p>Lab scientists spend countless hours manually reviewing and annotating cells. What if we could give these hours back, and replace the tedious parts of science with a hands-off, fast, cheap, and scalable solution?</p>
<p>That’s exactly what we did when we used the crowd to count neurons, an activity that computer vision can’t yet solve. Building on the work we recently did with the <a title="Harvard Tuberculosis lab" href="http://www.forbes.com/sites/techonomy/2011/10/26/crowdsourcing-scientific-progress-how-crowdflowers-hordes-help-harvard-researchers-study-tb/" target="_blank">Harvard Tuberculosis lab</a>, we were able to take untrained people all over the world (people who might never have learned that DNA Helicase unzips genes…), turn them into image analysts with our task design and quality control, and get results comparable to those provided by trained lab workers.</p>
<h3>Here’s how:</h3>
<p>We took cortex slide images from mice provided by a neuroscience lab at Harvard University. We cut each image into smaller pieces, so they’d be easier for people to work on.</p>
<p>After reading a brief set of instructions, contributors were asked to count the neurons in each slide by clicking on each individual neuron.<span id="more-3978"></span></p>
<div id="attachment_3985" class="wp-caption aligncenter" style="width: 639px"><a href="http://blog.crowdflower.com/2011/11/scientific-research/cell1/" rel="attachment wp-att-3985"><img class="size-full wp-image-3985" src="http://blog.crowdflower.com/wp-content/uploads/2011/10/cell1.png" alt="Fig 1: example slide" width="629" height="264" /></a><p class="wp-caption-text">Fig 1: example slide</p></div>
<p>Contributors were given examples of edge cases (e.g., is this a cell or not? Is this two cells or three?). Contributors identified and clicked on every cell, which placed a green marker on top of each one. An automated counter kept track of the number of clicks.</p>
<p>We controlled quality using Gold Standard (“Gold”) units, which were images with known cell counts that we added to the task. The benefits here are threefold. First, Gold questions provide training and feedback to our contributors, so that they can get better at the task over time. Second, contributors don’t know which questions are Gold, forcing them to honestly answer all questions. Finally, if a contributor fails to answer enough Gold correctly, we remove them from the job.</p>
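<p>The Gold check can be sketched in a few lines of Python. This is an illustration rather than CrowdFlower's actual implementation: the tolerance of two cells, the 70% accuracy threshold, and all names and numbers below are assumptions.</p>

```python
from collections import defaultdict


def trusted_contributors(gold_counts, judgments, tolerance=2, min_accuracy=0.7):
    """Return the contributors who answer enough Gold units correctly.

    gold_counts maps a Gold unit id to its known cell count.
    judgments is a list of (contributor, unit_id, submitted_count) tuples.
    Contributors who never saw a Gold unit are not returned.
    """
    seen = defaultdict(int)
    right = defaultdict(int)
    for contributor, unit_id, count in judgments:
        if unit_id not in gold_counts:
            continue  # ordinary unit: nothing to score against
        seen[contributor] += 1
        # Assumed rule: a count is correct if it is within the tolerance.
        if tolerance >= abs(count - gold_counts[unit_id]):
            right[contributor] += 1
    return {c for c in seen if right[c] / seen[c] >= min_accuracy}


# Hypothetical data: two Gold images and one ordinary image.
gold_counts = {"g1": 40, "g2": 25}
judgments = [
    ("alice", "g1", 41), ("alice", "g2", 25), ("alice", "u7", 33),
    ("bob", "g1", 12), ("bob", "g2", 26), ("bob", "u7", 90),
]
print(trusted_contributors(gold_counts, judgments))
# prints: {'alice'}
```

<p>Because contributors cannot tell Gold units from ordinary ones, the accuracy measured on Gold acts as an estimate of their accuracy on the rest of the job.</p>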
<div id="attachment_4208" class="wp-caption aligncenter" style="width: 710px"><a href="http://blog.crowdflower.com/2011/11/scientific-research/world-map-contributors/" rel="attachment wp-att-4208"><img class="size-full wp-image-4208" src="http://blog.crowdflower.com/wp-content/uploads/2011/10/world-map-contributors.png" alt="Fig 2: Map of all contributors on this task. Step it up Iceland!" width="700" height="350" /></a><p class="wp-caption-text">Fig 2: Map of all contributors on this task. Step it up Iceland!</p></div>
<p style="text-align: center;"><a title="batchgeo.com TB Cell Count Contributors" href="http://batchgeo.com/map/faa07e6c77756d3685529b17e9a14a5d" target="_blank">Visit the batchgeo.com navigable map.</a></p>
<p>After removing all of our “untrusted” contributors, we were left with our “trusted” contributors. Four trusted contributors counted the neurons in each image. We then discarded any outliers and averaged the remaining counts in order to get the most accurate results.</p>
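<p>As a rough illustration of that last step, here is a minimal sketch in Python. The outlier rule (drop any count more than 20% away from the median) and the sample counts are assumptions made for the example, not the rule or data from the actual job.</p>

```python
from statistics import mean, median

def crowd_count(counts, tolerance=0.20):
    """Average the counts after dropping any that stray too far from the median.

    `tolerance` (20% of the median) is an assumed cutoff for illustration.
    """
    mid = median(counts)
    kept = [c for c in counts if tolerance * mid >= abs(c - mid)]
    # If every count was rejected, fall back to the median itself.
    return mean(kept) if kept else mid

# Four hypothetical trusted counts for one image; 150 is an obvious miss.
print(round(crowd_count([229, 231, 226, 150])))
```

<p>With these made-up counts, the 150 is dropped and the remaining three are averaged. A median-based cutoff is used because, with only four counts per image, a single bad count can drag a plain average a long way.</p>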
<p>How did this experiment actually turn out? How did our results compare to those of trained lab workers? In short, we performed extremely well. As you can see in our results below, the Crowd counts differ from the professional lab counts by an average of 2.0%, which can be chalked up to ambiguity in certain clusters of cells. In the table, a value in parentheses is negative, meaning the Crowd counted fewer cells than the lab.</p>
<table style="width: 364px; margin-left: auto; margin-right: auto; text-align: center;" cellspacing="0px" cellpadding="0px">
<tbody>
<tr>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Image</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">Lab Count</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">Avg. Crowd Count</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">Difference</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">% Difference</td>
</tr>
<tr>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko2a-1</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">239</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">229</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">-10</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">(4.0%)</td>
</tr>
<tr>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko2a-3</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">157</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">153</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">-4</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">(2.4%)</td>
</tr>
<tr>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko2a-4</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">161</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">160</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">-1</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">(0.6%)</td>
</tr>
<tr>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko2b-1</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">250</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">240</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">-11</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">(4.2%)</td>
</tr>
<tr>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko2b-2</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">179</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">173</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">-6</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">(3.2%)</td>
</tr>
<tr style="border: 1px solid black;">
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko2b-3</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">134</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">130</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">-4</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">(3.0%)</td>
</tr>
<tr style="border: 1px solid black;">
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko2b-4</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">153</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">152</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">-2</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">(1.0%)</td>
</tr>
<tr style="border: 1px solid black;">
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko3a-1</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">209</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">209</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">0</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">(0.1%)</td>
</tr>
<tr>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko3a-2</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">147</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">149</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">2</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">1.2%</td>
</tr>
<tr>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko3a-3</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">134</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">129</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">-5</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">(3.9%)</td>
</tr>
<tr>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko3a-4</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">138</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">136</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">-2</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">(1.6%)</td>
</tr>
<tr>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko3b-1</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">213</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">212</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">-1</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">(0.5%)</td>
</tr>
<tr>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko3b-3</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">78</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">75</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">-4</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">(4.5%)</td>
</tr>
<tr>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="67">Sox6ko3b-4</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="58">54</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="101">54</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="63">0</td>
<td style="border: 1px solid black;" valign="bottom" nowrap="nowrap" width="75">0.0%</td>
</tr>
</tbody>
</table>
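<p>For anyone who wants to check the summary figure, the short Python sketch below averages the % Difference column. The values are transcribed from the table, with parenthesized entries entered as negatives. Reading the headline number as the signed mean of this column is our assumption about how it was summarized.</p>

```python
# Signed % differences transcribed from the table above
# (parentheses in the table mark negative values).
pct_diff = [-4.0, -2.4, -0.6, -4.2, -3.2, -3.0, -1.0,
            -0.1, 1.2, -3.9, -1.6, -0.5, -4.5, 0.0]

# Signed mean: over- and under-counts are allowed to cancel.
mean_signed = sum(pct_diff) / len(pct_diff)
# Mean absolute difference: every miss counts, whatever its direction.
mean_absolute = sum(abs(d) for d in pct_diff) / len(pct_diff)

print(f"mean signed difference:   {mean_signed:.1f}%")
print(f"mean absolute difference: {mean_absolute:.1f}%")
```

<p>The signed mean comes out at about -2.0%, and the mean absolute difference at about 2.2%. The two are close because the Crowd undercounted on almost every image.</p>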
<p>Future iterations will break the images into even smaller pieces and use automated solutions to tag the high-confidence units, leaving only the edge cases that require human eyes.</p>
<p>We hope that results like these will encourage more scientists, labs, and biotech firms to crowdsource pieces of their research. We believe this could free up their time for more complicated work, decrease the latency of results for experiments, and quicken the pace of scientific discovery.</p>
<p>Yesterday it was <a title="TB Cells" href="http://www.forbes.com/sites/techonomy/2011/10/26/crowdsourcing-scientific-progress-how-crowdflowers-hordes-help-harvard-researchers-study-tb/" target="_blank">TB Cells</a>, today it’s neurons; tomorrow: cancer cells? We’re issuing an open call to any university lab or biotechnology or pharmaceutical company with large image data sets: contact us to utilize the crowd to shorten the life cycle from idea to major scientific advancement.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/11/scientific-research/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>258 Guys in a Garage! Crowdsourcing an Entire Startup</title>
		<link>http://blog.crowdflower.com/2011/10/258-guys-in-a-garage/</link>
		<comments>http://blog.crowdflower.com/2011/10/258-guys-in-a-garage/#comments</comments>
		<pubDate>Thu, 27 Oct 2011 17:14:49 +0000</pubDate>
		<dc:creator>Philip Rosedale</dc:creator>
				<category><![CDATA[Conference]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[crowdconf]]></category>
		<category><![CDATA[philip rosedale]]></category>
		<category><![CDATA[silicon valley]]></category>
		<category><![CDATA[startup]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=3944</guid>
		<description><![CDATA[About the author: Philip Rosedale is the creator of Second Life and a Co-Founder of LoveMachine, Inc. My co-founder Ryan and I are having so much fun pulling together data and thoughts for my upcoming keynote at CrowdConf next week. It&#8217;s a great opportunity to try and summarize much of what we&#8217;ve learned over the [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_3947" class="wp-caption alignright" style="width: 151px"><a href="http://blog.crowdflower.com/2011/10/258-guys-in-a-garage/philip-rosedale3/" rel="attachment wp-att-3947"><img class="size-full wp-image-3947    " title="philip-rosedale" src="http://blog.crowdflower.com/wp-content/uploads/2011/10/philip-rosedale3.jpg" alt="philip-rosedale" width="141" height="212" /></a><p class="wp-caption-text">via: FastCompany</p></div>
<p><em>About the author: Philip Rosedale is the creator of Second Life and a Co-Founder of LoveMachine, Inc.</em></p>
<p>My co-founder Ryan and I are having so much fun pulling together data and thoughts for my upcoming keynote at CrowdConf next week. It&#8217;s a great opportunity to try to summarize much of what we&#8217;ve learned over the last few years about whether and how crowdsourcing can be taken to the next logical (we think) level: to replace a bunch of what we&#8217;ve come to think of as the nature of &#8220;work&#8221; and &#8220;company.&#8221;</p>
<p>The Silicon Valley startup formula is now a well-recognized and time-honored strategy, which I think we&#8217;ve all worn into a bit of a rut: 3 or 4 very smart people (usually guys) hunker down in someone&#8217;s garage, work a bleary-eyed 80 hours a week producing a prototype, getting funding, hiring that first handful of key engineers, etc.</p>
<p><span id="more-3944"></span>But what would happen if you never hired anyone at all? What about doing a &#8216;real&#8217; Silicon Valley startup &#8211; meaning a new, risky, design-sensitive idea backed by venture capital &#8211; but without the garage and the crunch time and team of &#8216;A players&#8217;? What if you built a system that instead allowed people all over the world to be paid small amounts of money to help you prototype, build, and then release a high quality new website and software?</p>
<p>Well that&#8217;s what we did, and there is a lot to talk about!</p>
<p>-Philip</p>
<p>&nbsp;</p>
<p><em>Come hear Philip and many other experts in the crowdsourcing field speak at <a title="CrowdConf 2011" href="http://www.crowdconf.com">CrowdConf 2011</a>. It is a great opportunity to discover what is happening in the world of crowdsourcing. It starts next week, November 1-2 at the Mission Bay Conference Center in San Francisco. You can buy your tickets <a title="CrowdConf 2011 Tickets" href="http://crowdconf2011.eventbrite.com/">here</a>. Hope to see you there!</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/10/258-guys-in-a-garage/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Crowdsourcing and Retention: From First-Timers to Seasoned Veterans</title>
		<link>http://blog.crowdflower.com/2011/10/seasonedveterans/</link>
		<comments>http://blog.crowdflower.com/2011/10/seasonedveterans/#comments</comments>
		<pubDate>Tue, 25 Oct 2011 20:10:31 +0000</pubDate>
		<dc:creator>Patrick Philips</dc:creator>
				<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[engagement]]></category>
		<category><![CDATA[gold]]></category>
		<category><![CDATA[insights]]></category>
		<category><![CDATA[judgment]]></category>
		<category><![CDATA[loyal]]></category>
		<category><![CDATA[retention]]></category>
		<category><![CDATA[veterans]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=2589</guid>
		<description><![CDATA[Millions of people have participated in our tasks over the last few years, and tens of thousands of people are active at any given moment. However, crowdsourcing is not a traditional engagement model. Tasks are elective, which means people are free to come and go as they please. It&#8217;s a fair question, then, to ask whether they [...]]]></description>
			<content:encoded><![CDATA[<p>Millions of people have participated in our tasks over the last few years, and tens of thousands of people are active at any given moment. However, crowdsourcing is not a traditional engagement model. Tasks are elective, which means people are free to come and go as they please. It&#8217;s a fair question, then, to ask whether they keep coming back.</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/2011/10/seasonedveterans/comebacksoon/" rel="attachment wp-att-3929"><img class="aligncenter size-full wp-image-3929" title="Come Back Soon" src="http://blog.crowdflower.com/wp-content/uploads/2011/10/comebacksoon.jpg" alt="crowdsourcing " width="486" height="334" /></a></p>
<p><em>Do people perform tasks only fleetingly, or has crowdsourcing become more of a long-term engagement? </em><em>Furthermore, just how important is contributor retention in the world of crowdsourcing?</em></p>
<p>While a majority of people fall into the &#8220;one-and-done&#8221; camp, many of the most productive contributors tend to have participated in previous jobs. Within any single job, these seasoned veteran contributors also provide far more work than their less experienced counterparts.</p>
<p><span id="more-2589"></span></p>
<p>Over a period of two months, we ran a series of five very similar jobs, retrieving ratings information for businesses throughout North America. In total, we collected over half a million judgments from 2,901 unique contributors,<sup><a href="#footnote-1">1</a></sup> representing multiple labor channels and 101 countries. As a first test of whether seasoned veterans are common, we looked at how many people participated in more than one job. In total, 2,389 people participated in one job only, meaning that First-Timers accounted for over 82 percent of all contributors.</p>
<p>But this may not be the best number to look at. We&#8217;re really interested in whether certain people recognize and seek out specific types of tasks after having worked on them before. To look for this behavior, we analyzed the most recent job, counting how many people participated in at least one prior iteration of the task.</p>
<p>In the most recent job, 906 people participated, 247 of whom had done the task previously. By this measure, Seasoned Vets constitute approximately 27% of the workforce. While the impact of returning contributors is greater under this methodology, the fact remains that these &#8220;loyal&#8221; contributors are firmly in the minority on this series of jobs.</p>
<div id="attachment_3869" class="wp-caption aligncenter" style="width: 631px"><a href="http://blog.crowdflower.com/2011/10/seasonedveterans/contributor_count/" rel="attachment wp-att-3869"><img class="size-full wp-image-3869 " title="contributor_count" src="http://blog.crowdflower.com/wp-content/uploads/2011/10/contributor_count.jpg" alt="crowdsourcing" width="621" height="395" /></a><p class="wp-caption-text">Count of Contributors by Profile</p></div>
<p>&nbsp;</p>
<p>However, <a title="crowdsourcing" href="http://blog.crowdflower.com/2010/12/good-work-knows-no-boundaries/" target="_blank">as we&#8217;ve seen before</a>, the individual impact of contributors varies widely, with a minority of people often providing the vast majority of work. With this in mind, we looked at the contributions of First-Timers and Seasoned Vets and found some striking differences.</p>
<div id="attachment_3870" class="wp-caption aligncenter" style="width: 598px"><a href="http://blog.crowdflower.com/2011/10/seasonedveterans/contributor_share/" rel="attachment wp-att-3870"><img class="size-full wp-image-3870 " title="contributor_share" src="http://blog.crowdflower.com/wp-content/uploads/2011/10/contributor_share.jpg" alt="crowdsourcing" width="588" height="396" /></a><p class="wp-caption-text">Share of Contributions by Profile</p></div>
<p>&nbsp;</p>
<p><strong>Seasoned Vets, while constituting only 27% of the workforce, provided 47% of the total work completed.</strong> <strong>On average, each Seasoned Veteran provided 2.5 times more judgments than their less experienced counterparts.</strong> It&#8217;s also interesting to note that there was no significant difference between the quality of work provided by First-Timers and Seasoned Vets, no doubt due to our suite of <a title="Enterprise Crowdsourcing or: How I learned to stop worrying and trust the crowd" href="http://blog.crowdflower.com/2011/10/stopworrying/">quality control measures</a>.</p>
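<p>The arithmetic behind that comparison can be sketched in a few lines of Python. The headcounts come from the job described above. The 47% work share is the rounded figure quoted here, so the resulting ratio is only approximate, and the exact judgment counts are not reproduced in this post.</p>

```python
total_contributors = 906   # people who worked on the most recent job
seasoned_vets = 247        # those who had done the task before
vet_work_share = 0.47      # share of judgments from Seasoned Vets (rounded)

first_timers = total_contributors - seasoned_vets
vet_headcount_share = seasoned_vets / total_contributors

# Work per person, expressed as a fraction of all judgments in the job.
per_vet = vet_work_share / seasoned_vets
per_first_timer = (1 - vet_work_share) / first_timers
ratio = per_vet / per_first_timer

print(f"Seasoned Vets: {vet_headcount_share:.0%} of contributors")
print(f"Judgments per vet vs. per first-timer: {ratio:.1f}x")
```

<p>With the rounded inputs, the ratio comes out at roughly 2.4, in the same range as the figure reported above, which was computed from the unrounded data.</p>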
<p>Given that the people who stick around tend to be far more productive, improving retention is a useful consideration (for these jobs, at least). We&#8217;re now interested in how best to attract people to return to the types of jobs they&#8217;ve already seen. That, however, is a work in progress and a story for another day.</p>
<hr />
<p id="footnote-1" style="text-align: -webkit-auto;">1. Note that this analysis only considers Trusted workers.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/10/seasonedveterans/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Enterprise Crowdsourcing or: How I learned to stop worrying and trust the crowd</title>
		<link>http://blog.crowdflower.com/2011/10/stopworrying/</link>
		<comments>http://blog.crowdflower.com/2011/10/stopworrying/#comments</comments>
		<pubDate>Wed, 05 Oct 2011 23:53:33 +0000</pubDate>
		<dc:creator>Patrick Philips</dc:creator>
				<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[confidence]]></category>
		<category><![CDATA[contributors]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[demographic]]></category>
		<category><![CDATA[filtering]]></category>
		<category><![CDATA[gold]]></category>
		<category><![CDATA[judgment]]></category>
		<category><![CDATA[location]]></category>
		<category><![CDATA[redundancy]]></category>
		<category><![CDATA[review]]></category>
		<category><![CDATA[training]]></category>
		<category><![CDATA[trust]]></category>
		<category><![CDATA[workflow]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=3692</guid>
		<description><![CDATA[Our recent post about confidence bias, where we showed that most contributors vastly overestimate their own ability to complete tasks correctly, raised a lot of questions about how we manage quality at CrowdFlower. You might remember these themes from such classic posts as: AMT is Fast, Cheap and Good or the Wisdom of Small Crowds series [...]]]></description>
			<content:encoded><![CDATA[<p>Our recent post about confidence bias, where we showed that <a title="Confidence Bias: Evidence from Crowdsourcing" href="http://blog.crowdflower.com/2011/09/confidence-bias-evidence-from-crowdsourcing/" target="_blank">most contributors vastly overestimate their own ability to complete tasks correctly</a>, raised a lot of questions about how we manage quality at CrowdFlower. You might remember these themes from such classic posts as: <a title="AMT is fast, cheap, and good for machine learning data" href="http://blog.crowdflower.com/2008/09/amt-fast-cheap-good-machine-learning/">AMT is Fast, Cheap and Good</a> or the Wisdom of Small Crowds series <a title="Wisdom of Small Crowds Part 1" href="http://blog.crowdflower.com/2008/06/aggregate-turker-judgments-threshold-calibration/" target="_blank">[1]</a> <a title="Wisdom of Small Crowds Part 2" href="http://blog.crowdflower.com/2008/08/wisdom-of-small-crowds-part-2-individual-workloads-and-rates/" target="_blank">[2]</a> <a title="Wisdom of Small Crowds Part 3" href="http://blog.crowdflower.com/2008/08/wisdom-of-small-crowds-part-3-another-worker-visualization/" target="_blank">[3]</a>.</p>
<div id="attachment_3786" class="wp-caption alignright" style="width: 240px"><a href="http://blog.crowdflower.com/2011/10/stopworrying/prospector/" rel="attachment wp-att-3786"><img class="size-full wp-image-3786    " title="prospector" src="http://blog.crowdflower.com/wp-content/uploads/2011/10/prospector.jpeg" alt="crowdsourcing" width="230" height="226" /></a><p class="wp-caption-text">via: reddead.wikia.com/</p></div>
<p>The standard CrowdFlower model is agnostic towards the quality of any individual contributor. Typically, we let anyone attempt a task, using our technology to filter out low-quality contributors and score the responses. Without further ado, what follows is a quick review of the steps we take to do that filtering.</p>
<p><span id="more-3692"></span></p>
<h2>Gold (What is it Good For?)</h2>
<p>In almost every job, we take a subset of the data to be processed and manually score the correct response. This manually-scored set, which we refer to as Gold Standard Data, is at the core of managing quality in the context of enterprise crowdsourcing:</p>
<ul>
<li><strong>Filtering</strong>: We use Gold to create an up-front test, creating a barrier to entry such that only workers who understand and successfully complete a task are allowed to participate. This allows us to prevent unsavory characters from entering jobs and contaminating results.</li>
<li><strong>On-going Training</strong>: We also use Gold to conduct on-going training, offering corrections for units that are answered incorrectly. This allows us to continually instruct and improve highly prolific contributors.</li>
<li><strong>Dynamic Trust Score</strong>: We use each contributor’s performance on Gold as a basis to determine their overall accuracy within a task. Each contributor must exceed our minimum trust thresholds to continue working on a task. If at any point a contributor falls below the trust threshold, we&#8217;ll exclude their work.</li>
</ul>
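<p>To make the trust score idea concrete, here is a minimal Python sketch. It assumes the simplest possible definition, trust as the fraction of Gold units answered correctly, along with a hypothetical 70 percent threshold and a hypothetical grace period of four Gold units. The names <code>Contributor</code> and <code>is_trusted</code> are made up for the example and do not describe our production system.</p>

```python
from dataclasses import dataclass

@dataclass
class Contributor:
    gold_seen: int = 0
    gold_correct: int = 0

    def record_gold(self, correct: bool) -> None:
        """Update the running tally after the contributor answers a Gold unit."""
        self.gold_seen += 1
        self.gold_correct += int(correct)

    @property
    def trust(self) -> float:
        # Before any Gold has been answered there is no evidence either way,
        # so this sketch starts everyone at full trust.
        return self.gold_correct / self.gold_seen if self.gold_seen else 1.0

def is_trusted(c: Contributor, threshold: float = 0.7, min_gold: int = 4) -> bool:
    """Trusted while still in the grace period, or while accuracy holds up."""
    return min_gold > c.gold_seen or c.trust >= threshold

worker = Contributor()
for correct in [True, True, False, True, False, False]:
    worker.record_gold(correct)
    print(worker.gold_seen, round(worker.trust, 2), is_trusted(worker))
```

<p>In this run the contributor is still trusted after four Gold units (three of four correct) but drops below the threshold by the sixth. Because the score is recomputed after every Gold unit, a contributor can lose trusted status at any point in a job.</p>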
<p>Because creating Gold is labor-intensive, we&#8217;ve created <a title="Programmatic Gold: Targeted and Scalable Quality Assurance in Crowdsourcing" href="http://crowdflower.com/images/marketing/papers/HCOMP2011-philosopher-stone.pdf" target="_blank">an automated process to generate Gold</a> using units that have already been completed. This has significantly reduced the time needed for setup and ongoing job creation, without sacrificing our ability to differentiate contributors.</p>
<p>Of course, <a title="Ensuring quality in crowdsourced search relevance evaluation: The effects of training question distribution" href="http://crowdflower.com/images/marketing/papers/SIGIRpaper.pdf" target="_blank">the amount and distribution of Gold</a> is critical. Often, a uniform distribution of Gold across response types is ideal, though in certain situations we&#8217;ll use a skewed Gold set. For example, in an experiment on crowdsourced document review, we used a skewed Gold set to avoid missing relevant documents (reduced &#8220;false negatives,&#8221; if you prefer).</p>
<p style="text-align: center;"><img class="aligncenter" title="eDiscovery Results" src="http://blog.crowdflower.com/wp-content/uploads/2011/04/ediscover_stats.jpg" alt="crowdsourcing" width="835" height="264" /></p>
<h2><span class="Apple-style-span" style="font-size: 26px;">Department of Judgment Redundancy Department</span></h2>
<p>If the purpose of Gold is to manage the quality of individual contributors, we use multiple judgments per unit to improve the accuracy of completed units. The basic premise is simple enough. We look for agreement among trusted workers to indicate correct responses at the unit level. For example, if we ask four people to verify a phone number for a business, the answer is more likely to be correct if all four agree. In fact, every unit processed by CrowdFlower is annotated with a response as well as a Confidence Score (based on agreement weighted by Trust, plus some secret sauce).</p>
<p>More generally, assume we set the trust threshold for a given job at 70 percent (meaning that anyone who doesn&#8217;t answer at least 70 percent of Gold correctly gets booted) and that contributors are uniformly distributed in terms of ability (not true, but convenient). We can easily model the effect of additional judgments on estimated accuracy, showing that the probability that the majority response is correct increases with the number of judgments collected:</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/2011/10/stopworrying/redundancy/" rel="attachment wp-att-3700"><img class="aligncenter size-full wp-image-3700" title="Judgment Redundancy and Accuracy" src="http://blog.crowdflower.com/wp-content/uploads/2011/09/redundancy.jpg" alt="crowdsourcing" width="471" height="308" /></a></p>
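<p>A simplified version of this model can be written in a few lines of Python. The sketch assumes a yes/no question, one judgment per independently drawn contributor, and abilities spread uniformly between 70 and 100 percent, so that each judgment is correct with probability 0.85 on average. Even splits are settled by a coin flip. The function name is made up for the example, and the chart above may rest on different assumptions.</p>

```python
from math import comb

def p_majority_correct(n: int, p: float) -> float:
    """Probability that the majority of n independent yes/no judgments is right.

    Each judgment is correct with probability p; an even split is settled
    by a coin flip.
    """
    q = 1 - p
    total = 0.0
    for k in range(n + 1):                  # k = number of correct judgments
        prob_k = comb(n, k) * p**k * q**(n - k)
        if 2 * k > n:
            total += prob_k                 # a clear majority is correct
        elif 2 * k == n:
            total += 0.5 * prob_k           # tie: right half the time
    return total

# Abilities uniform on [0.70, 1.00] give an average accuracy of 0.85.
for n in (1, 3, 5, 7, 9):
    print(n, round(p_majority_correct(n, 0.85), 4))
```

<p>Under these assumptions a single judgment is right 85 percent of the time, three judgments bring the majority answer to about 94 percent, and five bring it to about 97 percent. The gains shrink with each additional judgment.</p>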
<p>Of course, while collecting 10 judgments per unit yields highly accurate results, it may not be the most efficient way to structure a job. Imagine that the first 2, 4 or even 6 contributors agree on how a unit should be classified. At some point, the marginal impact of an additional judgment is not worth the additional cost. We&#8217;ve automated a process to vary the number of judgments each unit receives based on agreement thresholds, so that we can reach accuracy targets more efficiently.</p>
<p>The following shows actual results from a sample job, where we set a minimum confidence threshold of 0.7:</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/2011/10/stopworrying/variable-judgments-viz/" rel="attachment wp-att-3722"><img class="aligncenter size-full wp-image-3722" title="Variable Judgments Viz" src="http://blog.crowdflower.com/wp-content/uploads/2011/09/Variable-Judgments-Viz.jpg" alt="crowdsourcing" width="694" height="454" /></a></p>
<p>Approximately 50 percent of units completed with just 2 judgments and 75 percent completed with 4 or fewer. In any case, each unit received only as many judgments as necessary to reach the confidence threshold. For any job, some subset of units will be ambiguous enough that they won&#8217;t reach a confidence threshold, so we also set a maximum judgments cap to &#8220;stop the bleeding.&#8221; Depending on the specific circumstances, we may reroute those ambiguous units to a parallel process with different structure, contributors, etc. for another round of judgments.</p>
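<p>The stopping rule can be sketched roughly as follows in Python. Confidence is taken here to be the trust-weighted share of the vote behind the leading answer, which leaves out the secret sauce. The 0.7 threshold matches the sample job above, while the minimum of 2 and cap of 6 judgments, the function names, and the sample data are assumptions made for the example.</p>

```python
from collections import defaultdict

def confidence(judgments):
    """Return (leading answer, trust-weighted share of the vote behind it)."""
    weight = defaultdict(float)
    for answer, trust in judgments:
        weight[answer] += trust
    best = max(weight, key=weight.get)
    return best, weight[best] / sum(weight.values())

def collect(stream, threshold=0.7, min_judgments=2, max_judgments=6):
    """Pull (answer, trust) pairs until the leading answer is confident enough
    or the cap is reached. Returns (answer, confidence, judgments used)."""
    seen = []
    for judgment in stream:
        seen.append(judgment)
        answer, conf = confidence(seen)
        confident = len(seen) >= min_judgments and conf >= threshold
        if confident or len(seen) >= max_judgments:
            return answer, conf, len(seen)
    if not seen:
        raise ValueError("no judgments available for this unit")
    return answer, conf, len(seen)  # stream ran dry before either stop condition

# A clear-cut unit: the first two trusted contributors agree.
easy = [("yes", 0.9), ("yes", 0.8), ("yes", 0.95)]
# An ambiguous unit: contributors keep splitting, so the cap ends collection.
hard = [("yes", 0.9), ("no", 0.9), ("yes", 0.8), ("no", 0.85),
        ("yes", 0.9), ("no", 0.9), ("yes", 0.9)]

print(collect(easy))
print(collect(hard))
```

<p>The clear-cut unit stops after two judgments, while the ambiguous one runs to the cap with a confidence near 0.5. Units like that are the ones that would be rerouted for another round.</p>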
<h2>One More Trick</h2>
<p>For complex tasks, we have developed a <strong>workflow management system</strong> to link together multiple jobs. For example, we might ask one pool of contributors to write a product description, verify the spelling and accuracy with a second pool and rank the subjective quality with a third pool. Alternatively, we might take a business listing and break out each attribute for independent collection and verification, with a separate job for name, address and phone number, or cuisine type, cash-only, types of credit cards accepted, on-site parking, or any other attribute that can be verified online. In general, <strong>peer review</strong> means that we can always give data a second pass to improve accuracy.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/10/stopworrying/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Crowdsourcing a Map, to Eat</title>
		<link>http://blog.crowdflower.com/2011/09/crowdsourcing-a-map-to-eat/</link>
		<comments>http://blog.crowdflower.com/2011/09/crowdsourcing-a-map-to-eat/#comments</comments>
		<pubDate>Fri, 16 Sep 2011 17:15:26 +0000</pubDate>
		<dc:creator>Aron Hegyi</dc:creator>
				<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[Balsamic]]></category>
		<category><![CDATA[book]]></category>
		<category><![CDATA[Campania]]></category>
		<category><![CDATA[city]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[directions]]></category>
		<category><![CDATA[food]]></category>
		<category><![CDATA[geocoding]]></category>
		<category><![CDATA[Google]]></category>
		<category><![CDATA[Google Geocoding]]></category>
		<category><![CDATA[Google Refine]]></category>
		<category><![CDATA[guidebook]]></category>
		<category><![CDATA[Italy]]></category>
		<category><![CDATA[Italy Pummel]]></category>
		<category><![CDATA[KML]]></category>
		<category><![CDATA[La Vecchia Dispensa]]></category>
		<category><![CDATA[map]]></category>
		<category><![CDATA[PDF]]></category>
		<category><![CDATA[region]]></category>
		<category><![CDATA[restaurant]]></category>
		<category><![CDATA[self-service]]></category>
		<category><![CDATA[slow food]]></category>
		<category><![CDATA[solutions]]></category>
		<category><![CDATA[Tuscany]]></category>
		<category><![CDATA[Zingerman’s]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=3637</guid>
		<description><![CDATA[Last May, I took a trip to Italy for two weeks. A little bit of history: my friend Jessica and I are both Italophiles, and when her mom sent us a link to a video contest where the prize was a round trip flight to Italy, we knew we had to enter. After a week [...]]]></description>
			<content:encoded><![CDATA[<div class="wp-caption alignright" style="width: 343px"><a href="http://www.flickr.com/photos/aronh/5800985627/in/set-72157626896293314/" target="_blank"><img class="  " src="http://farm4.static.flickr.com/3098/5800985627_40ed3a7cfb.jpg" alt="A parmigiano reggiano aging room" width="333" height="500" /></a><p class="wp-caption-text">Expanse of Cheese: Caseificio di San Silvestro&#39;s Aging Room in Castelvetro (MO)</p></div>
<p>Last May, <span class="s1"><a title="I took a trip to Italy" href="http://www.youtube.com/watch?v=n66L-HplnKI" target="_blank">I took a trip to Italy</a></span> for two weeks. A little bit of history: my friend Jessica and I are both Italophiles, and when her mom sent us a link to <a title="A Video Contest" href="http://www.zingermans.com/BalsamicVideos.aspx" target="_blank">a video contest</a> where the prize was a round trip flight to Italy, we knew we had to enter. After a week of writing and editing lyrics in a Google Doc — half in Italian, half in English — <span class="s1"><a title="the resulting music video" href="http://www.youtube.com/watch?v=ot-ixjYolKM" target="_blank">the resulting music video</a> </span>ended up winning us a trip to the holy land of olive oil, vino, and other delectable edibles.</p>
<p class="p1">Apart from being a passionate eater, I&#8217;m a passionate supporter of the <span class="s1"><a title="Slow Food" href="http://slowfoodusa.org/" target="_blank">Slow Food</a></span> movement, which promotes good, clean, and fair food around the world. Each year, Slow Food publishes a guidebook to restaurants in Italy that adhere to its principles. In practice, this usually means each restaurant is handpicked to showcase the traditional food of a particular region, supports artisanal methods and products that might otherwise go extinct (were eaters not eating them), and serves food that is most likely organic and local anyway.</p>
<p class="p1"><span id="more-3637"></span>But, a problem: it&#8217;s 2011, and I&#8217;m more apt to travel through interactive maps than with old-fashioned guidebooks. More importantly, I don&#8217;t plan, and I needed to know what edible options were around me at any moment in time on my trip. What I really needed was a version of the guidebook, in map form, that I could use on my mobile. No such option existed, of course, and I was left with two options: magically visualize restaurants around me by poring through the guidebook, or, crowdsource it.</p>
<p class="p1">What I needed to do (and what much of our work at CrowdFlower involves) was structure unstructured data. To create the map from the book, I went through the following steps:</p>
<ul class="ul1">
<li class="li3"><span class="s2">Obtained <span class="s3"><a title="a PDF version of the book" href="http://ultimabooks.simplicissimus.it/catalog/product/view/id/1874/s/osterie-d-italia-2011/" target="_blank">a PDF version of the book</a></span></span></li>
<li class="li1">Split the book into pages, and uploaded each page as a separate PDF (thanks to <span class="s1"><a title="pdftk" href="http://www.pdflabs.com/tools/pdftk-the-pdf-toolkit/" target="_blank">pdftk</a></span>)</li>
<li class="li1">Created a CSV (comma separated values file) with each page&#8217;s PDF link and page number</li>
<li class="li1">Created a crowdsourcing task to structure the data, using the previously uploaded individual PDF pages</li>
<li class="li1">Geocoded the structured data</li>
<li class="li1">Output the geocoded data in KML (<span class="s1"><a title="Keyhole Markup Language" href="http://en.wikipedia.org/wiki/Keyhole_Markup_Language" target="_blank">Keyhole Markup Language</a></span>) form</li>
<li class="li1">Uploaded the KML file to a mapping site (e.g. Google Maps)</li>
</ul>
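<p class="p1">As a rough illustration of the CSV step (a sketch, not the script used for this project), the snippet below writes one row per page with its page number and PDF link. The base URL, the file naming pattern, and the column names are all assumptions made up for the example.</p>

```python
import csv
import io

# Hypothetical location of the uploaded single-page PDFs (an assumption).
BASE_URL = "http://example.com/guidebook/pages"


def page_rows(first_page, last_page, base_url=BASE_URL):
    """Return one row per page: its number and the link to its single-page PDF."""
    return [
        {"page_number": n, "pdf_url": f"{base_url}/page_{n:04d}.pdf"}
        for n in range(first_page, last_page + 1)
    ]


def rows_to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["page_number", "pdf_url"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print a three-page sample; a real run would cover every page of the book.
    print(rows_to_csv(page_rows(1, 3)))
```

<p class="p1">Each row then becomes one unit of work in the crowdsourcing task, so a worker sees exactly one page at a time.</p>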
<p class="p1">Conveniently, each page that outlined a restaurant in the PDF was formatted nearly the same. This made it easy to give instructions to workers, as seen below:</p>
<div class="wp-caption aligncenter" style="width: 582px"><a href="http://publicassets.s3.amazonaws.com/pdf_transcription_test/capture_areas_example.gif"><img class=" " src="http://publicassets.s3.amazonaws.com/pdf_transcription_test/capture_areas_example.gif" alt="crowdsourcing" width="572" height="414" /></a><p class="wp-caption-text">Instructions</p></div>
<p class="p1">For each area on the page, workers were asked to copy and paste specific sections into the task. Each page, then, was split up into corresponding parts (Region, City, Directions, Restaurant Name, and the two Capture Areas). This is the essential concept here: structuring the unstructured data such that I could later geocode it properly, and display it in the way that I needed.</p>
<div class="wp-caption alignnone" style="width: 510px"><a href="http://www.flickr.com/photos/aronh/5801664934/in/set-72157626896293314/" target="_blank"><img class=" " src="http://farm6.static.flickr.com/5199/5801664934_027586411b.jpg" alt="crowdsourcing Pasta" width="500" height="333" /></a><p class="wp-caption-text">The tonnarelli cacio e pepe at Da Felice a Testaccio in Rome (used in the Geocode example)</p></div>
<p class="p1">Once the task finished, I downloaded the resulting CSV file, and whipped out <a title="Google Refine" href="http://code.google.com/p/google-refine/" target="_blank">Google Refine</a> (a.k.a. Excel on crack), which has a feature that allows you to enter a template API call that changes based on specific values in each row. Using the <a title="Google Geocoding API" href="http://code.google.com/apis/maps/documentation/geocoding/" target="_blank">Google Geocoding API</a> (any geocoding service will do), I constructed the following API call, using the address value in each row as the &#8220;address&#8221; parameter:</p>
<pre style="color: #000; padding: 17px 20px; border: 1px solid #E6DB55; background: lightYellow;">http://maps.googleapis.com/maps/api/geocode/json?address=via+Mastro+Giorgio%2C+26+Roma+Lazio&amp;sensor=false&amp;region=it&amp;language=it</pre>
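<p class="p1">Outside of Google Refine, the same per-row call could be assembled in a few lines of Python. This is only a sketch: the function names are made up for illustration, the request parameters simply mirror the call above, and the sample response is invented to show the shape of the JSON (a list of results, each with a geometry and location).</p>

```python
from urllib.parse import urlencode

GEOCODE_ENDPOINT = "http://maps.googleapis.com/maps/api/geocode/json"


def geocode_url(address, region="it", language="it"):
    """Build the geocoding request URL for one row's address."""
    params = {
        "address": address,
        "sensor": "false",
        "region": region,
        "language": language,
    }
    # urlencode turns spaces into "+" and commas into "%2C".
    return f"{GEOCODE_ENDPOINT}?{urlencode(params)}"


def extract_lat_lng(response):
    """Return (lat, lng) of the first result, or None when nothing matched."""
    if response.get("status") != "OK" or not response.get("results"):
        return None
    location = response["results"][0]["geometry"]["location"]
    return location["lat"], location["lng"]


if __name__ == "__main__":
    print(geocode_url("via Mastro Giorgio, 26 Roma Lazio"))
    # Invented sample response; the coordinates are illustrative only.
    sample = {
        "status": "OK",
        "results": [{"geometry": {"location": {"lat": 41.88, "lng": 12.475}}}],
    }
    print(extract_lat_lng(sample))
```

<p class="p1">Keeping the URL construction separate from the response parsing makes it easy to swap in a different geocoding service.</p>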
<p class="p1">After slicing and dicing the rest of the data into bits that I wanted to display on a map, I used Google Refine&#8217;s &#8220;templating&#8221; feature to export each row as a Placemark in KML format. Finally, I uploaded the resulting KML file into several different maps, each representing one region in Italy.</p>
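<p class="p1">For readers without Google Refine, here is a minimal sketch of what exporting each row as a Placemark amounts to. The function names and the sample values are assumptions for illustration. The one detail worth remembering is that KML lists coordinates as longitude first, then latitude.</p>

```python
from xml.sax.saxutils import escape


def placemark(name, description, lat, lng):
    """Render one geocoded row as a KML Placemark element."""
    return (
        "<Placemark>"
        f"<name>{escape(name)}</name>"
        f"<description>{escape(description)}</description>"
        # KML expects longitude,latitude (not latitude,longitude).
        f"<Point><coordinates>{lng},{lat}</coordinates></Point>"
        "</Placemark>"
    )


def kml_document(placemarks):
    """Wrap rendered Placemarks in a minimal KML document."""
    body = "\n".join(placemarks)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<kml xmlns="http://www.opengis.net/kml/2.2">\n'
        "<Document>\n"
        f"{body}\n"
        "</Document>\n"
        "</kml>\n"
    )


if __name__ == "__main__":
    # Illustrative values only; real rows come from the geocoded CSV.
    rows = [("Da Felice a Testaccio", "Roma, Lazio", 41.88, 12.475)]
    print(kml_document([placemark(*row) for row in rows]))
```

<p class="p1">Escaping the name and description matters here, because restaurant names containing an ampersand would otherwise produce invalid XML.</p>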
<div id="attachment_3683" class="wp-caption alignnone" style="width: 477px"><a href="http://maps.google.com/maps/ms?msa=0&amp;msid=209651542845453072441.0004a388168c61ef8f9ec" target="_blank"><img class="size-full wp-image-3683  " src="http://blog.crowdflower.com/wp-content/uploads/2011/09/Screen-shot-2011-09-15-at-4.02.58-PM1.png" alt="crowdsource food map" width="467" height="304" /></a><p class="wp-caption-text">A portion of the Campania map - tons of places to explore around here!</p></div>
<p class="p1">Try it out! Check out the map for <a title="Tuscany" href="http://maps.google.com/maps/ms?msa=0&amp;msid=209651542845453072441.0004a2ac2f900cba0b2f3" target="_blank">Tuscany</a>, and the map for <a title="Campania" href="http://maps.google.com/maps/ms?msa=0&amp;msid=209651542845453072441.0004a388168c61ef8f9ec" target="_blank">Campania</a>.</p>
<p class="p1">If you want to give crowdsourcing a spin, head on over to our <a title="Crowdsource Self-Service" href="http://crowdflower.com/solutions/self-service" target="_blank">self-service product</a> and click &#8220;sign up&#8221;. Or, if you&#8217;d prefer a hands-off &#8220;we-do-everything&#8221; approach, <a title="solutions@crowdflower.com" href="mailto:solutions@crowdflower.com" target="_blank">contact us</a> to get started.</p>
<p>Watch the video that started it all: <a href="http://www.youtube.com/watch?v=ot-ixjYolKM?rel=0" target="_blank"><em>Io Sono Balsamico</em> by Balsamico</a></p>
<p><em>Aron is a Crowdsourcing Project Manager at CrowdFlower, and is the resident agriculturalist-eater. Follow him on Twitter (<a href="http://twitter.com/aron" target="_blank">@aron</a>) for more sage bits of agricultural-eating learnings.</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/09/crowdsourcing-a-map-to-eat/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Confidence Bias: Evidence from Crowdsourcing</title>
		<link>http://blog.crowdflower.com/2011/09/confidence-bias-evidence-from-crowdsourcing/</link>
		<comments>http://blog.crowdflower.com/2011/09/confidence-bias-evidence-from-crowdsourcing/#comments</comments>
		<pubDate>Wed, 07 Sep 2011 23:02:04 +0000</pubDate>
		<dc:creator>Patrick Philips</dc:creator>
				<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[bias]]></category>
		<category><![CDATA[confidence]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[experiment]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=3080</guid>
		<description><![CDATA[Evidence in experimental psychology suggests that most people overestimate their own ability to complete objective tasks accurately. This phenomenon, often called confidence bias, refers to &#8220;a systematic error of judgment made by individuals when they assess the correctness of their responses to questions related to intellectual or perceptual problems.&#8221; 1 But does this hold up in crowdsourcing? We ran an experiment to [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_3593" class="wp-caption alignleft" style="width: 164px"><a href="http://blog.crowdflower.com/2011/09/confidence-bias-evidence-from-crowdsourcing/overconfidence/" rel="attachment wp-att-3593"><img class="size-full wp-image-3593    " title="psychologytoday.com" src="http://blog.crowdflower.com/wp-content/uploads/2011/09/Overconfidence.gif" alt="crowdsourcing" width="154" height="132" /></a><p class="wp-caption-text">psychologytoday.com</p></div>
<p>Evidence in experimental psychology suggests that most people overestimate their own ability to complete objective tasks accurately. This phenomenon, often called <em>confidence bias</em>, refers to &#8220;a systematic error of judgment made by individuals when they assess the correctness of their responses to questions related to intellectual or perceptual problems.&#8221; <sup><a href="#footnote-1">1</a></sup> But does this hold up in crowdsourcing?</p>
<p>We ran an experiment to test for a persistent difference between people&#8217;s perceptions of their own accuracy and their actual objective accuracy. We used a set of standardized questions, focusing on the Verbal and Math sections of a common standardized test. For the 829 individuals who answered more than 10 of these questions, we asked for the correct answer as well as an indication of how confident they were of the answer they supplied.</p>
<p><span id="more-3080"></span>We didn&#8217;t use any Gold in this experiment. Instead, we incentivized performance by rewarding those finishing in the top 10%, based on objective accuracy.</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/2011/09/confidence-bias-evidence-from-crowdsourcing/sample_problem/" rel="attachment wp-att-3427"><img class="aligncenter size-full wp-image-3427" title="sample_problem" src="http://blog.crowdflower.com/wp-content/uploads/2011/08/sample_problem.jpg" alt="crowdsourcing" width="713" height="520" /></a></p>
<h2>Does Bias Exist?</h2>
<p>To estimate confidence bias, we looked at the difference between an individual&#8217;s average stated confidence and the fraction of questions he/she answered correctly. If the difference is positive, the individual overestimated how well he/she did. <strong>Amazingly, over 75% of contributors overestimated their ability to answer multiple choice questions correctly.</strong></p>
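<p>A minimal sketch of that calculation, assuming confidence is recorded as a number between 0 and 1 for each answer (the function names and the toy data are ours, not taken from the experiment):</p>

```python
def bias_score(confidences, correct):
    """Mean stated confidence minus observed accuracy for one contributor.

    confidences: one value in [0, 1] per answered question.
    correct: one boolean per answered question, in the same order.
    A positive result means the contributor was overconfident.
    """
    if not confidences or len(confidences) != len(correct):
        raise ValueError("need one confidence value per answered question")
    mean_confidence = sum(confidences) / len(confidences)
    accuracy = sum(1 for c in correct if c) / len(correct)
    return mean_confidence - accuracy


def share_overconfident(scores):
    """Fraction of contributors whose bias score is strictly positive."""
    return sum(1 for s in scores if s > 0) / len(scores)


if __name__ == "__main__":
    # Toy contributor: 75% average confidence, 2 of 4 answers correct.
    print(bias_score([0.5, 0.5, 1.0, 1.0], [True, False, False, True]))
```

<p>Applying <code>share_overconfident</code> to every contributor&#8217;s score gives the kind of headline percentage reported here.</p>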
<h2><a href="http://blog.crowdflower.com/2011/09/confidence-bias-evidence-from-crowdsourcing/histogram_res/" rel="attachment wp-att-3278"><img class="aligncenter size-full wp-image-3278" title="histogram_res" src="http://blog.crowdflower.com/wp-content/uploads/2011/07/histogram_res.jpg" alt="crowdsourcing" width="599" height="341" /></a></h2>
<h2>Are Individuals Consistently Biased?</h2>
<p>Because our dataset consisted of Math and Verbal questions, we looked at each individual contributor&#8217;s confidence bias for both types of questions. In aggregate, people tended to have more trouble with the Verbal questions (average accuracy of 28%, compared to 41% for Math), though the average confidence score was nearly identical (63% +/-1).</p>
<h2><a href="http://blog.crowdflower.com/2011/09/confidence-bias-evidence-from-crowdsourcing/scatterplot/" rel="attachment wp-att-3279"><img class="aligncenter size-full wp-image-3279" title="scatterplot" src="http://blog.crowdflower.com/wp-content/uploads/2011/07/scatterplot.jpg" alt="crowdsourcing" width="639" height="380" /></a></h2>
<p>The vast majority of contributors fall into the &#8220;overconfident on both&#8221; quadrant (top right), while only a handful of contributors were overconfident for one question type and underconfident for the other (top left and bottom right quadrants). Overall, there is certainly a correlation between bias scores on the two problem types, suggesting that many individuals are consistently biased on different types of problems. However, this explains only a portion of the variation.</p>
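<p>To make the quadrant and correlation language concrete, here is a small sketch under our own assumptions: each contributor has one Math bias score and one Verbal bias score, a score of exactly zero counts as not overconfident, and the data shown is invented. The squared Pearson correlation is one common way to express how much of the variation in one score is accounted for by the other.</p>

```python
import math


def quadrant(math_bias, verbal_bias):
    """Label a contributor by the sign of the two bias scores."""
    over_math = math_bias > 0
    over_verbal = verbal_bias > 0
    if over_math and over_verbal:
        return "overconfident on both"
    if not over_math and not over_verbal:
        return "not overconfident on either"
    return "mixed"


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Invented bias scores for four contributors.
    math_scores = [0.30, 0.10, -0.05, 0.45]
    verbal_scores = [0.40, 0.20, 0.10, 0.35]
    r = pearson(math_scores, verbal_scores)
    print([quadrant(m, v) for m, v in zip(math_scores, verbal_scores)])
    print(r, r ** 2)
```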
<h2>Does Bias Vary Across Groups?</h2>
<p>Given that overconfidence seems to be a consistent trait, we were curious how this trait varies across the different groups making up our contributor pool. We sliced and diced our contributors into a number of different sub-groups, which are summarized below.</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/2011/09/confidence-bias-evidence-from-crowdsourcing/summary-table/" rel="attachment wp-att-3280"><img class="aligncenter size-full wp-image-3280" title="summary table" src="http://blog.crowdflower.com/wp-content/uploads/2011/07/summary-table.jpg" alt="crowdsourcing" width="703" height="460" /></a></p>
<p>There are a lot of interesting things going on here. To highlight a few, accuracy increases consistently as the contributor&#8217;s education level advances from High School to College, but so does confidence, leaving the bias score nearly unchanged. There&#8217;s a similar pattern with Age, with older contributors tending to be both more accurate and more confident.</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/2011/09/confidence-bias-evidence-from-crowdsourcing/splits/" rel="attachment wp-att-3408"><img class="aligncenter size-full wp-image-3408" title="splits" src="http://blog.crowdflower.com/wp-content/uploads/2011/08/splits.jpg" alt="crowdsourcing splits" width="768" height="252" /></a></p>
<p>Gender and Location also have an effect on confidence bias. Taking the two countries that supplied the most people, contributors from the US were much more accurate and slightly more confident than the average, while those from India were average in terms of accuracy but much more confident. As such, the bias score for contributors from India is nearly double that of contributors from the US. With respect to gender, confidence didn&#8217;t vary much, but women were more accurate and thus less biased than men. Moving on.</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/2011/09/confidence-bias-evidence-from-crowdsourcing/splits2/" rel="attachment wp-att-3405"><img class="aligncenter size-full wp-image-3405" title="splits2" src="http://blog.crowdflower.com/wp-content/uploads/2011/08/splits2.jpg" alt="crowdsourcing" width="773" height="251" /></a></p>
<h2>Further Research</h2>
<p>In the context of experimentation, we decided against using Gold to minimize any selection bias among contributors. However, this makes it difficult to apply these results to enterprise crowdsourcing, at least as practiced by CrowdFlower. In the future, it would be interesting to look at confidence bias among trusted workers only, and particularly among trusted workers with repeated experience in specific job types. We would expect these workers to have a better sense of whether their answers are correct, though it is possible (and perhaps likely) that confidence would increase along with accuracy.</p>
<p>&nbsp;</p>
<hr />
<p id="footnote-1" style="text-align: -webkit-auto;">1. Pallier, G., Wilkinson, R., Danthiir, V., Kleitman, S., Knezevic, G., Stankov, L., &amp; Roberts, R. D. (2002). The role of individual differences in the accuracy of confidence judgments. Journal of General Psychology, 129, 257–299.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/09/confidence-bias-evidence-from-crowdsourcing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Crowdsourcing and SEM (now with even more cat pics)</title>
		<link>http://blog.crowdflower.com/2011/08/crowdsourcing-and-sem-now-with-even-more-cat-pics/</link>
		<comments>http://blog.crowdflower.com/2011/08/crowdsourcing-and-sem-now-with-even-more-cat-pics/#comments</comments>
		<pubDate>Fri, 26 Aug 2011 18:58:20 +0000</pubDate>
		<dc:creator>Zoe Vance</dc:creator>
				<category><![CDATA[Challenges]]></category>
		<category><![CDATA[Experiments]]></category>
		<category><![CDATA[cats]]></category>
		<category><![CDATA[challenge]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[customers]]></category>
		<category><![CDATA[gold]]></category>
		<category><![CDATA[keyword]]></category>
		<category><![CDATA[ninja]]></category>
		<category><![CDATA[search]]></category>
		<category><![CDATA[SEM]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=3450</guid>
		<description><![CDATA[Every modern business wrestles with the elusive lady that is the search engine and the potential she offers to connect with customers. Google and Bing make it easy for anyone to buy keywords and drive customers to a website, but what keywords are our customers searching for? Would a sales manager frustrated with the average [...]]]></description>
			<content:encoded><![CDATA[<p>Every modern business wrestles with the elusive lady that is the search engine and the potential she offers to connect with customers. Google and Bing make it easy for anyone to buy keywords and drive customers to a website, but what keywords are our customers searching for? Would a sales manager frustrated with the average 70-80% accuracy of business listings bought from data providers search for &#8220;crowdsourcing,&#8221; &#8220;address checking,&#8221; or something else entirely? Since we&#8217;re a crowdsourcing company, we had to try crowdsourcing the solution &#8230;</p>
<p>In the last two weeks of my summer internship at CrowdFlower, the marketing team challenged me to generate the widest range of search engine seed terms that could be used in SEM keyword tools to generate &#8220;hot&#8221; search phrases. For those of you who’ve dealt with SEM, you know that thinking of seed phrases to plug into these tools can be a painfully frustrating and surprisingly difficult task. (For those of you who haven’t and don’t believe me, try right now to describe what your company does — or anything for that matter — in 10 significantly different ways.)</p>
<p>Keyword tools are based on your thought process, which takes care of the customers who are thinking in the same way you are, but what about all the people of a different mindset who are trying to find your solution? For example, if I were looking for pet grooming services, depending on my thought process, vocabulary range, and amount of sleep the night before, I could search anything from “pet grooming salon” to “quality feline hair cuts” to “kitty bad hair day.” The challenge was to understand the full breadth of how the crowd approaches a certain problem, essentially the perfect task for the crowd.</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/2011/08/crowdsourcing-and-sem-now-with-even-more-cat-pics/pet-grooming-salon-google-search/" rel="attachment wp-att-3453"><img class="aligncenter size-full wp-image-3453" src="http://blog.crowdflower.com/wp-content/uploads/2011/08/pet-grooming-salon-Google-Search.jpg" alt="crowdsourcing seo" width="753" height="167" /></a> <a href="http://blog.crowdflower.com/2011/08/crowdsourcing-and-sem-now-with-even-more-cat-pics/quality-feline-haircuts-google-search/" rel="attachment wp-att-3454"><img class="aligncenter size-full wp-image-3454" src="http://blog.crowdflower.com/wp-content/uploads/2011/08/quality-feline-haircuts-Google-Search.jpg" alt="crowd sourcing seo" width="756" height="170" /></a></p>
<div id="attachment_3452" class="wp-caption aligncenter" style="width: 766px"><a href="http://blog.crowdflower.com/2011/08/crowdsourcing-and-sem-now-with-even-more-cat-pics/kitty-bad-hair-day-google-search/" rel="attachment wp-att-3452"><img class="size-full wp-image-3452 " src="http://blog.crowdflower.com/wp-content/uploads/2011/08/kitty-bad-hair-day-Google-Search.jpg" alt="crowdsourcing sem" width="756" height="294" /></a><p class="wp-caption-text">Looks like PetSmart forgot to buy an ad for &quot;kitty bad hair day&quot;</p></div>
<p><span id="more-3450"></span><br />
I set up the job to have contributors imagine working for a business that could use CrowdFlower’s services (whether or not they know of them) and then write the search queries they would use to find a solution to their problem on the Internet. I created numerous versions for each scenario, varying both the industry jargon and the background we gave the contributor, so that I could fully mimic the knowledge and background of a real customer, and thus elicit as wide a range of responses as possible. I then ran a second job to rate those queries to identify quality results worth pursuing. Finally, I took those quality terms and fed them back into a keyword generator tool, making sure that we had the full range of potential search phrases optimized to what people would search for the most.</p>
<p style="text-align: center;"><a href="http://blog.crowdflower.com/2011/08/crowdsourcing-and-sem-now-with-even-more-cat-pics/crowdflower-job-58631-preview/" rel="attachment wp-att-3457"><img class="aligncenter size-full wp-image-3457" src="http://blog.crowdflower.com/wp-content/uploads/2011/08/CrowdFlower-Job-58631-Preview.jpg" alt="crowd sourcing sem" width="620" height="312" /></a> An unspoken secondary challenge was &#8220;Can you become a full-fledged crowdsourcing ninja before you leave?&#8221; This final job I ran was by far the most challenging because crowdsourcing content generation requires a seemingly daunting use of CrowdFlower’s automated workflow system, which uses gold units and an active, real-time peer review system. I successfully passed this final test, with the help of our in-house content generation team.</p>
<div id="attachment_3533" class="wp-caption alignnone" style="width: 575px"><a href="http://blog.crowdflower.com/2011/08/crowdsourcing-and-sem-now-with-even-more-cat-pics/job-sample-results1-2/" rel="attachment wp-att-3533"><img class="size-full wp-image-3533 " src="http://blog.crowdflower.com/wp-content/uploads/2011/08/Job-Sample-Results11.gif" alt="crowdsourcing sem" width="565" height="274" /></a><p class="wp-caption-text">Sample Generated Keyword Seeds</p></div>
<p>The keyword gen job yielded really interesting results. There were some genuinely clever and alternative thought process seeds, some that we were already using, and some that were borderline ridiculous. My personal favorite keyword seed though, as the office intern, had to be &#8220;how to find an intern&#8221; — the obvious solution to all life’s problems.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/08/crowdsourcing-and-sem-now-with-even-more-cat-pics/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Should organizations establish a Crowdsourcing Center of Excellence?</title>
		<link>http://blog.crowdflower.com/2011/08/should-organizations-establish-a-crowdsourcing-center-of-excellence/</link>
		<comments>http://blog.crowdflower.com/2011/08/should-organizations-establish-a-crowdsourcing-center-of-excellence/#comments</comments>
		<pubDate>Thu, 18 Aug 2011 16:51:26 +0000</pubDate>
		<dc:creator>Ram Rampalli, eBay</dc:creator>
				<category><![CDATA[CTL]]></category>
		<category><![CDATA[CCE]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[crowdsourcing center of excellence]]></category>
		<category><![CDATA[crowdsourcing thought leadership]]></category>
		<category><![CDATA[ctl]]></category>
		<category><![CDATA[Ebay]]></category>
		<category><![CDATA[ram rampalli]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=3435</guid>
		<description><![CDATA[When I started the crowdsourcing program within my group, I was planning to implement one or two projects. Within weeks, the number of projects grew to six. In the last year, we have experimented with over fifteen different types of projects. As more and more projects with CrowdFlower graduate to production, different groups (engineering, product management, [...]]]></description>
			<content:encoded><![CDATA[<p>When I started the crowdsourcing program within my group, I was planning to implement one or two projects. Within weeks, the number of projects grew to six. In the last year, we have experimented with over fifteen different types of projects.</p>
<p>As more and more projects with CrowdFlower graduate to production, different groups (engineering, product management, quality engineering) across the organization are seeing the benefits of crowdsourcing and are keen on embracing this new paradigm.</p>
<p>With so many projects in flight and so many ideas coming up, would it make sense for organizations to set up a Crowdsourcing Center of Excellence (CCE), much like a PMO?</p>
<p><span id="more-3435"></span></p>
<p>I personally think that there are a lot of advantages in doing this. Here are my thoughts:</p>
<ul>
<li><strong>Crowdsourcing is growing fast</strong> – new tools and technologies are emerging daily. It would be beneficial for a centralized team to keep abreast of the developments in this space.</li>
<li><strong>Crowdsourcing is an emerging solution</strong>, so it is important that everyone in the organization understand what Crowdsourcing can and cannot do. The CCE can evangelize, educate, mentor, and consult with teams and individuals on crowdsourcing.</li>
<li><strong>The CCE can design and implement processes and methodologies</strong> for the organization. The CCE can also develop tools that can be deployed across the organization.</li>
<li><strong>The CCE can serve as the centralized office to evaluate the many vendor options</strong>, and select the right one(s).</li>
<li><strong>The CCE can design, implement, and help manage metrics and KPIs</strong> at a project and program level.</li>
</ul>
<p>I am intentionally excluding from the charter the responsibility of running individual projects. There are pros and cons of having a centralized (or decentralized) process to implement and manage your individual projects.</p>
<p>I would like to hear the crowdsourcing community&#8217;s thoughts and feedback on the pros and cons of establishing a CCE.</p>
<p>In particular, what do you think about these two specific questions? Thanks in advance for your feedback.</p>
<p><em>1. Do you think organizations will benefit from establishing a CCE?</em><br />
<em> 2. The CCE will obviously involve cost – to establish and run. Will this investment be justified?</em></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/08/should-organizations-establish-a-crowdsourcing-center-of-excellence/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Crowdsourcing Thought Leadership: Building a successful portfolio of crowdsourcing projects (Part 4)</title>
		<link>http://blog.crowdflower.com/2011/08/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-5/</link>
		<comments>http://blog.crowdflower.com/2011/08/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-5/#comments</comments>
		<pubDate>Tue, 02 Aug 2011 19:03:43 +0000</pubDate>
		<dc:creator>Ram Rampalli, eBay</dc:creator>
				<category><![CDATA[CTL]]></category>
		<category><![CDATA[API]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[crowdsourcing thought leadership]]></category>
		<category><![CDATA[Ebay]]></category>
		<category><![CDATA[ram rampalli]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=3295</guid>
		<description><![CDATA[This is part of a series of guest posts by Ram Rampalli, our crowdsourcing partner at eBay. Part I &#8211; Assessment Stage Part II &#8211; Pilot Stage Part III &#8211; Analysis Stage Part IV &#8211; Production Stage About the author: Ram Rampalli created and leads the crowdsourcing program within the Selling &#38; Catalogs team at [...]]]></description>
			<content:encoded><![CDATA[<div id="attachment_3312" class="wp-caption alignright" style="width: 310px"><a href="http://blog.crowdflower.com/2011/08/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-5/kpis-3/" rel="attachment wp-att-3312"><img class="size-full wp-image-3312   " title="KPIs" src="http://blog.crowdflower.com/wp-content/uploads/2011/08/KPIs2.png" alt="KPIs Crowdsourcing" width="300" height="199" /></a><p class="wp-caption-text">KPIs (via howtoworkthis.com)</p></div>
<p><strong>This is part of a series of guest posts by Ram Rampalli, our crowdsourcing partner at eBay.</strong><br />
<a href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-2/"> Part I &#8211; Assessment Stage</a><br />
<a href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-3/">Part II &#8211; Pilot Stage</a><br />
<a href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-4/">Part III &#8211; Analysis Stage</a><br />
Part IV &#8211; Production Stage</p>
<p>About the author: <em>Ram Rampalli created and leads the crowdsourcing program within the Selling &amp; Catalogs team at eBay Inc. You can follow him on Twitter (<a href="http://twitter.com/ramrampalli">@ramrampalli</a>)</em></p>
<h3>Building a successful portfolio of crowdsourcing projects – Part 4</h3>
<p>In the first three parts of this series, we discussed the <a title="Crowdsourcing Thought Leadership: Building a successful portfolio of crowdsourcing projects (Part 1)" href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-2/" target="_blank">Assessment</a>, <a title="Crowdsourcing Thought Leadership: Building a successful portfolio of crowdsourcing projects (Part 2)" href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-3/" target="_blank">Pilot</a>, and <a title="Crowdsourcing Thought Leadership: Building a successful portfolio of crowdsourcing projects (Part 3)" href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-4/" target="_blank">Analysis &amp; Optimization</a> stages. Now that the task has moved to production, what steps can you take to manage it effectively?</p>
<p><span id="more-3295"></span></p>
<h4>1. Leverage the CrowdFlower API</h4>
<p>When you move to production, the task is already designed, but you still need to provide input data. If you want the crowd to perform a product categorization task, for example, you need to provide the crowd with the product and category information.</p>
<p>At this point, while it is not required, I recommend integrating with the <a title="CrowdFlower API" href="http://crowdflower.com/docs/api" target="_blank">CrowdFlower API</a>, which allows you to send and receive data seamlessly. You can then get your data back as it’s completed, without waiting for batch results in the form of a CSV.</p>
<h4>2. Identify the KPIs</h4>
<p>With the success of our early projects here at eBay, we quickly added new projects and soon had seven concurrent projects running with CrowdFlower. With this type of volume, it is impossible to track every key metric for each project all the time.</p>
<p>I recommend ranking the top three key metrics and identifying them to the CrowdFlower team. This will give the CrowdFlower team targeted metrics for which they can optimize.</p>
<h4>Closing Remarks</h4>
<p>Crowdsourcing is a relatively new paradigm. Different companies have implemented this paradigm in different ways, focusing on different ways of structuring their interactions with the crowd. So just because a project did not work under one model does not necessarily mean that the project will never work.</p>
<p>These companies are also constantly innovating, coming up with new products and solutions. Over the last 18 months we’ve worked closely with the CrowdFlower team to develop features that are now staples of their platform.</p>
<p>For me, every project has been a learning experience, and as I work on newer projects, I have lots to learn. I hope you enjoyed this four-part series. I would love to hear your thoughts and comments. Happy crowdsourcing!</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/08/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-5/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CrowdFlower Challenges Yelp: It’s a Nerd-Off</title>
		<link>http://blog.crowdflower.com/2011/07/crowdflower-challenges-yelp-it%e2%80%99s-a-nerd-off/</link>
		<comments>http://blog.crowdflower.com/2011/07/crowdflower-challenges-yelp-it%e2%80%99s-a-nerd-off/#comments</comments>
		<pubDate>Thu, 28 Jul 2011 19:33:11 +0000</pubDate>
		<dc:creator>Greg Laughlin</dc:creator>
				<category><![CDATA[Challenges]]></category>
		<category><![CDATA[BLV]]></category>
		<category><![CDATA[business listing verification]]></category>
		<category><![CDATA[challenge]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[engineer]]></category>
		<category><![CDATA[enterprise]]></category>
		<category><![CDATA[Nerd-Off]]></category>
		<category><![CDATA[Yelp]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=3005</guid>
		<description><![CDATA[Dramatic Intro It is high noon in business listing verification crowdsourcing land. We are throwing down the gauntlet. We are stepping in the ring. We are mixing our metaphors. Undramatic Intro Yelp engineers recently described their efforts to correct business listing data using Amazon Turk. They tapped the services of 4,660 contributors; only 79 passed [...]]]></description>
			<content:encoded><![CDATA[<h3><strong>Dramatic Intro</strong></h3>
<p>It is high noon in business listing verification crowdsourcing land. We are throwing down the gauntlet. We are stepping in the ring. We are mixing our metaphors.</p>
<h3><strong>Undramatic Intro</strong></h3>
<p>Yelp engineers <a title="Yelp Report" href="http://yelp.typepad.com/engineeringblog_files/yelp_mturk_hcomp.pdf" target="_blank">recently described</a> their efforts to correct business listing data using Amazon Mechanical Turk. They tapped the services of 4,660 contributors; only 79 passed their quality assurance testing (1.7% of contributors were “trusted”), and the data they output was (very roughly) 80% accurate.</p>
<p>This smelled funny to us. Our business listing verification service routinely returns results above 97% accuracy. In fact, some of the most recognizable names in local search and business data pay for that service. (See a <a title="BLV Customer Report" href="http://get.crowdflower.com/rs/crowdflower/images/Customer_Report_BLV.pdf" target="_blank">full report</a> on 100,000 listings we did for a major search company for some typical figures.) Out of the last couple dozen crowdsourcing tasks we’ve run, the absolute minimum proportion of contributors who were “trusted” was 34%. <strong>But more importantly, our platform identifies these trusted contributors within minutes, meaning the best contributors get the job done quickly.</strong></p>
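<p>For the arithmetically inclined, here is a quick check of the trusted-contributor percentage quoted above. It is only a sketch: the function name is ours for illustration, and the two counts are the figures cited from the Yelp report.</p>

```python
# Figures quoted above from the Yelp report.
TOTAL_CONTRIBUTORS = 4660
TRUSTED_CONTRIBUTORS = 79


def trusted_share(trusted: int, total: int) -> float:
    """Return the percentage of contributors who passed quality testing."""
    if total == 0:
        raise ValueError("total must be positive")
    return 100.0 * trusted / total


share = trusted_share(TRUSTED_CONTRIBUTORS, TOTAL_CONTRIBUTORS)
print(f"{share:.1f}%")  # prints 1.7%
```

<p>Compare that with the 34% minimum mentioned above for our own recent tasks.</p>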
<p><span id="more-3005"></span></p>
<div id="attachment_3008" class="wp-caption aligncenter" style="width: 510px"><a href="http://blog.crowdflower.com/2011/07/crowdflower-challenges-yelp-it%e2%80%99s-a-nerd-off/url-precision-numbers/" rel="attachment wp-att-3008"><img class="size-full wp-image-3008  " title="URL Precision Numbers (excerpted from an actual past report to client)" src="http://blog.crowdflower.com/wp-content/uploads/2011/07/URL-Precision-Numbers.png" alt="crowdsourcing URL Precision Numbers" width="500" height="324" /></a><p class="wp-caption-text">URL Precision Numbers (excerpted from an actual past report to client)</p></div>
<h3><strong>So. Why Did Yelp’s Project Struggle to Meet Enterprise Accuracy Standards?</strong></h3>
<p><strong>It was not because of a lack of brains.</strong> The bios of the eight folks at Yelp who worked on this project are smattered with words like “Harvey Mudd” and “Computer Science” and “Stanford” and “PhD”.  And they work at Yelp, which is, y’know, awesome.</p>
<p><strong>And it was not because of a lack of good contributors.</strong> CrowdFlower has first-hand experience with well over one million contributors (many from the Mechanical Turk platform), and, when given the right tools and feedback, we’ve found them to be very accurate.</p>
<h3><strong>No, Really. Why?</strong></h3>
<p>It was in part because the Yelp team did not have the tools developed over the years by CrowdFlower’s crack engineering team.</p>
<ul>
<li>Our contributors face ongoing tests as they complete work, and whenever they get an answer wrong they are <em>given feedback as to why they were wrong in real-time;</em> these tests are carefully calibrated to test for the most common types of contributor errors.</li>
<li>Our contributor UIs are the products of dozens of A|B tests run through CrowdFlower’s custom A|B testing infrastructure.</li>
<li>We use <em>digital assembly line</em> technology, chaining together many very simple tasks to yield a complex result. Below is a (somewhat outdated) representation of our business listing verification assembly line, where each blue box represents one discrete user task:</li>
</ul>
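<p>To make the first bullet concrete, here is a minimal sketch of ongoing testing with immediate feedback. It is not our production code: the class, the function, and both thresholds are hypothetical, chosen only to show the shape of the idea.</p>

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical values for illustration; not actual platform settings.
TRUST_THRESHOLD = 0.7  # minimum accuracy on test questions
MIN_TESTS = 4          # tests required before trust is decided


@dataclass
class Contributor:
    name: str
    tests_seen: int = 0
    tests_passed: int = 0

    def accuracy(self) -> float:
        """Share of test questions answered correctly so far."""
        if self.tests_seen == 0:
            return 0.0
        return self.tests_passed / self.tests_seen

    def is_trusted(self) -> bool:
        """Trusted once enough tests are seen and accuracy is high enough."""
        return self.tests_seen >= MIN_TESTS and self.accuracy() >= TRUST_THRESHOLD


def answer_test(contributor: Contributor, given: str, expected: str,
                reason: str) -> Optional[str]:
    """Score one test question and return feedback, or None if correct."""
    contributor.tests_seen += 1
    if given == expected:
        contributor.tests_passed += 1
        return None
    # The contributor sees why the answer was wrong right away.
    return f"Expected {expected!r}: {reason}"


worker = Contributor("example-worker")
answer_test(worker, "yes", "yes", "")
answer_test(worker, "yes", "yes", "")
feedback = answer_test(worker, "no", "yes", "the phone number matches the listing")
answer_test(worker, "yes", "yes", "")
print(feedback)
print(worker.is_trusted())
```

<p>Because every test updates the running accuracy, a contributor's standing can be decided after only a handful of questions, which is the point of testing continuously instead of once up front.</p>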
<p>&nbsp;</p>
<div id="attachment_3272" class="wp-caption aligncenter" style="width: 510px"><a href="http://blog.crowdflower.com/2011/07/crowdflower-challenges-yelp-it%e2%80%99s-a-nerd-off/digital-assembly-line-5/" rel="attachment wp-att-3272"><img class="size-full wp-image-3272 " title="Digital Assembly Line" src="http://blog.crowdflower.com/wp-content/uploads/2011/07/Digital-Assembly-Line4.png" alt="crowdsourcing Digital Assembly Line" width="500" height="468" /></a><p class="wp-caption-text">Digital Assembly Line</p></div>
<p>&nbsp;</p>
<p>Just as important… this project probably struggled to succeed because crowdsourcing to enterprise standard quality is incredibly hard! The business listing verification team at CrowdFlower only succeeded after a full year and tens of millions of human judgments. We worked quite a few 24-hour days, and our social skills atrophied from lack of use.</p>
<p>In the end, we have achieved a solution that is fast, accurate, and affordable to use – and continues to be improved upon.</p>
<p><strong>We did all this work so others won’t have to. </strong>If you’re contemplating going it alone, give us a call! It’s not worth it! So many people love you!</p>
<h3><strong>The Challenge</strong></h3>
<p>On behalf of our contributors, CrowdFlower, and the business listing verification team, I’d like to offer a challenge to you, friendly neighborhood Yelp engineers.</p>
<p><strong>Give us 5,000 business listings. If we can raise the precision of those listings to 95%+ and beat any machine learning algorithms you can build, you give us two engineers. No, just kidding. If we can do so, you’ll write about the experience on your engineering blog. </strong></p>
<p>If we lose (actually, regardless of whether we win or lose), we’ll happily sit with you to show and tell you everything we’ve ever learned about business listing verification.</p>
<p>For more information and actual sample data from another Business Listing Verification project, check out this <a title="crowdsourcing BLV Customer Report" href="http://get.crowdflower.com/rs/crowdflower/images/Customer_Report_BLV.pdf" target="_blank">customer report</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/07/crowdflower-challenges-yelp-it%e2%80%99s-a-nerd-off/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Crowdsourcing Thought Leadership: Building a successful portfolio of crowdsourcing projects (Part 3)</title>
		<link>http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-4/</link>
		<comments>http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-4/#comments</comments>
		<pubDate>Tue, 26 Jul 2011 22:35:33 +0000</pubDate>
		<dc:creator>Ram Rampalli, eBay</dc:creator>
				<category><![CDATA[CTL]]></category>
		<category><![CDATA[analysis]]></category>
		<category><![CDATA[crowdsourcing thought leadership]]></category>
		<category><![CDATA[ctl]]></category>
		<category><![CDATA[Ebay]]></category>
		<category><![CDATA[optimization]]></category>
		<category><![CDATA[ram rampalli]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=3089</guid>
		<description><![CDATA[via: www.jamorama.com This is part of a series of guest posts by Ram Rampalli, our crowdsourcing partner at eBay. Part I &#8211; Assessment Stage Part II &#8211; Pilot Stage Part III &#8211; Analysis Stage Part IV &#8211; Production Stage About the author: Ram Rampalli created and leads the crowdsourcing program within the Selling &#38; Catalogs [...]]]></description>
			<content:encoded><![CDATA[<div class="mceTemp" style="text-align: center;">
<dl id="attachment_3140" class="wp-caption   alignleft" style="width: 256px;">
<dt class="wp-caption-dt"><a href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-4/tuning-guitar-2/" rel="attachment wp-att-3140"><img class="size-full wp-image-3140" title="via: www.jamorama.com" src="http://blog.crowdflower.com/wp-content/uploads/2011/07/tuning-guitar1.jpg" alt="Analysis &amp; Optimization" width="246" height="160" /></a></dt>
<dd class="wp-caption-dd">via: www.jamorama.com</dd>
</dl>
</div>
<p><strong>This is part of a series of guest posts by Ram Rampalli, our crowdsourcing partner at eBay.</strong><br />
<a href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-2/"> Part I &#8211; Assessment Stage</a><br />
<a href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-3/">Part II &#8211; Pilot Stage</a><br />
Part III &#8211; Analysis Stage<br />
<a href="http://blog.crowdflower.com/2011/08/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-5/">Part IV &#8211; Production Stage</a></p>
<p>About the author: <em>Ram Rampalli created and leads the crowdsourcing program within the Selling &amp; Catalogs team at eBay Inc. You can follow him on Twitter (<a href="http://twitter.com/ramrampalli">@ramrampalli</a>)</em></p>
<h3>Building a successful portfolio of crowdsourcing projects – Part 3</h3>
<p>In the first two parts of this series, we discussed the <a title="Crowdsourcing Thought Leadership: Building a successful portfolio of crowdsourcing projects" href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-2/">Assessment</a> and <a title="Crowdsourcing Thought Leadership: Building a successful portfolio of crowdsourcing projects" href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-3/">Pilot</a> stages. Now that the pilot task is finished and you have a copy of the results file, it’s time to analyze these results and plan for the next steps.</p>
<h4 style="text-align: left;">What can you expect to get from CrowdFlower?</h4>
<p>CrowdFlower can provide you both the aggregated judgment file (a single “consensus” judgment per unit) and the full judgment file (all the judgments collected for that unit). Each judgment is annotated with additional data, including date/time collected, labor channel and geographic origin.<br />
<span id="more-3089"></span></p>
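<p>To illustrate how the two files relate, here is a minimal sketch that collapses full judgments into one consensus row per unit using a plain majority vote. This is an assumption for illustration only: the row layout and function name are hypothetical, and CrowdFlower's actual aggregation may work differently (for example, by weighting contributors).</p>

```python
from collections import Counter, defaultdict


def aggregate(full_judgments):
    """Collapse (unit_id, answer) rows into one consensus row per unit.

    Returns {unit_id: (answer, agreement)}, where agreement is the share
    of judgments for that unit that match the winning answer. Ties go to
    the answer that was seen first.
    """
    by_unit = defaultdict(list)
    for unit_id, answer in full_judgments:
        by_unit[unit_id].append(answer)

    consensus = {}
    for unit_id, answers in by_unit.items():
        answer, votes = Counter(answers).most_common(1)[0]
        consensus[unit_id] = (answer, votes / len(answers))
    return consensus


# Hypothetical full judgment rows: three judgments on one unit, one on another.
full = [
    ("unit-1", "Shoes"),
    ("unit-1", "Shoes"),
    ("unit-1", "Bags"),
    ("unit-2", "Phones"),
]
for unit_id, (answer, agreement) in aggregate(full).items():
    print(unit_id, answer, f"{agreement:.2f}")
```

<p>Keeping the full file alongside the aggregate lets you see where contributors disagreed, which is often where the task instructions need work.</p>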
<h4>Analysis</h4>
<p>After the pilot, we analyze the results. Many new projects that we pilot with CrowdFlower are also operational through one of our outsourcing partners. Therefore, we compare the performance of CrowdFlower with that of our outsourcer on three metrics:</p>
<ul>
<li>Performance (Speed)</li>
<li>Cost</li>
<li>Quality (Accuracy)</li>
</ul>
<p>The project sponsor marks the project metrics using traffic light icons: green (met/exceeded standard), yellow (did not meet standard but is acceptable for now), red (did not meet standard). Upon completion of this analysis, we re-engage the CrowdFlower team to review the results and determine next steps in any areas that might need improvement.</p>
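<p>In our case the sponsor assigns the lights by judgment, but the same scheme can be sketched in code. The example below is hypothetical: it assumes each metric is a higher-is-better number, and the standards and "acceptable for now" floors are invented for illustration.</p>

```python
from enum import Enum


class Light(Enum):
    GREEN = "met/exceeded standard"
    YELLOW = "did not meet standard but is acceptable for now"
    RED = "did not meet standard"


def rate(value: float, standard: float, acceptable_floor: float) -> Light:
    """Rate a higher-is-better metric against a standard and a floor."""
    if value >= standard:
        return Light.GREEN
    if value >= acceptable_floor:
        return Light.YELLOW
    return Light.RED


# Hypothetical pilot results: (measured value, standard, acceptable floor).
pilot = {
    "Performance (Speed)": (1200, 1000, 800),  # units per day
    "Cost": (0.90, 1.00, 0.80),                # savings score, higher is better
    "Quality (Accuracy)": (0.88, 0.95, 0.90),
}
for metric, (value, standard, floor) in pilot.items():
    print(metric, rate(value, standard, floor).name)
```

<p>With these made-up numbers the report would show one green, one yellow, and one red, which is the kind of mixed result that sends us back to the CrowdFlower team.</p>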
<h4>Optimization</h4>
<p>The CrowdFlower platform is set up in such a way that it can adapt to changing business requirements. This adaptability is especially important as you gain insight into the data that you are collecting. The platform offers optimizations such as geographic segmentation, runtime quality checks, and increased levels of throughput.</p>
<p>Here are some examples from projects that we ran:</p>
<h5>Geographic Segmentation</h5>
<ul>
<li>In one test, we realized that the quality of work completed by US-based contributors was significantly higher than that of other contributors because of contextual knowledge. Therefore, we restricted the task to a US-based workforce only.</li>
</ul>
<h5>Runtime Quality Checks</h5>
<ul>
<li>In another test, we were able to identify a couple of runtime quality checks that can be performed on the task. Implementing these changes helped us improve the quality metric – the only lagging metric for this test.</li>
</ul>
<h5>Increased Throughput</h5>
<ul>
<li>In a third test, the speed of the work done was slower than anticipated. Therefore, we opened up the work to more contributor channels for this task and the speed significantly increased.</li>
</ul>
<p>When we implemented one or more of these optimization techniques, we ran a few tests to ensure that these changes gave us the right results. Once that was set, the next step was to move these projects to the final stage – <strong>Production</strong>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-4/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Crowdsourcing Thought Leadership: Building a successful portfolio of crowdsourcing projects (Part 2)</title>
		<link>http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-3/</link>
		<comments>http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-3/#comments</comments>
		<pubDate>Wed, 20 Jul 2011 17:49:44 +0000</pubDate>
		<dc:creator>Ram Rampalli, eBay</dc:creator>
				<category><![CDATA[CTL]]></category>
		<category><![CDATA[crowdflower]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[crowdsourcing thought leadership]]></category>
		<category><![CDATA[ctl]]></category>
		<category><![CDATA[Ebay]]></category>
		<category><![CDATA[pilot]]></category>
		<category><![CDATA[ram rampalli]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=2934</guid>
		<description><![CDATA[This is part of a series of guest posts by Ram Rampalli, our crowdsourcing partner at eBay. Part I &#8211; Assessment Stage Part II &#8211; Pilot Stage Part III &#8211; Analysis Stage Part IV &#8211; Production Stage About the author: Ram Rampalli created and leads the crowdsourcing program within the Selling &#38; Catalogs team at [...]]]></description>
			<content:encoded><![CDATA[<p><a href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-3/shuttle-launch-4/" rel="attachment wp-att-3001"><img class="size-full wp-image-3001   alignleft" title="(photo via www.insideflorida.com)" src="http://blog.crowdflower.com/wp-content/uploads/2011/07/shuttle-launch3.jpg" alt="(photo via www.insideflorida.com)" width="255" height="200" /></a></p>
<p><strong>This is part of a series of guest posts by Ram Rampalli, our crowdsourcing partner at eBay.</strong><br />
<a href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-2/"> Part I &#8211; Assessment Stage</a><br />
Part II &#8211; Pilot Stage<br />
<a href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-4/">Part III &#8211; Analysis Stage</a><br />
<a href="http://blog.crowdflower.com/2011/08/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-5/">Part IV &#8211; Production Stage</a></p>
<p>About the author: <em>Ram Rampalli created and leads the crowdsourcing program within the Selling &amp; Catalogs team at eBay Inc. You can follow him on Twitter (<a href="http://twitter.com/ramrampalli">@ramrampalli</a>)</em></p>
<p>A quick aside before jumping into Part 2: This series lays out a methodology for compiling a successful portfolio of high-accuracy, deterministic crowdsourcing projects done through the <strong><span style="text-decoration: underline;">CrowdFlower</span></strong> platform. It is not an absolute methodology for all possible crowdsourcing projects.</p>
<h3>Building a successful portfolio of crowdsourcing projects – Part 2</h3>
<p>In the first part of this series, we discussed the <a title="Crowdsourcing Thought Leadership: Building a successful portfolio of crowdsourcing projects" href="http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-2/" target="_blank">Assessment</a> stage. If your proposed project passed the initial assessment, it graduates to the next stage: <strong>Pilot</strong>.</p>
<p><span id="more-2934"></span></p>
<p>The first recommended step in the pilot stage is the <strong>pilot requirements specification</strong>. In my experience, some ideas that passed the assessment stage were later disqualified after further review with the CrowdFlower team. This prompted us to come up with the pre-pilot document, in which we ask the project sponsors to complete pilot set-up information that also serves as a second checkpoint:</p>
<ul>
<li><strong>Name</strong>
<ul>
<li>Choose a simple and descriptive name.</li>
</ul>
</li>
<li><strong>Objective</strong>
<ul>
<li>Define clear and concise objectives.</li>
<li>Example: <em>Improve the recommendation engine algorithm accuracy from 65% to 90% by end of 2011.</em></li>
</ul>
</li>
<li><strong>Description</strong>
<ul>
<li>Describe the task that you expect the worker to perform. If it’s difficult to describe in a few sentences, you may need to refine the task.</li>
<li><em>Often projects are rejected when the sponsor cannot provide a simple job description. </em></li>
</ul>
</li>
<li><strong>Sample Data</strong>
<ul>
<li>Provide sample data that is representative of the source data for the ongoing project.</li>
<li><em>This process often results in project sponsors uncovering bottlenecks in the data generation process.</em></li>
</ul>
</li>
<li><strong>Gold Units</strong>
<ul>
<li>The project sponsor should either present sample <a title="Gold Units" href="http://crowdflower.com/docs/gold" target="_blank">gold units</a> (preferable) or have a plan in place for the gold units.</li>
</ul>
</li>
<li><strong>Volume</strong>
<ul>
<li>Define how often the test is to be run and the volume for each run.</li>
</ul>
</li>
<li><strong>Success Metrics</strong>
<ul>
<li>Define the metrics to evaluate the success of this project.</li>
<li>If this project is currently run through outsourcing, or other means, list the current performance metrics.</li>
</ul>
</li>
</ul>
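<p>The checklist above can also be kept as a small structured record, so that incomplete submissions are easy to spot. The sketch below is only an illustration: the class, field names, and sample values are hypothetical and are not part of any CrowdFlower tooling.</p>

```python
from dataclasses import dataclass, field, fields


@dataclass
class PilotSpec:
    """One pre-pilot document; every section must be filled in."""
    name: str = ""
    objective: str = ""
    description: str = ""
    sample_data: list = field(default_factory=list)
    gold_units: str = ""  # sample gold units, or the plan for creating them
    volume: str = ""
    success_metrics: list = field(default_factory=list)


def missing_sections(spec: PilotSpec) -> list:
    """Return the names of sections the sponsor has left empty."""
    return [f.name for f in fields(spec) if not getattr(spec, f.name)]


# Hypothetical draft from a project sponsor.
draft = PilotSpec(
    name="Product categorization pilot",
    objective="Improve categorization accuracy on new listings",
    description="Pick the best category for the product shown.",
)
print(missing_sections(draft))
```

<p>An empty list from <code>missing_sections</code> means the document is ready for review with the CrowdFlower team; anything else goes back to the sponsor.</p>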
<p>If the project satisfies the pilot specifications document, we work with the CrowdFlower team to design the pilot, with thorough testing by the CrowdFlower team prior to launch.</p>
<p>Many projects are launched in two stages.</p>
<h4><span style="text-decoration: underline;">Stage I: Controlled Launch</span></h4>
<p>In the first stage, we do a controlled launch of a subset of all units. We monitor the task closely to make sure that it is on track to achieve the success metrics outlined previously. If necessary, we calibrate and optimize the task to improve performance.</p>
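<p>One simple way to pick the subset for a controlled launch is a random sample of the input units. The sketch below assumes that approach; the function name, the 10% fraction, and the fixed seed are illustrative choices, not a description of how CrowdFlower selects units.</p>

```python
import random


def split_for_controlled_launch(units, fraction=0.1, seed=0):
    """Return (pilot_batch, remainder), both in original order.

    fraction and seed are illustrative defaults; a fixed seed makes the
    split repeatable so the same pilot batch can be reviewed later.
    """
    if fraction > 1.0 or 0.0 >= fraction:
        raise ValueError("fraction must be in (0, 1]")
    if not units:
        return [], []
    rng = random.Random(seed)
    size = max(1, round(len(units) * fraction))
    chosen = set(rng.sample(range(len(units)), size))
    pilot_batch = [u for i, u in enumerate(units) if i in chosen]
    remainder = [u for i, u in enumerate(units) if i not in chosen]
    return pilot_batch, remainder


units = [f"unit-{i}" for i in range(100)]
pilot_batch, remainder = split_for_controlled_launch(units)
print(len(pilot_batch), len(remainder))  # prints 10 90
```

<p>The remainder is what gets released in the second stage once the first batch looks healthy.</p>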
<h4><span style="text-decoration: underline;">Stage II: Full Launch</span></h4>
<p>Once we are sure the project is running smoothly we then open up the task at full scale. Upon completion, CrowdFlower sends a copy of the results.</p>
<p>To save time, be sure to agree upon the structure of the results file, so that you are getting the data in its most usable format.</p>
<p>This sets you up for the next phase – <strong>Analysis and Optimization</strong>.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/07/crowdsourcing-thought-leadership-building-a-successful-portfolio-of-crowdsourcing-projects-3/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

