<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The CrowdFlower Blog &#187; Law</title>
	<atom:link href="http://blog.crowdflower.com/topics/law-2/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.crowdflower.com</link>
	<description></description>
	<lastBuildDate>Tue, 10 Jan 2012 20:00:35 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>eDiscovery, meet Crowd</title>
		<link>http://blog.crowdflower.com/2011/05/ediscovery-meet-crowd/</link>
		<comments>http://blog.crowdflower.com/2011/05/ediscovery-meet-crowd/#comments</comments>
		<pubDate>Fri, 06 May 2011 19:58:01 +0000</pubDate>
		<dc:creator>Patrick Philips</dc:creator>
				<category><![CDATA[Conference]]></category>
		<category><![CDATA[Experiments]]></category>
		<category><![CDATA[Law]]></category>
		<category><![CDATA[crowdsourcing]]></category>
		<category><![CDATA[document review]]></category>
		<category><![CDATA[eDiscovery]]></category>
		<category><![CDATA[legal]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=2441</guid>
		<description><![CDATA[Once upon a time, I had a job that included looking through boxes of documents that were supposedly related to environmental litigation, but were generally (a) unrelated, (b) dusty and (c) mind-numbingly dull. Earlier this year, as I looked back on those dark days, it seemed to me that crowdsourcing would be a great tool [...]]]></description>
			<content:encoded><![CDATA[<p>Once upon a time, I had a job that included looking through boxes of documents that were supposedly related to environmental litigation, but were generally (a) unrelated, (b) dusty and (c) mind-numbingly dull. Earlier this year, as I looked back on those dark days, it seemed to me that crowdsourcing would be a great tool for a first pass through documents, helping a legal team focus its efforts away from documents that are obviously not responsive to a given request.</p>
<p><span id="more-2441"></span></p>
<p>To test this suspicion, we used a dataset of ~2,700 documents that were pre-coded with relevance assessments by a team of legal experts associated with the TREC 2010 Legal Learning task.<sup><a href="#footnote-1">1</a></sup> The documents were emails made public during the course of the Enron investigation. We asked multiple workers whether each one was responsive to a request for Residential Real Estate (full instructions <a href="http://plg1.uwaterloo.ca/~gvcormac/treclegal09/topic.txt">here</a>):</p>
<p><img class="aligncenter size-full wp-image-2442" title="ediscovery_ui" src="http://blog.crowdflower.com/wp-content/uploads/2011/04/ediscovery_ui.jpg" alt="" width="647" height="425" /></p>
<p style="text-align: -webkit-auto;"><strong><span style="font-size: 26px;">&#8220;So a Team of Lawyers walks into a Conference Room&#8230;&#8221;</span></strong></p>
<p style="text-align: -webkit-auto;">We ran two iterations of the document review task. In the first, we used a Gold <sup><a href="#footnote-2">2</a></sup> distribution of 50% Relevant and 50% Not Relevant. For the second iteration, we increased the proportion of Relevant documents in our Gold set to 60%. Our thinking was that, at least in the context of litigation, returning an irrelevant document (false positive) was preferable to missing a Relevant document (false negative).</p>
<p style="text-align: center;"><a rel="attachment wp-att-2448" href="http://blog.crowdflower.com/2011/05/ediscovery-meet-crowd/ediscover_stats/"><img class="aligncenter size-full wp-image-2448" title="ediscover_stats" src="http://blog.crowdflower.com/wp-content/uploads/2011/04/ediscover_stats.jpg" alt="" width="835" height="264" /></a></p>
<p style="text-align: -webkit-auto;">In the chart above, <strong>Recall</strong> is the percentage of all responsive documents that were returned; it measures how thorough the search is. <strong>Precision</strong> is the percentage of returned documents that are responsive; it measures how accurate the process is. <strong>F1</strong> is the harmonic mean of Recall and Precision, a simple summary measure that rewards high values in both. We used a simple majority among workers to determine a document&#8217;s relevance.</p>
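<p>To make these definitions concrete, here is a minimal Python sketch of the majority vote and the three measures. It is illustrative only, not the code used in this experiment; the function names and the sample judgments are hypothetical.</p>
<pre><code class="language-python">from collections import Counter


def majority_label(judgments):
    """Return the label chosen by the most workers for one document.

    With an odd number of workers and two labels there are no ties.
    """
    return Counter(judgments).most_common(1)[0][0]


def precision_recall_f1(predicted, actual, positive="Relevant"):
    """Compute precision, recall and F1 for the positive label."""
    pairs = list(zip(predicted, actual))
    true_pos = sum(1 for p, a in pairs if p == positive and a == positive)
    false_pos = sum(1 for p, a in pairs if p == positive and a != positive)
    false_neg = sum(1 for p, a in pairs if p != positive and a == positive)
    returned = true_pos + false_pos      # documents the crowd returned
    responsive = true_pos + false_neg    # documents the experts marked
    precision = true_pos / returned if returned else 0.0
    recall = true_pos / responsive if responsive else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical judgments: three workers per document.
worker_judgments = [
    ["Relevant", "Relevant", "Not Relevant"],
    ["Not Relevant", "Not Relevant", "Relevant"],
    ["Relevant", "Not Relevant", "Relevant"],
    ["Not Relevant", "Not Relevant", "Not Relevant"],
]
expert_labels = ["Relevant", "Relevant", "Not Relevant", "Not Relevant"]

crowd_labels = [majority_label(j) for j in worker_judgments]
precision, recall, f1 = precision_recall_f1(crowd_labels, expert_labels)
print(precision, recall, f1)
</code></pre>
<p>In this toy sample the crowd makes one false positive and one false negative, so Precision, Recall and F1 all come out to 0.5.</p>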
<p style="text-align: -webkit-auto;">Before looking at how this performance compares with manual and automated document review, it&#8217;s worth noting that changing the Gold distribution had very little effect on the overall accuracy, but it had a large effect on the distribution of errors. By increasing the frequency of Relevant Gold units, we cut the number of false negative errors by nearly 50%. In a context where one type of error is relatively more &#8220;expensive&#8221; than another, this is a useful tool to be aware of.</p>
<p style="text-align: -webkit-auto;">Without running the same dataset through crowdsourced, automated, and manual document review, it&#8217;s difficult to compare performance across methods. Nevertheless, Grossman and Cormack (2011) discuss manual and automated document review, finding that average recall for manual review of documents can be as low as 20-50%, though typically with much higher precision. For automated review on a dataset similar to the one we used, recall averaged 77%, though with average precision of 85%.<sup><a href="#footnote-3">3</a></sup></p>
<p style="text-align: -webkit-auto;"><strong><span style="font-size: 26px;">Living in a (non)Binary World</span></strong></p>
<p>Every document in our test was also graded with a probabilistic measure of Relevance by default. Because we asked multiple reviewers whether a given document was Relevant, we used inter-coder agreement to suggest the likelihood that a document is responsive. Further, because we tracked each individual worker&#8217;s performance on Gold, we weighted each worker&#8217;s contribution to the agreement by his/her estimated accuracy.</p>
<p>For this exercise, we remapped our confidence scores such that the documents least likely to be Relevant received a Relevance Score of 0.01, while the documents most likely to be Relevant received a Relevance Score of 0.99.<sup><a href="#footnote-4">4</a></sup> The distribution of documents by Relevance Score is included below.</p>
<p><a rel="attachment wp-att-2453" href="http://blog.crowdflower.com/2011/05/ediscovery-meet-crowd/ediscover_counts/"><img class="aligncenter size-full wp-image-2453" title="ediscover_counts" src="http://blog.crowdflower.com/wp-content/uploads/2011/04/ediscover_counts.jpg" alt="" width="519" height="389" /></a></p>
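<p>The scoring described above can be sketched in Python as follows. The remapping follows footnote 4; the way confidence is computed here (the winning answer&#8217;s share of the accuracy-weighted votes) is an assumption made for illustration, and the function names and sample values are hypothetical.</p>
<pre><code class="language-python">def weighted_confidence(judgments, accuracies):
    """Return (winning_label, confidence) for one document.

    judgments: the label given by each worker.
    accuracies: each worker's estimated accuracy on Gold, used as a weight.
    Assumption: confidence is the winning label's share of total weight.
    """
    weights = {}
    for label, accuracy in zip(judgments, accuracies):
        weights[label] = weights.get(label, 0.0) + accuracy
    winner = max(weights, key=weights.get)
    return winner, weights[winner] / sum(weights.values())


def relevance_score(label, confidence):
    """Remap a confidence score to P(Relevance) as in footnote 4."""
    if label == "Relevant":
        return 0.99 * confidence
    return 1 - 0.99 * confidence


# Hypothetical 2/1 split among workers with different Gold accuracy.
label, confidence = weighted_confidence(
    ["Relevant", "Relevant", "Not Relevant"], [0.9, 0.8, 0.7]
)
print(label, round(relevance_score(label, confidence), 3))
</code></pre>
<p>Under these assumptions, a unanimous Relevant document scores 0.99 and a unanimous Not Relevant document scores 0.01, which matches the endpoints described above.</p>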
<p>Note that because most documents in this test received judgments from three different workers, there isn&#8217;t much variation in the middle of the distribution. Most units had either unanimous agreement or a 2/1 split. Nevertheless, the Relevance Score makes it possible to set a threshold on what should be considered Relevant.</p>
<p>By changing the threshold, we can include any document that received at least one judgment of Relevant (increased Recall) or include only documents that did not receive any judgment of Not Relevant (increased Precision). As shown below, different thresholds dramatically influence the number and characteristics of the documents returned as Relevant.</p>
<p><a rel="attachment wp-att-2539" href="http://blog.crowdflower.com/2011/05/ediscovery-meet-crowd/pr_flip/"><img class="aligncenter size-full wp-image-2539" title="PR_flip" src="http://blog.crowdflower.com/wp-content/uploads/2011/05/PR_flip.jpg" alt="" width="640" height="642" /></a></p>
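<p>The trade-off between thresholds can be reproduced with a few lines of Python. This is a sketch on made-up data, not our results; the scores, expert labels and function name are hypothetical.</p>
<pre><code class="language-python">def review_at_threshold(scores, expert_relevant, threshold):
    """Return (precision, recall) when every document scoring at or
    above the threshold is returned as Relevant."""
    returned = [score >= threshold for score in scores]
    true_pos = sum(1 for r, e in zip(returned, expert_relevant) if r and e)
    n_returned = sum(returned)
    n_relevant = sum(expert_relevant)
    precision = true_pos / n_returned if n_returned else 0.0
    recall = true_pos / n_relevant if n_relevant else 0.0
    return precision, recall


# Hypothetical Relevance Scores and expert labels (True = Relevant).
scores = [0.99, 0.66, 0.66, 0.34, 0.34, 0.01]
expert_relevant = [True, True, False, True, False, False]

for threshold in (0.3, 0.5, 0.9):
    precision, recall = review_at_threshold(scores, expert_relevant, threshold)
    print(threshold, round(precision, 2), round(recall, 2))
</code></pre>
<p>On this sample, the lowest threshold returns every responsive document at the cost of Precision, while the highest returns only the unanimous document and misses the rest.</p>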
<p style="text-align: -webkit-auto;">While there is no substitute for trained legal experts, these results show that crowdsourcing is an effective complement to eDiscovery document review. Putting multiple pairs of eyes on every document dramatically decreases the likelihood of missing a relevant document. And consider that in less than 24 hours, we collected over 15,000 unique relevance judgments on nearly 3,000 documents, for much less than the billing rate of your average attorney.</p>
<hr />
<p id="footnote-1" style="text-align: -webkit-auto;">1 <a href="http://plg1.uwaterloo.ca/~gvcormac/treclegal09/">http://plg1.uwaterloo.ca/~gvcormac/treclegal09/</a></p>
<p id="footnote-2" style="text-align: -webkit-auto;">2 One of the ways that we control for quality is by randomly inserting a subset of units for which we already know the answers. We refer to this data as Gold. We track worker performance on these Gold units as a proxy for overall accuracy. Additional documentation is <a href="http://crowdflower.com/docs/gold">here.</a></p>
<p id="footnote-3" style="text-align: -webkit-auto;">3 Maura R. Grossman &amp; Gordon V. Cormack, Technology-Assisted Review in E-Discovery Can Be More Effective and More Efficient Than Exhaustive Manual Review, XVII  RICH. J.L. &amp; TECH. 11 (2011), <a href="http://jolt.richmond.edu/v17i3/article11.pdf">http://jolt.richmond.edu/v17i3/article11.pdf</a></p>
<p id="footnote-4" style="text-align: -webkit-auto;">4 For &#8220;Relevant&#8221; documents, P(Relevance)=0.99*(Confidence). For &#8220;Not Relevant&#8221; documents, P(Relevance)=1-0.99*(Confidence).</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2011/05/ediscovery-meet-crowd/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Regulating Distributed Work (Part Three: Why It&#8217;s a Good Idea)**</title>
		<link>http://blog.crowdflower.com/2010/06/regulating-distributed-work-part-three-why-its-a-good-idea/</link>
		<comments>http://blog.crowdflower.com/2010/06/regulating-distributed-work-part-three-why-its-a-good-idea/#comments</comments>
		<pubDate>Tue, 29 Jun 2010 20:29:30 +0000</pubDate>
		<dc:creator>Alek Felstiner</dc:creator>
				<category><![CDATA[Economics]]></category>
		<category><![CDATA[Labor Law]]></category>
		<category><![CDATA[Law]]></category>
		<category><![CDATA[Miscellaneous]]></category>
		<category><![CDATA[Regulation]]></category>
		<category><![CDATA[law]]></category>

		<guid isPermaLink="false">http://blog.crowdflower.com/?p=709</guid>
		<description><![CDATA[In previous posts, I discussed the nature of employment law as it relates to crowd work, and the problems involved in trying to classify crowd workers according to existing categories and in transferring rights of free assembly and collective action into virtual space. Now comes the controversial part: explaining why I think it’d be a good [...]]]></description>
			<content:encoded><![CDATA[<p>In <a href="http://blog.crowdflower.com/2010/05/regulating-distributed-work-part-one-employment-classification/" target="_blank">previous</a> <a href="http://blog.crowdflower.com/2010/06/regulating-distributed-work-part-two-free-assembly-collective-action/" target="_blank">posts</a>, I discussed the nature of employment law as it relates to crowd work, and the problems involved in trying to classify crowd workers according to existing categories and in transferring rights of free assembly and collective action into virtual space. Now comes the controversial part: explaining why I think it’d be a good idea for the law to jump into the middle of this complicated mess and start telling people what to do.</p>
<p><a href="http://blog.crowdflower.com/2010/06/regulating-distributed-work-part-three-why-its-a-good-idea/200px-us_department_of_justice_scales_of_justice/" rel="attachment wp-att-738"><img src="http://blog.crowdflower.com/wp-content/uploads/2010/06/200px-US_Department_of_Justice_Scales_Of_Justice.png" alt="Scales of Justice" title="200px-US_Department_of_Justice_Scales_Of_Justice" style="border: none !important" width="120" height="140" class="alignleft size-full wp-image-738"/></a></p>
<p>For some lawyers and lawmakers, “because we can” is a good enough reason. Others might press for regulation because advising clients in a regulation-free market generates fewer billable hours. But for a moment, let’s at least pretend that we as a society ought to engage in some kind of critical inquiry before intervening in an as-yet unregulated industry. And, while we’re pretending, let’s presume that such an inquiry would be shaped not by political dynamics but by the best information we have regarding how the law works and how regulation affects economic and social activity.</p>
<p>I’m not an economist, so I won’t be discussing the potential influence of economic theory on regulatory policy in this area. Instead I’ll focus on how the law deals with scenarios, like this one, in which existing doctrine appears woefully ill-equipped. The first question should always be: Does a problem actually exist? (Contrary to what you may believe, many lawyers and judges are perfectly willing to leave well enough alone. We’re not all “activists,” and in some cases, the most activist thing one can do is to permit the unfettered private ordering of employment relationships.)</p>
<p><span id="more-709"></span></p>
<p>So does a problem exist? When I’ve presented <a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1593853" target="_blank">my argument</a> that Mechanical Turk Providers should be classified as statutory employees, and that Amazon should function as a joint employer, I’ve gotten a variety of responses from classmates and colleagues. A few agree right off the bat, perhaps out of ideological sympathy (or pity). Others reject the argument, deciding that to the extent the parties are legally connected at all, they are governed by private contracts. And some go a step further. They conclude that no one in this situation is really performing the kind of “work” that any of our laws &mdash; employment, labor, or contract &mdash; ought to regulate. In other words, they’re saying that there isn’t a problem. At least not one the law can address.</p>
<p>As you might imagine, I vigorously dispute that view. My argument for statutory coverage may stretch a little thin in places, but just because crowd workers don’t fit the “statutory employee” definition does not mean they fall easily into another. And they <em>are</em> being paid for their work. Many of them (<a href="http://www.ics.uci.edu/~jwross/pubs/RossEtAl-WhoAreTheCrowdworkers-altCHI2010.pdf" target="_blank">perhaps as many as 18%</a>) rely on it to make ends meet. It seems self-evident to me that their work should fit somewhere on the employment law spectrum, and if there’s no space right now, we should make room.</p>
<p><span style="text-decoration: underline">Something Doesn’t Look Right<br />
</span>It is true that many crowd workers perform their tasks in spare time, while doing something else, or for recreational/entertainment purposes. And often, that kind of work ends up outside the scope of employment law. But that’s not <em>because</em> it gets performed in spare time, or while watching TV, or simply for fun. It’s because when we think of idle college students, retirees, and stay-at-home moms, we think of them filling their time with entertainment, volunteerism, or education-focused internships — none of which are covered by employment statutes.</p>
<p>The key thing to recognize here is that for the most part the work itself determines statutory coverage. Or, at least, that’s the way it should be (agricultural and domestic workers absolutely deserve protection, in my view, but were excluded from minimum wage and collective bargaining laws for political and cultural reasons). Regardless of who they are, or why exactly they perform these tasks, crowd workers don’t fit the picture of the type of workers legislatures, courts, and administrative agencies have traditionally chosen to exempt from statutory coverage. They can bargain independently on only certain crowdsourcing platforms, and rarely have an opportunity to maximize profits through business organization and initiative. In short, though they may think of themselves as entrepreneurs, they aren&#8217;t really the type of entrepreneurs that employment law tends to leave alone. Turkers and similar crowd workers would more accurately be described as fungible particles in an on-demand labor pool. In that sense, they resemble day laborers, migrant farmworkers, and urban domestic workers. Most of them deserve coverage; they just don&#8217;t have it yet.</p>
<p><span style="text-decoration: underline">The Law Abhors a Vacuum<br />
</span>I’ll reiterate at this point that I have no particular economic expertise. My amateurish assessment leads me to believe that crowd labor presents at least some potential for market failure (information asymmetry, deception, problems with competition and global supply, etc.). I readily concede that it’s probably too soon to give any weight to those conjectures. Luckily for me, legal scholars don’t really require an impending market failure to justify regulatory intervention. Impending <em>legal</em> failure will suffice. </p>
<p>If we have an unstable, growing industry, with no reliable law and an unclear picture of who may owe what duties to whom, we can end up with problems. Stakeholders can’t adequately assess and manage risk. Lawyers give bad, conflicting advice, or, worse, there’s no way to tell whether any advice is good or bad. Practices develop, and expectations settle, without any consideration of how they might fit or contradict our existing legal principles and public policies. The law abhors a vacuum. Absence of regulation may be a major boon to industry pioneers (such as the one that has been generous enough to grant me space on its blog), but regulatory vacuums can really wreak havoc on the rest of us.</p>
<p><span style="text-decoration: underline">“Wait and See” Created This Problem<br />
</span>I have heard some in the industry and in the cyberlaw field suggest that it may be too early to address legal problems presented by online work. They argue that we don’t know exactly how it will play out, and that premature regulation could unintentionally suppress the healthy development of online democracy, commerce, and information exchange. I agree that we don&#8217;t know how it will develop, and that in regulating now we run some risk of stifling valuable development. But this argument really underestimates both the flexibility of the law and our own capacity to identify and articulate our priorities. Regulation does not necessarily imply blanket prohibitions and severe criminal penalties. There are creative legislators, lawyers, and judges out there. For that matter, there is no reason crowdsourcing stakeholders couldn’t participate in crafting a flexible and somewhat open-ended or discretionary approach to regulating crowd work. And we ought to be able to figure out our objectives without knowing exactly how the technology and industry will develop. For all its faults, that is the function of the legislative process, and if we trust it at all, we can trust it in this context.</p>
<p>What we shouldn’t do is “wait and see.” “Wait and see,” or rather, “wait and ignore,” is what got us here in the first place. It may be that in order to craft an effective regulatory approach to virtual property, lawmakers require a more fully developed picture of VP transactions. But such procrastination has not helped American workers in the slightest. Our laws were out of touch before the Internet. Permatemps, day laborers, and other contingent workers are already falling outside the reach of laws that should protect them. We cannot afford to exacerbate the problem.</p>
<p>Moreover, now is actually a <em>good</em> time to undertake some kind of regulatory intervention. Once expectations have settled, and the industry has begun to function in a certain way and to accumulate its own political clout, legislators and judges will find it more and more difficult to set rules. Customary practices will become norms, and eventually transform into sanctified industrial principles that cannot be disturbed. I’m sure that suits companies like Amazon just fine, since they already play such a prominent role in the industry and will likely continue to do so. But my inner organizer and my inner corporate reformer don’t want to see settled expectations become law simply by virtue of the fact that things happen to have turned out that way. Even if the emerging structure of crowd labor perfectly reflected pure economic principles, and could thus function happily and indefinitely without any correction, I still wouldn&#8217;t want to see that structure automatically become law. Neither should you, if (like me) you believe in the potential of crowd work to transform economies and provide unprecedented opportunity. We have a chance to do better by workers (and employers) this time around, and we should take it.</p>
<p>**Note: This is the opinion of the author, and is not necessarily shared by CrowdFlower or, say, its CEO.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.crowdflower.com/2010/06/regulating-distributed-work-part-three-why-its-a-good-idea/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
	</channel>
</rss>

