<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress/2.3.3" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>
<channel>
	<title>Comments on: Crowdsifter: More Efficient Content Filtering</title>
	<link>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/</link>
	<description></description>
	<pubDate>Thu, 11 Mar 2010 22:01:45 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.3.3</generator>
		<item>
		<title>By: Pelez</title>
		<link>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1354</link>
		<dc:creator>Pelez</dc:creator>
		<pubDate>Mon, 20 Jul 2009 08:07:39 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1354</guid>
		<description>It's time to rename the blog with a word related to domains :) Maybe that's enough about them?</description>
		<content:encoded><![CDATA[<p>It&#8217;s time to rename the blog with a word related to domains :) Maybe that&#8217;s enough about them?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Patrick Perry</title>
		<link>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1336</link>
		<dc:creator>Patrick Perry</dc:creator>
		<pubDate>Tue, 14 Jul 2009 19:22:01 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1336</guid>
		<description>I should add that: a) it would be nice to see some plots with standard error estimates, and b) it may not be feasible to get a good estimate when you have less than 1% porn in the training set</description>
		<content:encoded><![CDATA[<p>I should add that: a) it would be nice to see some plots with standard error estimates, and b) it may not be feasible to get a good estimate when you have less than 1% porn in the training set</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Patrick Perry</title>
		<link>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1335</link>
		<dc:creator>Patrick Perry</dc:creator>
		<pubDate>Tue, 14 Jul 2009 19:17:12 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1335</guid>
		<description>This is interesting stuff.  Clearly, you guys have thought about the problem a lot and are doing a great job.

I suspect these measurements are biased, though I can't figure out in what direction.  In the real world, the number of porn images is nowhere near 40%.  I don't know what the actual number is, but I would expect less than 0.01% (you guys probably have a better idea about this than I do).  Porn images are rare.  I can see two potential consequences of this.

Scenario 1: workers almost always click "not porn", and fatigue sets in.  They stop paying attention to the task and their accuracy goes down.

Scenario 2: because porn is so rare and so different from the other images, it really stands out.  Workers do not have to think much or pay much attention to get a very good accuracy rate.

Do you guys have any guesses as to which of these happens?  I would be very interested in seeing how the accuracy changes as you vary the proportion of porn images in the training set.</description>
		<content:encoded><![CDATA[<p>This is interesting stuff.  Clearly, you guys have thought about the problem a lot and are doing a great job.</p>
<p>I suspect these measurements are biased, though I can&#8217;t figure out in what direction.  In the real world, the number of porn images is nowhere near 40%.  I don&#8217;t know what the actual number is, but I would expect less than 0.01% (you guys probably have a better idea about this than I do).  Porn images are rare.  I can see two potential consequences of this.</p>
<p>Scenario 1: workers almost always click &#8220;not porn&#8221;, and fatigue sets in.  They stop paying attention to the task and their accuracy goes down.</p>
<p>Scenario 2: because porn is so rare and so different from the other images, it really stands out.  Workers do not have to think much or pay much attention to get a very good accuracy rate.</p>
<p>Do you guys have any guesses as to which of these happens?  I would be very interested in seeing how the accuracy changes as you vary the proportion of porn images in the training set.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Bob Carpenter</title>
		<link>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1331</link>
		<dc:creator>Bob Carpenter</dc:creator>
		<pubDate>Mon, 13 Jul 2009 16:58:21 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1331</guid>
		<description>I'm confused by the comments. Isn't Crowdsifter implemented on top of Mechanical Turk?  That's what the post says in the very first line after the quote.  I understand why you can't just use their bulk task mechanisms, but at some level, everything bottoms out in their web API.

My second question is: why is this task so hard?  Are there really so many borderline cases that a 5% false negative rate is reasonable?  Maybe you need Justice Potter Stewart as an annotator.

My third question is: if the task is as hard as all that, what's your agreement among the people doing the "gold standard" annotations? Or do you just annotate easy cases in your gold standard?  That'd probably work just as well for adjusting for annotator accuracy in predictions.</description>
		<content:encoded><![CDATA[<p>I&#8217;m confused by the comments. Isn&#8217;t Crowdsifter implemented on top of Mechanical Turk?  That&#8217;s what the post says in the very first line after the quote.  I understand why you can&#8217;t just use their bulk task mechanisms, but at some level, everything bottoms out in their web API.</p>
<p>My second question is: why is this task so hard?  Are there really so many borderline cases that a 5% false negative rate is reasonable?  Maybe you need Justice Potter Stewart as an annotator.</p>
<p>My third question is: if the task is as hard as all that, what&#8217;s your agreement among the people doing the &#8220;gold standard&#8221; annotations? Or do you just annotate easy cases in your gold standard?  That&#8217;d probably work just as well for adjusting for annotator accuracy in predictions.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: brendano</title>
		<link>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1326</link>
		<dc:creator>brendano</dc:creator>
		<pubDate>Sat, 11 Jul 2009 21:36:37 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1326</guid>
		<description>Josh, on implementing with the standard AMT API -- it makes very strong assumptions that you want the same number of annotations per item.  Furthermore, it has a static publishing model so you can't make dynamic decisions about which items should get more; and you need your own gold testing system; etc etc, like lukas said</description>
		<content:encoded><![CDATA[<p>Josh, on implementing with the standard AMT API &#8212; it makes very strong assumptions that you want the same number of annotations per item.  Furthermore, it has a static publishing model so you can&#8217;t make dynamic decisions about which items should get more; and you need your own gold testing system; etc etc, like lukas said</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: lukas</title>
		<link>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1322</link>
		<dc:creator>lukas</dc:creator>
		<pubDate>Fri, 10 Jul 2009 21:53:03 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1322</guid>
		<description>Josh - Great question.

1) It's not trivial to implement an algorithm using the turk API where different images are shown with different frequencies.  It's also not trivial to hide in gold standard data that immediately calls out worker performance.

2) Finding an optimal weighting algorithm and worker quality calculation is not trivial.

3) We have historical data on every worker that has done this task, so we know which workers are higher quality and lower quality and we know who is scamming us.

The result is more than double the efficiency of aggregating turk results without these benefits.  So not only do you NOT have to deal with the turk API or build your own reporting tools, but your images will get the same quality of moderation in half the time.

My goal with this post was less to brag about the quality of our system and more because I thought our intern John had done a great job of systematically measuring our quality and comparing to a baseline.  I thought it would be interesting to show blog readers how we do our quality measurements and the way we think about the quality problem with multiple annotators.</description>
		<content:encoded><![CDATA[<p>Josh - Great question.</p>
<p>1) It&#8217;s not trivial to implement an algorithm using the turk API where different images are shown with different frequencies.  It&#8217;s also not trivial to hide in gold standard data that immediately calls out worker performance.</p>
<p>2) Finding an optimal weighting algorithm and worker quality calculation is not trivial.</p>
<p>3) We have historical data on every worker that has done this task, so we know which workers are higher quality and lower quality and we know who is scamming us.</p>
<p>The result is more than double the efficiency of aggregating turk results without these benefits.  So not only do you NOT have to deal with the turk API or build your own reporting tools, but your images will get the same quality of moderation in half the time.</p>
<p>My goal with this post was less to brag about the quality of our system and more because I thought our intern John had done a great job of systematically measuring our quality and comparing to a baseline.  I thought it would be interesting to show blog readers how we do our quality measurements and the way we think about the quality problem with multiple annotators.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Josh</title>
		<link>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1317</link>
		<dc:creator>Josh</dc:creator>
		<pubDate>Fri, 10 Jul 2009 04:24:30 +0000</pubDate>
		<guid>http://blog.crowdflower.com/2009/07/crowdsifter-more-efficient-pornography-filtering/#comment-1317</guid>
		<description>While I have played with MT, I'm no expert - but what stops one from implementing the same algorithm on top of MT? Via the API you can keep track of who each turker is, and can control against a known good sample. Or is there some additional secret sauce that is not implementable using the Turk API?</description>
		<content:encoded><![CDATA[<p>While I have played with MT, I&#8217;m no expert - but what stops one from implementing the same algorithm on top of MT? Via the API you can keep track of who each turker is, and can control against a known good sample. Or is there some additional secret sauce that is not implementable using the Turk API?</p>
]]></content:encoded>
	</item>
</channel>
</rss>
