CrowdFlower Research


Crowdsourcing the Haiti Relief

January 29th, 2010 by Lukas Biewald

Friday night I was getting ready to spend the weekend working on my board meeting slides when my friend Ian Monroe came by the office and told me to talk to Robert Munro. Robert is a computational linguist who does research on large-scale processing of text messages, an obscure subject until the earthquake in Haiti happened two weeks ago. Robert had been working with Josh Nesbit, cofounder of FrontLineSMS:Medic, an awesome NGO dedicated to building SMS-based communication infrastructure for people in the developing world.

Just after the earthquake, Josh convinced Digicel and Comcel, the two largest mobile carriers in Haiti, to set up a number, 4636, that anyone could text. They advertised it all over Haiti and pretty soon they were getting a message every few seconds. A team of techies including Robert and (in particular) Brian Herbert of Ushahidi (a very cool organization which creates various kinds of crowdsourced mapping projects) hacked together an infrastructure which aggregates and processes the SMS messages coming from Haiti. This allowed a team of volunteers to translate, classify and geocode the messages. From launch, InSTEDD/Thomson Reuters worked with the Red Cross on the ground, responding to emergency requests. Within days, Ushahidi and a second team of volunteers were mapping incidents and coordinating actionable responses with the US Coast Guard.

Robert and Brian were looking for a scalable solution to route tasks to a distributed workforce that could handle the high volume and low latency necessary to make their program a success, which is exactly what CrowdFlower does. They were also looking for a more stable, long-term solution than a strictly volunteer workforce. Our partners at Samasource, a socially responsible outsourcing organization, had just set up a large bilingual digital workforce in Haiti.

It felt like we had a perfect solution to an important problem and I stopped working on the board slides and started writing code to glue the Digicel and Comcel feeds into our API and build the task within our framework. Our engineers were extremely responsive to bug reports from a CEO who may not have read the API documentation as thoroughly as he could have at first. I presented my most underprepared board slides to a very understanding board ☺ and on Wednesday night we became the live feed and essentially the 911 switchboard for the 4636 project in Haiti.

You can learn more about the project at Mission 4636 on the Samasource website where, if you speak Kreyol or know someone who does, you can sign up to volunteer. Samasource has been hard at work building up the infrastructure in Haiti needed to process the messages. In the meantime, we’re recruiting volunteers around the world to help meet the demand. As I write this, more than 12,000 messages have been translated and processed by the amazing volunteers.

If you don’t speak Kreyol and want to support the project, another way to help is by donating money directly to Samasource.

Altruism on Amazon Mechanical Turk

January 27th, 2010 by David Rand

 

Many workers on Amazon Mechanical Turk are willing to help others at a cost to themselves, just like participants in laboratory experiments.

While traditional economic models assume that people are entirely selfish, a central theme in behavioral economics is the existence of ‘social preferences’, or caring for others. Countless laboratory experiments have demonstrated that many people are willing to help others, even at a cost to themselves. This behavior is clearly inconsistent with being motivated only by your own monetary payoff – if you are entirely selfish, you would never pay money to help someone else in the totally anonymous conditions of the lab. In this post I describe an experiment I conducted together with John Horton, and with invaluable technical assistance from Xiaoqi Zhu, that replicates the existence of social preferences on Amazon Mechanical Turk (AMT), showing that many Turkers behave altruistically.

We also demonstrate the principle of priming, another focus of great interest in experimental economics. In priming studies, stimuli unrelated to the decision task (and which do not affect the monetary outcomes) can nonetheless significantly alter subjects’ behavior.

To assess altruistic behavior on AMT, 194 subjects played an incentivized Prisoner’s Dilemma (PD), the canonical game for studying altruistic cooperation. Subjects were informed that they had been randomly assigned to interact with another Turker, and that they would each have a choice between two options, A or B. In addition to a 20 cent “show-up fee”, they were informed of the following payoff structure: if both subjects chose A, they would each receive a 120 cent bonus; if both chose B, they would each receive an 80 cent bonus; if one chose A while the other chose B, the A player would receive 40 cents while the B player would receive 160 cents. The resulting payoff matrix is as follows (in each cell I first show the row player’s payoff, and then the column player’s payoff):

          A          B

A      120,120    40,160

B      160,40     80,80

 

Thus A represents cooperation, and B represents defection. If both people choose A, they both do better than if both choose B. However, regardless of the other’s action, you earn more by choosing B (hence the ‘dilemma’). Rational self-interested players should therefore always select B, and it is altruistic to choose A (helping the other at a cost to you). Given previous evidence from experiments in the laboratory, however, we predicted that AMT subjects would demonstrate a level of cooperation significantly greater than 0 in a one-shot PD.
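The dominance argument can be checked mechanically. The following is a minimal sketch that uses only the payoffs listed in the post; the function names are mine and purely illustrative.

```python
# Bonus payoffs in cents, taken from the post: PAYOFF[(my_choice, other_choice)] -> my bonus.
PAYOFF = {
    ("A", "A"): 120,
    ("A", "B"): 40,
    ("B", "A"): 160,
    ("B", "B"): 80,
}


def best_reply(other_choice):
    """Choice that maximizes my own bonus when the other player's choice is fixed."""
    return max("AB", key=lambda mine: PAYOFF[(mine, other_choice)])


def cost_of_cooperating(other_choice):
    """Cents I give up by choosing A rather than B."""
    return PAYOFF[("B", other_choice)] - PAYOFF[("A", other_choice)]


def gain_to_other(other_choice):
    """Extra cents the other player receives when I choose A rather than B."""
    return PAYOFF[(other_choice, "A")] - PAYOFF[(other_choice, "B")]


for other in "AB":
    print(other, best_reply(other), cost_of_cooperating(other), gain_to_other(other))
```

Whatever the other player does, B is the best reply, and choosing A costs 40 cents while handing the other player 80 cents. That is the sense in which A is altruistic.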

To explore the effects of priming on AMT subjects, we built on a previous study demonstrating that exposure to religious words and phrases increases altruistic behavior, particularly among those who believe in god (Shariff & Norenzayan 2007). Among the 194 subjects in our experiment, the prime group (N=89) read a Christian religious passage about the importance of charity (Mark 12:21-22) before playing the PD, whereas the no-prime group (N=105) did not. Following the PD, subjects completed a demographic questionnaire reporting age, gender, and education, and indicated whether they had ever had an experience which convinced them of the existence of god. Based on the results of Shariff & Norenzayan, we hypothesized that the religious prime would increase cooperation, and further hypothesized that the effect would be driven by subjects who believe in god.

Consistent with our first prediction, we observe a level of cooperation significantly greater than 0 in both the no-prime (54% C: sign-rank test, p<0.001) and prime (71% C: sign-rank test, p<0.001) conditions. Consistent with our second prediction, we observe significantly more cooperation in the prime condition compared to the no-prime condition (Chi2 test, p=0.018). Consistent with our third prediction, the prime only increases cooperation among subjects who believe in god (Chi2 test, non-believers: p=0.82, believers: p=0.004). The results are visualized in Figure 1. Using logistic regression with robust standard errors, we also find that these results are robust to controlling for age, gender, country of residence (US vs non-US), religion (Christian vs non-Christian) and education.
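For readers who want to see how a comparison like prime vs. no-prime is computed, here is a rough sketch of a Pearson chi-square test on a 2x2 table. The cooperator counts are my reconstruction from the rounded percentages and group sizes above, not the raw data, and I am assuming a test without continuity correction, which the post does not specify.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the upper tail of the chi-square distribution
    # equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Assumption: counts rebuilt from the rounded figures in the post (54% of 105, 71% of 89).
no_prime_n, prime_n = 105, 89
no_prime_coop = round(0.54 * no_prime_n)
prime_coop = round(0.71 * prime_n)

stat, p_value = chi_square_2x2(
    no_prime_coop, no_prime_n - no_prime_coop,
    prime_coop, prime_n - prime_coop,
)
print(f"chi2 = {stat:.2f}, p = {p_value:.3f}")
```

With these reconstructed counts the p-value lands close to the reported 0.018, but the exact figure depends on the true counts and on the variant of the test used.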

Figure 1. Reading a religious passage significantly increases Prisoner’s Dilemma cooperation among those who believe in god, but not among non-believers.


To summarize, we have demonstrated two aspects of Turker behavior:

1. A majority of Turkers chose the altruistic option of cooperating in a Prisoner’s Dilemma. Thus even in the entirely anonymous and profit-motivated online labor market of AMT, many people still choose to help each other. This sort of altruistic cooperation is a fundamental part of the natural world, and is the building block of human societies. For more, see (Nowak 2006).

2. Reading a religious passage about the importance of charity makes religious Turkers more altruistic, but has no effect on Turkers who do not believe in god. This shows that Turkers respond in basically the same way as “normal” lab subjects, and is fairly intuitive. Those who believe in god are receptive to calls for generosity phrased in religious language, while non-believers aren’t. Secular primes have been shown to work for both religious and non-religious subjects (Shariff & Norenzayan 2007).

Although AMT workers are certainly not a generally representative sample, this study demonstrates that they show several of the same basic behavioral features observed in behavioral laboratory experiments. Furthermore, AMT allowed this study to be run extremely quickly and inexpensively. The 200 subjects were recruited in less than 2 days, at a total cost of $253. As a behavioral researcher, this is amazingly exciting! I usually spend months and thousands of dollars per study. AMT opens the possibility of exploring countless interesting ideas that otherwise we would have had neither the time nor money to pursue.

For other studies about cooperation, reward and punishment that I’ve conducted at Harvard, see the pdfs on my webpage: www.DavidGertlerRand.com.

Crowdsourcing Work Meetup January 20th

January 19th, 2010 by Lukas Biewald

We’re hosting another Crowdsourcing work meetup! Once again, we’re bringing together people interested in all types of crowdsourcing work.

My favorite thing about the last meetup was that it gave me an excuse to invite people from around the country whose work I admired and meet them in person. This meetup is no different: we have four awesome speakers all of whom I’ve been a huge fan of for a long time:

  • Aaron Koblin, a local artist, who created the projects The Sheep Market, Bicycle Built for Two Thousand, and Ten Thousand Cents.
  • Leila Janah, a social entrepreneur who runs Samasource, our fantastic non-profit partner.
  • Sharon Chiarella, VP at Amazon in charge of Mechanical Turk.
  • Panos Ipeirotis, an NYU Stern School professor who writes one of my favorite blogs on crowdsourcing and Mechanical Turk.

The doors open at 6 and we’ll start the talks at 7.

Note that this time we’re having it at our partner Samasource’s office at 972 Mission St, Ste 500 in downtown San Francisco.

We are really looking forward to meeting everyone. Please RSVP on our meetup group so that we can get an accurate headcount. Many people have requested a live stream – you should be able to access our feed at http://justin.tv/crowdflower.

Update #1 WhitePages offered to host a video link at their offices in Seattle:
Date: Wednesday the 20th
Time: Stop by anytime between 6pm and 7pm
Address: WhitePages, 1301 Fifth Ave, Suite 1600
(that’s the Rainier Tower at 5th and Union in downtown Seattle)
Please call Joe to be let in — 206-306-5253

Update #2 We have a live feed set up and ready to go at http://justin.tv/crowdflower - we will try to post the video online after the fact as well.

Hope to see you in person! Samasource is located b/t 5th & 6th on Mission Street.

The Labor Economics of Paid Crowdsourcing

January 14th, 2010 by John Horton

This is the first in (what we hope will be) a series of guest posts from John Horton, a Doctoral Candidate in Public Policy at the Harvard Kennedy School. John and Aaron Shaw are collaborating on some research projects and we were both introduced to Dolores Labs around the time of last year’s Mechanical Turk Meetup.

Since then, John’s been busy establishing himself as a Crowdsourcing research pioneer by designing a suite of online data collection tools as well as running numerous experimental and observational studies on several different Crowdsourcing labor markets. We really admire his work, which tends to involve well-designed methods and cut straight to big, interesting questions. In this post, John discusses a recent experiment he ran on Amazon Mechanical Turk that looks at worker motivations in the context of labor economics and theories of the “reservation wage.”





Hi - this is my first post here (though I’ve commented a bit). I work with Aaron Shaw and got to know Lukas at the last meet-up he hosted. Anyway, I’m interested in crowdsourcing and online labor more generally and Lukas was kind enough to let me write about some of my research here.

It’s pretty clear that many Amazon Mechanical Turk (AMT) workers are motivated primarily by money, which suggests economics is the best tool for understanding worker decision-making. Research by Winter Mason and Duncan Watts shows that workers behave in a way consistent with economic rationality: when they were paid more, workers produced more output. Although any sensible model predicts that workers will work more when paid more, standard labor economics models make several other predictions (some might call them assumptions): workers should make decisions based solely on the real wage offered, that is, payment divided by time spent. They should compare this offered wage to their reservation wage for a particular task.

Because it drives decision-making, the reservation wage is the key parameter in labor supply models, but it is hard to estimate in practice; when we observe someone working, even if we know their wage we don’t get to observe their reservation wage parameter. We only know that their wage is above the reservation wage. In a new paper (joint with Lydia Chilton), we use a unique method that allows us to estimate reservation wages for AMT workers. Although we find some agreement with the predictions of the simple rational model, we also find some evidence that workers are “target earners,” meaning that they work until they reach certain salient earnings targets (e.g., the maximum amount available). This kind of behavior has been found in other contexts, but it runs counter to the rational model.

The Task

For our task, subjects clicked back and forth between two vertical bars in a Flash game (screen shot below). A block of 10 back-and-forth clicks made up one unit of output, and subjects could decide how many blocks to complete. The amount paid per block was constantly decreasing. This constantly decreasing rate allowed us to estimate a worker’s reservation wage by looking at the implied wage when they “quit.” A live demo of the task is available here.
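To make the idea of an implied wage at the quitting point concrete, here is a small sketch. The payment schedule (`first_payment_cents`, `decay`) is entirely made up for illustration, and so is the assumption that the declined block would have taken as long as the last completed one; the paper describes the actual schedule and estimation method.

```python
def block_payment(block_index, first_payment_cents=5.0, decay=0.9):
    """Hypothetical schedule: each block pays a fixed fraction of the previous one."""
    return first_payment_cents * decay ** block_index


def reservation_wage_bounds(seconds_per_block):
    """Bracket a worker's reservation wage, in cents per hour, from where they quit.

    seconds_per_block lists the time spent on each completed block. Under the
    simple rational model the last completed block paid at least the reservation
    wage, and the next block (which the worker declined) paid less than it.
    """
    completed = len(seconds_per_block)
    if completed == 0:
        raise ValueError("worker completed no blocks")
    # Assumption: the declined block would have taken as long as the last one.
    last_seconds = seconds_per_block[-1]
    accepted_wage = block_payment(completed - 1) / last_seconds * 3600
    declined_wage = block_payment(completed) / last_seconds * 3600
    return declined_wage, accepted_wage


lower, upper = reservation_wage_bounds([30, 30, 30])
print(f"reservation wage between {lower:.1f} and {upper:.1f} cents per hour")
```

The point of the decreasing rate is visible here: the faster the payment falls from block to block, the tighter the bracket around each worker's reservation wage.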

Results

Subjects were randomly assigned to either a HIGH or LOW group. The HIGH group was paid 3 times more than LOW for every task. The figure below shows output in both groups. One striking feature of the data is how bimodal output is: some workers produced lots of output and some produced very little. For this bimodality to be consistent with rationality, the distribution of reservation wages themselves would have to be very bimodal, which seems unlikely. We found that the imputed reservation wage distributions were quite different across groups. Because of randomization, the distributions should have been indistinguishable. In particular, we found that the reservation wages in LOW were too low, suggesting that workers in LOW, on average, worked more than they should have. Why?

Figure: output in the HIGH and LOW groups

Target earning

One possible explanation for why there is too much output in LOW is that at least some workers try to earn the maximum amount possible, regardless of the “wage” associated with this strategy. Having an earnings target may sound rational, but can lead to some perverse results. For example, workers might work longer when wages are low (because they still want to meet their target) than when they are higher (though there are other reasons this can happen, namely income effects). It is an open controversy in economics whether employees with “real” jobs are target earners (see this work by Henry Farber as well as Colin Camerer’s work), but we find several pieces of evidence for target earning in our data.

The strongest evidence we find for target earning is that some workers show a preference for earning total amounts divisible by 5. In the figure below, the earnings of workers in HIGH are plotted as a histogram, with horizontal panels for the whole cents earned. E.g., earnings amounts 29.2, 29.5 and 29.9 would all be in the same “29 cent” panel. The height of the bars shows how many subjects earned that amount of money. Panels where the whole cents are divisible by 5 have black histogram bars; the others have white bars. We can see that several subjects earn the smallest amounts available (e.g., 2, 3 and 5 cents). Because these low earners quit very early, they presumably do not have a target or would not need a target. However, we see clear output spikes at 15, 20 and 25 cents. The probability of this happening by chance is about 3 in 1000 (see the paper for details).
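The panel construction described above is easy to reproduce. This sketch bins earnings by whole cents and reports the share that falls on multiples of 5; the example earnings are invented, and it does not attempt the significance calculation, which is in the paper.

```python
import math
from collections import Counter


def whole_cent_histogram(earnings_cents):
    """Count subjects per whole-cent panel, so 29.2, 29.5 and 29.9 all land in panel 29."""
    return Counter(math.floor(amount) for amount in earnings_cents)


def share_on_multiples_of_five(earnings_cents):
    """Fraction of subjects whose whole-cent earnings are divisible by 5."""
    histogram = whole_cent_histogram(earnings_cents)
    total = sum(histogram.values())
    on_target = sum(count for cents, count in histogram.items() if cents % 5 == 0)
    return on_target / total


# Made-up earnings in cents, for illustration only.
example = [2.4, 3.1, 15.2, 15.8, 20.3, 22.7, 25.0, 29.5]
print(whole_cent_histogram(example))
print(share_on_multiples_of_five(example))
```

If stopping points were unrelated to round numbers, roughly one panel in five would be a multiple of 5, so a much larger share is the kind of pattern that suggests target earning.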

Conclusion

We find some agreement with the rational model, as well as important anomalies consistent with some ideas from behavioral economics. While it’s probably too early to offer much practical design advice, it does seem that designers should give workers natural targets, as they seem to help at least some workers. The paper is called “The Labor Economics of Crowdsourcing” and is available here.

Not-quite-live-blog: Jonathan Zittrain on “Minds For Sale”

December 22nd, 2009 by Aaron Shaw

Jonathan Zittrain, Professor of Law and Faculty Co-director (and co-founder) of the Berkman Center for Internet and Society at Harvard University, gave a presentation at the Computer History Museum in Mountain View about a month ago that ought to be required viewing for anyone interested in Cloudlabor and Crowdsourcing.

Drawing examples from all over the Internet - including a certain iPhone app that you may have heard of - Zittrain raises some serious (and some seriously entertaining) questions about ethical and legal aspects of distributed human computing.

Straight from the Berkman Center YouTube channel, here’s the full video (which is also available for download under a Creative Commons Attribution 3.0 license from the President and Fellows of Harvard College):









Zittrain focuses on the potential alienation and opportunities for abuse that can arise with the growth of distributed online production. He also contemplates the thin line that separates exploitation from volunteering in the context of online communities and collaboration.

I enjoyed his analysis and the discussion afterwards, although I suspect that some of the conversation with the audience might get lost in the video. As with Zittrain’s most recent book, The Future of the Internet and How to Stop It, this is some of the best thinking about life online that you’ll find anywhere.

Zittrain has also published an abbreviated portion of his argument in Newsweek under the slightly more extreme title “Work the New Digital Sweatshops.”

I find a lot of what Zittrain has to say compelling; however, I do wonder if the efforts of ReCaptcha-spammers and sock-puppeteers to exploit Crowdsourcing markets will ultimately prove successful. I also wonder whether the imposition of labor regulations in these contexts makes sense or would prove effective. Should my decision to kill time or make a few extra bucks by filtering images be subject to labor law? What about the ability of other people to offer money for distasteful and perhaps unethical (but usually not illegal) micro-tasks?

It may be a few years before anyone really understands if Crowdsourcing lends itself to unique types of market failure along these lines, but Zittrain and others such as Lily Irani and Aaron Koblin are doing us all a favor by asking some of the most important questions early in the game.

Full disclosure: the author of this post is affiliated with Harvard and the Berkman Center for Internet and Society, where he was a fellow during 2008-2009. While he doesn’t think that his affiliation influences his opinions about Zittrain’s work, it does mean that he’s very pleased not to be spending another winter in Cambridge this year.

Ask a Stupid Question

December 16th, 2009 by Aaron Shaw

What makes a bad survey question and why does it matter? I thought I’d use my first blog posts as Dolores Labs’s friendly neighborhood social scientist to talk a little bit about question design since it’s a relevant, but often overlooked, area of Crowdsourcing work.

You can ask “the crowd” all kinds of questions, but if you don’t stop to think about the best way to ask your question, you’re likely to get unexpected and unreliable results. You might call it the GIGO theory of research design.

To demonstrate the point, I decided to recreate some classic survey design experiments and distribute them to the workers in Crowdflower’s labor pools. For the experiments, every worker saw only one version of the questions and the tasks were posted using exactly the same title, description, and pricing. One hundred workers did each version of each question and I threw out the data from a handful of workers who failed a simple attention test question. The results are actual answers from actual people.

An Example: Response Scales

The rest of this post focuses on one example question that involved a response scale and a test to see how altering the scale would affect people’s answers. Here are two versions of the same question that I posted to Crowdflower:


Low Scale Version:


About how many hours do you spend online per day?

(a) 0 – 1 hour
(b) 1 – 2 hours
(c) 2 – 3 hours
(d) More than 3 hours






High Scale Version:


About how many hours do you spend online per day?

(a) 0 – 3 hours
(b) 3 – 6 hours
(c) 6 – 9 hours
(d) More than 9 hours





Notice that both versions can accommodate any answer and that the only difference is in the range of the scale items. You can give an accurate response to either question and neither version explicitly pushes you to give any answer over another.

So what did people say? Here’s a pair of histograms breaking the responses up by the two versions of the question:

boring histograms: hours online by scale

I didn’t label the height of the bars because the results are almost useless in this form. The only conclusion we can draw is that a lot of people in the Crowdflower worker pool tend to spend more than three hours per day online (whoa, no way…).

At the same time, it seems like the workers might have given low answers more frequently in response to the low scale (check out how big the first three blue bars are compared to just the first orange bar).

To look at that comparison more closely, let’s break the answers into two categories for each scale: (1) the percentage of responses that were less than 3 hours, and (2) the percentage of responses that were more than 3 hours.

hours online in two bins

The difference between the height of the orange points (high scale) is much bigger than the corresponding difference between the height of the blue points (low scale). In other words, people who saw the high scale were much more likely to say they spent more than 3 hours online. In case you’re a stats nerd, the Chi-square test showed that this variation was significant with a p-value < 0.001, so the difference was almost certainly not due to chance.

But maybe collapsing the responses like this is a little too coarse and you'd still like to see how the variation worked across the scale as a whole. With that in mind, Lukas suggested another way to look at the effects – a comparison of the cumulative percentage of responses – and the differences are even more clear.

hours online - cumulative bins

That gap between the blue and the orange line at “Less than 3 hours” – the one level that was measured explicitly on both scales – is huge!
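For anyone who wants to build the same comparison from their own data, here is a rough sketch of the cumulative-percentage calculation. The response counts are placeholders I made up, not the survey results; only the scale cut points come from the two question versions above.

```python
from itertools import accumulate


def cumulative_percentages(counts):
    """Running percentage of responses at or below each answer option."""
    total = sum(counts)
    return [100 * running / total for running in accumulate(counts)]


# Placeholder counts per option (a) through (d); NOT the real responses.
low_scale_counts = [10, 20, 30, 40]    # options end at 1, 2, 3, and "more than 3" hours
high_scale_counts = [25, 40, 25, 10]   # options end at 3, 6, 9, and "more than 9" hours

low_cumulative = cumulative_percentages(low_scale_counts)
high_cumulative = cumulative_percentages(high_scale_counts)

# "Less than 3 hours" is the third option on the low scale and the first on the high scale.
gap_at_three_hours = low_cumulative[2] - high_cumulative[0]
print(low_cumulative, high_cumulative, gap_at_three_hours)
```

The 3-hour mark is the only cut point the two scales share, which is why it is the one place where the two cumulative curves can be compared directly.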

Explaining the Gap

If you’re thinking that the differences between the scales alone can’t explain why all of these results are so skewed, that’s a good thought. However, the fact that this was a randomized experiment on a relatively homogeneous group of people makes it very unlikely that anything else explains the difference. Just to be sure, I did some other tests and found no significant differences between the sets of respondents that saw the low and high scales in terms of gender, country of origin, and the amount of time they took to complete the survey. So it seems like the scale is indeed the most likely culprit.

But what explains why scale questions can bias people’s responses so heavily? Survey researchers call this kind of behavior satisficing - it happens when people taking a survey use cognitive shortcuts to answer questions. In the case of questions about personal behaviors that we’re not used to quantifying (like the time we spend online), we tend to shape our responses based on what we perceive as “normal.” If you don’t know what normal is in advance, you define it based on the midpoint of the answer range. Since respondents didn’t really differentiate between the answer options, they were more likely to have their responses shaped by the scale itself.

These results illustrate a sticky problem: it’s possible that a survey question that is distributed, understood, and analyzed perfectly could give you completely inaccurate results if the scale is poorly designed.

Okay, it’s Broken. Now How Do I Fix It?

So what are you supposed to do in order to figure out which scale is more accurate? One of the best ways to mitigate the problem is to do some open-ended research on your respondent population so that you can get a good sense of a reasonable range of responses. Then you can re-center your response scale around that distribution.

To try this out, I ran the survey yet again with the same question, except that this time I left the “hours online” question open-ended, allowing Crowdflower workers to type in their responses. Here’s a density plot of those responses with the minimum, maximum, and mean responses highlighted (sparklines style):

hours online - open ended

While the distribution is skewed and has something of a long-ish tail, the mean (6.53 hours per day), median (6 hours per day), and mode (5 hours per day) are all close to the midpoint of the high scale in my original questions. Therefore, the responses from the high scale were probably a more accurate reflection of the workers’ judgments.
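One way to turn open-ended answers into a re-centered scale is sketched below. The sample responses are invented, and the rule I use (equal-width options with the middle cut point at the median) is just one reasonable choice, not a standard from the survey literature.

```python
import statistics


def summarize(hours):
    """Mean, median and mode of open-ended responses."""
    return {
        "mean": statistics.mean(hours),
        "median": statistics.median(hours),
        "mode": statistics.mode(hours),
    }


def centered_scale(midpoint, n_options=4):
    """Equal-width answer options whose middle cut point sits at the given midpoint."""
    if n_options < 2 or n_options % 2:
        raise ValueError("n_options must be an even number of at least 2")
    width = 2 * midpoint / n_options
    cuts = [width * k for k in range(n_options)]
    labels = [f"{lo:g} - {hi:g} hours" for lo, hi in zip(cuts, cuts[1:])]
    labels.append(f"More than {cuts[-1]:g} hours")
    return labels


# Invented open-ended responses (hours online per day), for illustration only.
responses = [2, 4, 5, 5, 5, 6, 6, 7, 8, 10, 12, 14]
summary = summarize(responses)
print(summary)
print(centered_scale(summary["median"]))
```

With a median of 6 hours this rule reproduces the cut points of the high scale (3, 6 and 9 hours), which fits the conclusion that the high scale was the better-centered of the two.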

Keep in mind, this technique provides no guarantee that the workers have accurate knowledge of how many hours they spend online – it’s turtles all the way down. I’d be willing to bet that their best guesses are pretty good, but if a big policy decision was riding on this question, I’d try to supplement my little survey with some other data sources. No matter what, there’s no perfect solution.

So what?

The point of all this has not been to undermine survey research, but to illustrate some of the problems that can happen if you’re not careful with things like scale design, as well as to present some strategies for solving those problems. As crowdsourcing becomes a mainstream tool in a range of academic and commercial fields, survey and questionnaire design techniques are also becoming more widely applicable. Nevertheless, people don’t usually encounter this kind of stuff outside of research methodology textbooks and the polling season of an election year.

I have a few more examples from these same experiments that I hope to follow up with in more posts soon. Meanwhile, leave a comment or email me at aaron [at] doloreslabs [dot] com with questions, comments, corrections and requests for data/code. All of these plots were created using R.

Getting the Gold-Farmers to do useful work

October 22nd, 2009 by Lukas Biewald

One of the most interesting and successful ways that games make money is through “offers”: basically ads or surveys that players can do to earn virtual currency. The game maker earns real money for every player that completes an offer.

We’ve integrated with Gambit, a leading offer provider. They post our tasks inside games alongside other offers. Instead of filling out a survey or buying something they don’t actually want, we have people doing real, useful work for our customers. I might be too old to understand the appeal of virtual currency, but we’ve observed from the feedback and volume of gamers doing our tasks that people care about getting their in-game money. It’s fascinating and exciting that people are shifting from doing things that aren’t standardly conceived of as productive to tasks that people need done. You can see a screenshot of how a task looks in the Facebook game “SportsBets” on the left.

This is one way that by working through CrowdFlower we’re able to give you access to people speaking virtually every major language.

iPhone app — Give work

October 13th, 2009 by Stephanie Geerlings

We just launched our first iPhone app: Give Work lets you do tasks in your downtime and help increase the wages of refugees in Kenya working for us.

We have been working with Samasource for a while now — they are a fantastic local non-profit that brings computer based work to people in Africa. We send tasks to one of their hardest to employ groups: a Kenyan refugee camp.

The people are extremely motivated, speak fluent English and even have high speed internet. But sometimes there are downtime issues (due to floods, satellite failure, etc.) and sometimes there are data quality issues (due to cultural misunderstandings), which makes it hard for them to compete for traditional outsourcing work. Fortunately, our dynamic routing and quality control technology can resolve these problems gracefully.

When you complete a task on your iPhone, your work is paired with the work of someone in Kenya. iPhone users’ results are used for quality control: if someone waiting for a bus in San Francisco gives the same answer as someone working in a refugee camp, we can be fairly certain that the results are reliable. All of the profits we make on the work collected between the iPhone and the refugees go directly into the pockets of the refugee workers.
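The pairing idea can be sketched in a few lines. This is only an illustration of agreement-based quality control under my own assumptions (one worker answer and one iPhone answer per task, compared after light text normalization); it is not our production routing or quality control code, and every name in it is hypothetical.

```python
def normalize(answer):
    """Lowercase and collapse whitespace so trivial differences do not count as disagreement."""
    return " ".join(answer.lower().split())


def reconcile(paired_answers):
    """Accept a task's answer when both contributors agree; otherwise flag it for review.

    paired_answers maps a task id to (worker_answer, iphone_answer).
    """
    accepted = {}
    needs_review = []
    for task_id, (worker_answer, iphone_answer) in paired_answers.items():
        if normalize(worker_answer) == normalize(iphone_answer):
            accepted[task_id] = normalize(worker_answer)
        else:
            needs_review.append(task_id)
    return accepted, needs_review


# Hypothetical task ids and answers.
pairs = {
    "task-1": ("Cat", " cat "),
    "task-2": ("dog", "wolf"),
}
accepted, needs_review = reconcile(pairs)
print(accepted, needs_review)
```

Two people in very different settings giving the same answer independently is the signal being used here; disagreement does not say who is wrong, only that the task deserves another look.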

How to do tasks
Download the free app from the app store. You can start doing tasks in seconds. Shoot us an email at feedback@crowdflower.com if you have any issues or questions.

How to submit tasks to Samasource and GiveWork
Visit CrowdFlower where you can build tasks to outsource. On the order page, click the “iPhone” and “Samasource” channels.

Thanks!
I want to give a special shoutout to Josh Snyder, who did most of the work of building the actual application. I wasn’t sure we had the resources to get this crazy idea out the door, but everyone pitched in after hours and it looks great!

We’re still growing

October 5th, 2009 by Lukas Biewald

The Office

Come work for us! We’re funded by a group of well-known investors, we’re generating substantial and increasing revenue, and we’re looking for people to take us to the next level.

We’re located in the heart of the Mission, and we particularly love to meet readers of this blog. Among many amenities, we offer unlimited otter pops and a healthy oxygen-neutral environment.

If you refer someone that we hire, we will also confer upon you lifetime access to our otter pop supply.

Please send your application to jobs@doloreslabs.com.

Director of BD/Sales

Responsibilities:

  • Manage the sales pipeline
  • Investigate new markets and new applications of our technology
  • Close deals with large enterprise customers

Requirements

  • Proven track record of closing deals with enterprise customers

Account Manager

Responsibilities:

  • Communicate with large customers
  • Handle new customers
  • Review all major deliverables to ensure quality standards and client expectations are met
  • Approve Change Orders and invoices, and take responsibility for payment collections

Requirements

  • Basic statistical/quantitative literacy
  • Organized and meticulous, but willing to work within the chaos of a startup
  • Undergraduate degree

Airlines: Who to fly with?

September 30th, 2009 by John Le

I really hate flying. It’s not that I dislike being transported through the sky in a giant metal cylinder, which is incredibly amazing and cool; it’s that the service is unpredictable. Inevitably, I am burdened with mundane research, searching for and reading recent reviews, to find good service. No one really wants to do this, but we still want to know which airlines have a greater percentage of recently satisfied customers. PeopleBrowsr, a social search engine we are working with, did a really interesting sentiment analysis task on airlines and generously allowed us to publish the data.

positive-tweet-percentage-number-of-passengers

We can see a slight negative correlation between the size of the airline and the percentage of positive tweets. Smaller commercial airlines like Hawaiian, SkyWest, and Virgin had higher percentages of positive tweets, clustering towards the lower right hand corner of the graph, while larger carriers like United, Delta, and Continental had lower percentages, clustering towards the upper left hand corner. Southwest, however, was one of the larger carriers that broke this trend. We should note that Aloha Airlines no longer exists (the passenger data for it is from 2007), and it’s possible that Aloha Airlines showed such a high percentage of positive tweets because “aloha” evokes positive sentiment.

What was said

To gain insight into what we can expect from a positively or negatively viewed airline, it would also be nice to know what words were being used. So below are two nice Wordle visualizations of tweeted words where size indicates greater prevalence of the word, while word orientation and color are for style.

Positive Sentiment Tweet Words
positive-tweet-words
Negative Sentiment Tweet Words
negative-tweet-words

Not surprisingly, the prevalence of the words “delay”, “wait”, and “waiting” for negatively viewed airlines implies that having delays and making people wait is bad. Positive tweets contained common words like “great”, “best”, and “good”, but also “internet”, “wifi”, and “wireless”, which is also not very surprising, since internet connectivity is so highly valued. Interestingly, some positive tweets contain words like “galactic”, “mothership”, and “spaceport”. Taking this into account I’ll try to find an airline run by aliens the next time I fly, and hopefully my interactions with them won’t prove disastrous.
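If you want to compute word prevalence like this yourself, a bare-bones sketch follows. The tweets and the tiny stopword list are invented for illustration, and this is not how the Wordle images above were produced; it only shows the counting step that determines word size.

```python
import re
from collections import Counter

# Assumption: a deliberately tiny stopword list; a real analysis would use a fuller one.
STOPWORDS = {"a", "an", "and", "the", "to", "on", "for", "is", "was", "my", "i"}


def word_prevalence(tweets, top_n=10):
    """Most common non-stopword words across a list of tweets."""
    words = []
    for tweet in tweets:
        for word in re.findall(r"[a-z']+", tweet.lower()):
            if word not in STOPWORDS:
                words.append(word)
    return Counter(words).most_common(top_n)


# Invented example tweets.
negative_tweets = [
    "Another delay, still waiting on the tarmac",
    "Two hour delay and no wifi",
]
print(word_prevalence(negative_tweets, top_n=3))
```

Run separately over the positive and negative tweets for each airline, the resulting counts are the numbers a word cloud turns into font sizes.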

-John