For the past week, psychologists all over America have been freaking out.
The cause of their agita was an observation by a psychology graduate student from the University of Minnesota named Max Hui Bai. Like many researchers, Bai uses Amazon’s Mechanical Turk platform, where individuals sign up to complete simple tasks, such as taking surveys for academics or marketers, and earn a low fee. On Tuesday, August 7, he posed a simple question in a Facebook group for psychology researchers: "Have anyone used Mturk in the last few weeks and notice any quality drop?"
As he would later elaborate in a blog post, Bai had found that the surveys he conducted with MTurk were full of nonsense answers to open-ended questions and respondents with duplicate GPS locations. He said he had to throw out nearly half of the data in his most recent survey, a sharp increase from what he was used to seeing. His Facebook post garnered 181 comments, with other researchers describing similar signs of low-quality data in their own recent work. A number of them wondered if the culprit was bots—automated programs mimicking human behavior, not the actual human labor MTurk is supposed to supply.
The discussion soon spread over Twitter and email, until it appeared the whole field was worried about MTurk. By Friday, New Scientist ran an article with the headline “Bots on Amazon’s MTurk Are Ruining Psychology Studies.” One psychology professor mused on Facebook, “I wonder if this is the end of MTurk research?”
If that were the case, it would be a pretty big deal. Thousands of published social science studies use MTurk survey data every year, according to Panos Ipeirotis, a data scientist at New York University’s Stern School of Business.
But here’s the thing: It’s hard to know for sure if what Bai reported was the result of bots run amok. There are plenty of explanations for junk responses on MTurk. Bai recognizes this. “It might be bots, it might be human-augmented bots, or it might be humans who are tired of taking the survey and are just randomly clicking the buttons,” he says. It could also be the result of poor survey design, as Joe Miele, who operates an MTurk data consultancy, pointed out in response to the uproar.
And not all bot-like behavior on MTurk is considered bad. The platform's Acceptable Use Policy says that Amazon is “generally OK with you using scripts and automated tools” to more efficiently preview and pick tasks. It’s not uncommon for MTurk workers, or Turkers, to use scripts to help them find high-paying tasks they’re suitable for and to accept them quickly. What you cannot do is complete those tasks using automated tools, because then you aren’t using your human intelligence to do the job, and that’s the whole point of MTurk. That hasn’t stopped some people from reportedly using tools to automate filling out forms, but it’s not clear yet whether their use is on the rise, or even that common. Amazon will only say this behavior is against its rules.
“There are bots on MTurk and have been for years,” says digital labor researcher Rochelle LaPlante, a former moderator of Reddit’s r/mturk subreddit. “I don’t know if this new flare of discussion is actually an increase in bots, or just an increase in researchers talking about it and actively searching their data for it.”
MTurk and Social Science
When it launched in 2005, Mechanical Turk was a game-changer. It offered researchers a wider participant pool than campus undergraduates, who before online crowdsourcing had been the main subjects in many of these studies, and at a relatively low cost. MTurk ushered in a "golden age" for social science research. Today, data gathered on the platform is used in thousands of studies a year.
But all along there have also been reservations about the site and that data’s reliability. People have worried they could get scammed—requesters about getting back bad work, and workers about taking on tasks that never pay out. Marketers and researchers worried that the Turker population wasn’t representative enough for their surveys. And the incredibly low pay—just a few cents per task—has disconcerted labor activists and ethicists, who wonder if it’s kosher for scientists to rely on laborers who are making so little. It also provides an incentive for workers to complete tasks as quickly as possible. Researchers do have control over how much they compensate Turkers for work and can opt to pay more if they want. At the same time, scientists are concerned about keeping costs down, and institutional review boards have often expressed concerns (wrongly, some say) that high pay for human research subjects could be coercive.
As more researchers have used MTurk, they've discovered ways to mitigate many of these concerns. Ipeirotis has found that the Turker population is just as representative as university undergraduates for survey purposes, and that the data can be just as reliable, as long as researchers take proper precautions in designing their studies. As for bot work, experts say that researchers can avoid problems by setting up their surveys with stringent parameters and designing tasks that are difficult to automate.
“Most people are, by now, smart enough to deal with noise that comes from workers that do not pay attention, or from bots,” Ipeirotis says. Notably, Bai and many of the researchers who reported an uptick in bad data this summer were using Captcha and attention checks, as experts advise, though Miele and some researchers in the Facebook group suggest perhaps their participant qualifications could have been stronger.
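The kind of screening researchers describe is usually a post-hoc filtering pass over the collected responses. A minimal sketch of that idea, with invented field names, answers, and thresholds (nothing here comes from Bai's survey or Amazon's tools):

```python
# Hypothetical screening pass: drop respondents who failed an embedded
# attention check ("Select 'Strongly agree' for this item") or who gave
# a stock one-word reply to an open-ended question. All field names and
# data below are invented for illustration.
responses = [
    {"id": "A", "attention_item": "Strongly agree",
     "open_ended": "The task instructions were clear."},
    {"id": "B", "attention_item": "Neutral", "open_ended": "nice"},
    {"id": "C", "attention_item": "Strongly agree", "open_ended": "good"},
]

# One-word replies researchers reported seeing where they made no sense.
STOCK_REPLIES = {"nice", "good"}

def passes_screening(r):
    """Keep a response only if the attention check was answered correctly
    and the open-ended answer is not a suspicious stock reply."""
    return (r["attention_item"] == "Strongly agree"
            and r["open_ended"].strip().lower() not in STOCK_REPLIES)

clean = [r["id"] for r in responses if passes_screening(r)]
print(clean)  # ['A']
```

A filter like this catches inattentive humans and naive form-filling scripts alike, which is partly why it cannot say *which* of the two produced a bad response.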
Kathryn Johnson, a psychology professor at Arizona State University, spent the past week going back over her data to see if what Bai reported on his blog was true for her research. “I usually have one MTurker study going a month,” she says.
She found the same two troubling things in her most recent MTurk studies: repeated GPS locations and nonsense answers to open-ended questions. But location data, whether an IP address or GPS, is not by itself a reliable indicator of fraudulent behavior, four different MTurk and bot experts told WIRED. So if that’s the only suspicious thing that researchers are seeing in their results, they should not worry.
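Flagging repeated locations, as Johnson did, amounts to counting how many responses share the same coordinates and marking those workers for manual review. A rough sketch of that check, using made-up worker IDs and coordinates (and, per the experts above, treating a match as a signal to review, not proof of fraud):

```python
from collections import Counter

# Hypothetical survey responses: (worker_id, latitude, longitude, answer).
# All IDs, coordinates, and answers are invented for illustration.
responses = [
    ("W1", 40.71, -74.00, "I felt the scenario was unfair"),
    ("W2", 40.71, -74.00, "nice"),
    ("W3", 34.05, -118.24, "good"),
    ("W4", 40.71, -74.00, "nice"),
    ("W5", 41.88, -87.63, "The wording confused me"),
]

# Count how many responses share each (lat, lon) pair.
location_counts = Counter((lat, lon) for _, lat, lon, _ in responses)

# Flag workers at a repeated location for manual review -- shared
# coordinates alone do not distinguish a bot from, say, two workers
# behind the same ISP or VPN endpoint.
flagged = [wid for wid, lat, lon, _ in responses
           if location_counts[(lat, lon)] > 1]
print(flagged)  # ['W1', 'W2', 'W4']
```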
The nonsense answers are more meaningful. A bunch of researchers told Bai that they had repeated instances of survey respondents replying “nice” or “good” to open-ended questions for which those words made no sense.
“There are browser extensions that fill forms with random answers, so I’m certain some of [what they are seeing] is this,” LaPlante says, but she is quick to note it could also be people not answering the surveys carefully. Repeated bad answers could be Turkers copying and pasting quickly so they can complete more surveys and earn more money. It could also mean people with poor English skills are taking the surveys, notes Miele.
But How Big a Problem Are Bots?
Though Amazon explicitly disallows bots that complete jobs on Mechanical Turk, the company is not very forthcoming about how big a problem they are on the platform. Perhaps it’s because they haven’t needed to be. Unlike Twitter, which in light of its well-known bot infestations has had to be vocal about kicking them off, the possibility of bots on MTurk struck many people as news this week.
Amazon also makes setting up multiple or fake accounts a bit more difficult by requiring workers to provide valid tax information. But that doesn’t prevent a verified person from supplementing their own MTurk labor with an automated system. It would be far easier to set a script running on your own account to complete a bunch of jobs while you sleep or go to your other job, for instance.
“When most of us think of bots, we think of large networks of criminals, but a bot is just a tool for automation. It could be used by one individual to say, instead of me making $5 a day in Amazon Turk, I’m going to use it to make $20 a day. It’s not necessarily nefarious or evil, but there is a gray area,” says Reid Tatoris, vice president of product outreach and marketing at Distil Networks, which detects and protects clients from automated attacks and bots. “But it’s definitely not in compliance,” he added.
In response to this week’s bot worries, a representative for Amazon told WIRED that the company suspends or terminates anyone found completing MTurk tasks by automated means. “We have both automated and manual mechanisms to detect fraud and misuse of the service by bots, and we are always improving these mechanisms as we discover new forms of abuse,” the representative said.
Amazon wouldn’t say whether there has been an uptick in automated behavior on MTurk recently, nor would the company discuss specific examples of bots or accounts.
“This has been going on since the beginning of Mechanical Turk, since forever,” says Kristy Milland, who has conducted research on the platform and worked as a Turker herself for 12 years. She describes herself as an MTurk labor activist, working to encourage fair pay on the platform. “There are a dozen people I know of personally who run bots, and they get away with it,” she says, adding that it would take her 30 seconds to write a simple script to fill in survey information automatically on MTurk.
Forums for Turkers are full of conversations about scripts, some of which would violate Amazon’s terms. You can also find YouTube videos showing Turkers how to write a script to auto-fill answers. In Milland’s opinion, this behavior is in some ways inevitable thanks to the platform’s policies.
“Mechanical Turk workers have been treated really, really badly for 12 years, and so in some ways I see this as a point of resistance," she says. "If we were paid fairly on the platform, nobody would be risking their account this way.”
Anyone who finds a possible bot account, or an account otherwise violating MTurk’s terms, can alert Amazon via a contact form on the site. Milland says she has sent Amazon multiple MTurk IDs that are running bots, but the accounts are still active.
“They don’t want to admit [bots are on the platform],” Milland says. “There are enough people who don’t know that such a thing is possible that they don’t want to even let a whisper of the fact that it’s a possibility out. So they won’t talk about it. I send it on. I do not get a reply at all.”
Amazon’s silence on the topic is striking, considering the level of concern among researchers. Members of the Facebook group where Bai originally posted about MTurk say they’ve reached out to Amazon this week but have not heard back. Last week, Bai created a questionnaire for researchers—not on MTurk—and is now leading a crowdsourced effort among social scientists to figure out how much of the bad data he has seen is new, how large the problem is, and how to stop it. He’s still analyzing that survey, but he plans to send the results to Amazon in the hopes that the data will force the company to respond.
If we think of Bai’s quest like a scientific experiment, we just passed the hypothesis phase (“There may be more bad data on MTurk, and it may be due to bots”) and have entered data-collection mode. The results aren’t in yet.