Once upon a time, in the 18th century, a fantastic chess-playing machine known as the Mechanical Turk was exhibited around the world, stunning audiences with its ability to beat skilled players and heads of state, including Napoleon Bonaparte. Years later it transpired that the machine’s extraordinary feats were possible only because a human was hiding inside it, making all the moves.
Today a similar phenomenon plays out behind the scenes of artificial intelligence: humans label much of the data used to train AI models, and they often babysit those models in the wild too, meaning our modern-day machinery isn’t as fully automated as we think. Yet now comes a twist in the tale: AI systems can produce content so humanlike that some of those behind-the-scenes humans are training new AI with old AI.
AI models are often described as black boxes, so what happens when one black box teaches another? The new system becomes even harder to scrutinise, and any biases in the original can become more entrenched.
A new study from academics at Switzerland’s EPFL suggested that workers on Amazon.com’s MTurk — a crowdsourcing job platform named after the original Mechanical Turk — have started using ChatGPT and other large language models to automate their work. The researchers estimated that 33% to 46% of them were using AI tools when carrying out their tasks.
Normally companies and academics hire MTurk workers because of their ability to do things computers cannot, like label an image, rate an ad or answer survey questions. Their work is often used to train algorithms to do things like recognise photos or read receipts.
Nearly all tasks on MTurk pay tiny amounts. West Virginia-based Sherry Stanley, who was an MTurk worker for more than seven years until recently, said she’d seen requesters offer to pay just 50c (about R9) for three paragraphs of written work. Turkers can hike up their hourly takings from $3 (R55) to around $30 (R547) if they use specialised software to speed up their tasks.
The problem with using ChatGPT, though, is that it isn’t just streamlining the work, it’s doing it.
There are several implications. For one, this behaviour affects the 250,000 or so people, mostly in the US, who are estimated to be working on the MTurk platform.
“Scam workers can exploit the whole system,” said Stanley.
“And the good workers are the ones who suffer the consequences.”
Companies that hire Turkers pay them based on the number of tasks they complete and the quality of their work. If some are producing work faster thanks to software that mimics human abilities, that puts greater pressure on MTurk workers overall to increase their speed and output, something other professionals are likely to experience too with the advent of generative AI.
Another consequence is skewed results for academic researchers who use MTurk to carry out studies, and for companies that hire Turkers to help train AI systems. If less human input goes into those processes, the algorithms and scientific studies that use crowdsourcing will get a more warped reflection of reality.
“Human data is hugely important,” said Veniamin Veselovsky, an author on the EPFL research paper.
“Psychology, computational social science, sociology all depend on it to better understand ‘us.’”
If more crowd workers use ChatGPT, they’ll also add to the growing volume of AI-derived synthetic content on the web. Large language models developed by companies like OpenAI and Google are poised to play a larger role in our information ecosystem, adding to the growing amounts of synthetic data that companies are producing to teach AI models.
Overall that’ll make the internet a potentially more confusing place to learn about the world. Between the bots on Twitter and AI-generated ads, it’s becoming harder to find content on the web that comes from real humans. That shift threatens to reinforce prejudices known to have been baked into some language models and AI systems.
“It opens up a series of ethical questions,” said Veselovsky.
“These models can represent specific viewpoints, opinions and ideologies. This may lead to a lack of diversity in the models we are training.”
In other words, if biased AI systems are training other AI systems, we’ll find ourselves caught in a loop of dodgy information whose origins become harder and harder to decipher. The humans who are working behind the scenes of AI are integral to its development, but it would be good if they could stay human for as long as possible.
More stories like this are available on Bloomberg.