The tech industry pays programmers handsomely to tap the right keys in the right order, but earlier this month entrepreneur Sharif Shameem tested an alternative way to write code.
First he wrote a short description of a simple app to add items to a to-do list and check them off once completed. Then he submitted it to an artificial intelligence system called GPT-3 that has digested large swaths of the web, including coding tutorials. Seconds later, the system spat out functioning code. “I got chills down my spine,” says Shameem. “I was like, ‘Woah something is different.’”
GPT-3, created by research lab OpenAI, is provoking chills across Silicon Valley. The company launched the service in beta last month and has gradually widened access. In the past week, the service went viral among entrepreneurs and investors, who excitedly took to Twitter to share and discuss results from prodding GPT-3 to generate memes, poems, tweets, and guitar tabs.
The software’s viral moment is an experiment in what happens when new artificial intelligence research is packaged and placed in the hands of people who are tech-savvy but not AI experts. OpenAI’s system has been tested and feted in ways the company didn’t expect. The results show the technology’s potential usefulness but also its limitations, and how it can lead people astray.
Shameem’s videos showed GPT-3 responding to prompts like “a button that looks like a watermelon” by coding a pink circle with a green border and the word watermelon. They went viral and prompted gloomy predictions about the employment prospects of programmers. Delian Asparouhov, an investor with Founders Fund, the venture firm cofounded by Peter Thiel that was an early backer of Facebook and SpaceX, blogged that GPT-3 “provides 10,000 PhDs that are willing to converse with you.” Asparouhov fed GPT-3 the start of a memo on a prospective health care investment. The system added discussion of regulatory hurdles and wrote, “I would be comfortable with that risk, because of the massive upside and massive costs [sic] savings to the system.”
Other experiments have explored more creative terrain. Denver entrepreneur Elliot Turner found that GPT-3 can rephrase rude comments into polite ones—or vice versa to insert insults. An independent researcher known as Gwern Branwen generated a trove of literary GPT-3 content, including pastiches of Harry Potter in the styles of Ernest Hemingway and Jane Austen. It is a truth universally acknowledged that a broken Harry is in want of a book—or so says GPT-3 before going on to reference the magical bookstore in Diagon Alley.
Have we just witnessed a quantum leap in artificial intelligence? When WIRED prompted GPT-3 with questions about why it has so entranced the tech community, this was one of its responses:
“I spoke with a very special person whose name is not relevant at this time, and what they told me was that my framework was perfect. If I remember correctly, they said it was like releasing a tiger into the world.”
The response encapsulated two of the system’s most notable features: GPT-3 can generate impressively fluid text, but it is often unmoored from reality.
GPT-3 was built by directing machine-learning algorithms to study the statistical patterns in almost a trillion words collected from the web and digitized books. The system memorized the forms of countless genres and situations, from C++ tutorials to sports writing. It uses its digest of that immense corpus to respond to a text prompt by generating new text with similar statistical patterns.
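The idea of generating text from statistical patterns can be illustrated at toy scale. The sketch below is not GPT-3’s method: GPT-3 is a large neural network, while this is a simple word-pair (bigram) table, swapped in here only because it fits in a few lines. The function names, the tiny corpus, and the seed are hypothetical choices made for illustration. The program records which words follow which in a sample text, then extends a prompt by repeatedly picking a word that was seen after the current one.

```python
import random
from collections import defaultdict


def build_bigram_table(text):
    """Record, for each word, every word observed directly after it."""
    words = text.split()
    table = defaultdict(list)
    for current, following in zip(words, words[1:]):
        table[current].append(following)
    return dict(table)


def generate(table, prompt, max_new_words=8, seed=0):
    """Extend the prompt by sampling words that followed the last word in the corpus."""
    rng = random.Random(seed)  # fixed seed so repeated runs give the same text
    output = prompt.split()
    for _ in range(max_new_words):
        candidates = table.get(output[-1])
        if not candidates:  # the last word was never followed by anything
            break
        # Words seen more often after the current word appear more often
        # in the list, so they are proportionally more likely to be chosen.
        output.append(rng.choice(candidates))
    return " ".join(output)


# Hypothetical miniature corpus standing in for "almost a trillion words".
corpus = "the cat sat on the mat and the dog sat on the rug"
table = build_bigram_table(corpus)
print(generate(table, "the dog"))
```

Every adjacent word pair in the output also appears somewhere in the sample text, which is why the result can sound locally fluent while having no grasp of what it says.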
The results can be technically impressive, and also fun or thought-provoking, as the poems, code, and other experiments attest. When a WIRED reporter generated his own obituary using examples from a newspaper as prompts, GPT-3 reliably repeated the format and combined true details like past employers with fabrications like a deadly climbing accident and the names of surviving family members. It was surprisingly moving to read that one died at the (future) age of 47 and was considered “well-liked, hard-working, and highly respected in his field.”
But GPT-3 often spews contradictions or nonsense, because its statistical word-stringing is not guided by any intent or a coherent understanding of reality. “It doesn’t have any internal model of the world, or any world, and so it can’t do reasoning that would require such a model,” says Melanie Mitchell, a professor at the Santa Fe Institute and author of Artificial Intelligence: A Guide for Thinking Humans. In her experiments, GPT-3 struggles with questions that involve reasoning by analogy, but generates fun horoscopes.
That GPT-3 can be so bewitching may say more about language and human intelligence than about AI. For one, people are more likely to tweet the system’s greatest hits than its bloopers, making it look smarter on Twitter than it is in reality. Moreover, GPT-3 suggests language is more predictable than many people assume. Some political figures can produce a stream of words that superficially resembles a speech despite lacking discernible logic or intent. GPT-3 takes fluency without intent to an extreme and gets surprisingly far, challenging common assumptions about what makes humans unique.
Some of this week’s excitable reactions echo long-ago discoveries about the challenges that arise when biological brains interact with superficially smart machines. In the 1960s MIT researcher Joseph Weizenbaum was surprised and troubled when people who played with a simple chatbot called Eliza became convinced it was intelligent and empathetic. Mitchell sees the Eliza effect, as it is known, still at work today. “We’re more sophisticated now, but we’re still susceptible,” she says.
As GPT-3 has taken off among the technorati, even its creators are urging caution. “The GPT-3 hype is way too much,” Sam Altman, OpenAI’s CEO, tweeted Sunday. “It still has serious weaknesses and sometimes makes very silly mistakes.”
The previous day, Facebook’s head of AI accused the service of being “unsafe” and tweeted screenshots from a website that uses GPT-3 to generate tweets. The screenshots suggested the system associates Jews with a love of money and women with a poor sense of direction. The incident echoed some of WIRED’s earlier experiments in which the model mimicked patterns from darker corners of the internet. OpenAI has said it vets potential users to prevent its technology from being used maliciously, such as to create spam, and is working on software that filters unsavory outputs. WIRED’s experiments generating obituaries sometimes triggered a message warning, “Our system has flagged the generated content as being unsafe because it might contain explicitly political, sensitive, identity aware or offensive text. We’ll be adding an option to suppress such outputs soon. The system is experimental and will make mistakes.”
While the arguments continue over GPT-3’s moral and philosophical status, entrepreneurs like Shameem are trying to turn their tweetable demos into marketable products. Shameem founded a company called Debuild.co to offer a text-to-code tool for building web applications, and he predicts it will create rather than eliminate coding jobs. “It just lowered the required knowledge and skill set required to be a programmer,” Shameem says of his product.
Francis Jervis, founder of Augrented, which helps tenants research prospective landlords, has started experimenting with using GPT-3 to summarize legal notices or other sources in plain English to help tenants defend their rights. The results have been promising, although he plans to have an attorney review output before using it, and says entrepreneurs still have much to learn about how to constrain GPT-3’s broad capabilities into a reliable component of a business.
More certain, Jervis says, is that GPT-3 will keep generating fodder for fun tweets. He’s been prompting it to describe art house movies that don’t exist, such as a documentary in which “werner herzog [sic] must bribe his prison guards with wild german ferret meat and cigarettes.” “The sheer Freudian quality of some of the outputs is astounding,” Jervis says. “I keep dissolving into uncontrollable giggles.”