Natural language processing
Glean provides tools for searching through applications like Gmail, Slack, and Salesforce. Qi says new AI techniques for parsing language would help Glean’s customers unearth the right file or conversation a lot faster.
But training such a cutting-edge AI algorithm costs several million dollars. So Glean uses smaller, less capable AI models that can’t extract as much meaning from text.
AI has spawned exciting breakthroughs in the past decade—programs that can beat humans at complex games, steer cars through city streets under certain conditions, respond to spoken commands, and write coherent text based on a short prompt. Writing in particular relies on recent advances in computers’ ability to parse and manipulate language.
Those advances are largely the result of feeding the algorithms more text as examples to learn from, and giving them more chips with which to digest it. And that costs money.
Consider OpenAI’s language model GPT-3, a large, mathematically simulated neural network that was fed reams of text scraped from the web. GPT-3 can find statistical patterns that predict, with striking coherence, which words should follow others. Out of the box, GPT-3 is significantly better than previous AI models at tasks such as answering questions, summarizing text, and correcting grammatical errors. By one measure, it is 1,000 times more capable than its predecessor, GPT-2. But training GPT-3 cost, by some estimates, almost $5 million.
“If GPT-3 were accessible and cheap, it would totally supercharge our search engine,” Qi says. “That would be really, really powerful.”
The spiraling cost of training advanced AI is also a problem for established companies looking to build their AI capabilities.
Dan McCreary leads a team within one division of Optum, a health IT company. His team uses language models to analyze transcripts of calls in order to identify higher-risk patients or recommend referrals. He says even training a language model that is one-thousandth the size of GPT-3 can quickly eat up the team’s budget. Models need to be trained for specific tasks, and training one can cost more than $50,000, paid to cloud computing companies to rent their computers and programs.
McCreary says cloud computing providers have little reason to lower the cost. “We cannot trust that cloud providers are working to lower the costs for us building our AI models,” he says. He is looking into buying specialized chips designed to speed up AI training.
AI has progressed so rapidly in recent years partly because many academic labs and startups could download and use the newest ideas and techniques. Algorithms that produced breakthroughs in image processing, for instance, emerged from academic labs and were developed using off-the-shelf hardware and openly shared data sets.
Over time, though, it has become increasingly clear that progress in AI is tied to an exponential increase in the underlying computer power.
Big companies have, of course, always had advantages in terms of budget, scale, and reach. And large amounts of computer power are table stakes in industries like drug discovery.
Now, some are pushing to scale things up further still. Microsoft said this week that, with Nvidia, it had built a language model more than twice as large as GPT-3. Researchers in China say they’ve built a language model that is four times larger than that.
“The cost of training AI is absolutely going up,” says David Kanter, executive director of MLCommons, an organization that tracks the performance of chips designed for AI. The idea that larger models can unlock valuable new capabilities can be seen in many areas of the tech industry, he says. It may explain why Tesla is designing its own chips just to train AI models for autonomous driving.
Some worry that the rising cost of tapping the latest and greatest tech could slow the pace of innovation by reserving it for the biggest companies, and those that lease their tools.
“I think it does cut down innovation,” says Chris Manning, a Stanford professor who specializes in AI and language. “When we have only a handful of places where people can play with the innards of these models of that scale, that has to massively reduce the amount of creative exploration that happens.”
Ten years ago, Manning says, his lab had enough computing resources to explore any project. “One PhD student working hard could be producing work that was state-of-the-art,” he says. “It seems like that window has now closed.”
At the same time, the rising cost is pushing people to look for more efficient ways of training AI algorithms. Dozens of companies are working on specialized computer chips for both training and running AI programs.
Qi of Glean and McCreary of Optum are both talking to Mosaic ML, a startup spun out of MIT that is developing software tricks designed to increase the efficiency of machine-learning training.
The company is building on a technique developed by Michael Carbin, a professor at MIT, and Jonathan Frankle, one of his students, that involves “pruning” a neural network to remove inefficiencies and create a much smaller network capable of similar performance. Frankle says early results suggest that it should be possible to cut the amount of computer power needed to train something like GPT-3 in half, reducing the cost of development.
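The basic idea of pruning can be illustrated with a toy example. The sketch below shows one common, simplified form, one-shot magnitude pruning: the weights closest to zero are assumed to matter least and are removed. It is not Mosaic ML's software or the exact procedure Frankle and Carbin published, and it omits the retraining a real method would need to recover accuracy. The function name `magnitude_prune` and the example weights are hypothetical, chosen only for illustration.

```python
from __future__ import annotations


def magnitude_prune(weights: list[float], sparsity: float) -> tuple[list[float], list[int]]:
    """Zero out the `sparsity` fraction of weights with the smallest absolute value.

    Hypothetical helper for illustration. Returns the pruned weights and a 0/1 mask
    where 1 marks a weight that survives pruning.
    """
    if not 0.0 <= sparsity <= 1.0:
        raise ValueError("sparsity must be between 0 and 1")
    n_prune = int(len(weights) * sparsity)
    # Rank weight positions from smallest to largest magnitude.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    pruned_positions = set(order[:n_prune])
    # Keep a mask so the same connections can stay removed during any later retraining.
    mask = [0 if i in pruned_positions else 1 for i in range(len(weights))]
    pruned = [w if m else 0.0 for w, m in zip(weights, mask)]
    return pruned, mask


# Toy "layer" of six made-up weights; prune half of them.
pruned, mask = magnitude_prune([0.8, -0.05, 0.3, -0.9, 0.01, 0.4], sparsity=0.5)
print(pruned)  # [0.8, 0.0, 0.0, -0.9, 0.0, 0.4]
print(mask)    # [1, 0, 0, 1, 0, 1]
```

In a real network the savings come from skipping or not storing the zeroed connections; simply setting them to zero, as this sketch does, does not by itself make training cheaper.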
Carbin says there are other techniques for improving the efficiency of neural network training. Mosaic ML plans to open-source much of its technology but also offer consulting services to companies keen to lower the cost of their AI deployments. One potential offering: a tool to measure the trade-offs among different methods in terms of accuracy, speed, and cost, Carbin says. “Nobody really knows how to put all of these methods together,” he says.
Kanter of MLCommons says Mosaic ML’s technology may help well-heeled companies take their models to the next level, but it could also help democratize AI for companies without deep AI expertise. “If you can cut the cost, and give those companies access to expertise, then that will promote adoption,” he says.