David Ferrucci, the man who built IBM’s Jeopardy!-playing machine, Watson, is explaining a children’s story to his new creation.
In the tale, Fernando and Zoey buy some plants. Fernando places his plant on a windowsill, while Zoey tucks hers away in a darkened room. After a few days, Fernando’s plant is green and healthy, but the leaves of Zoey’s have browned. She moves her plant to the windowsill and it flourishes.
A question appears on the screen in front of Ferrucci: “Does it make sense that Fernando put his plant in the window because he wants it to be healthy? The sunny window has light and the plant needs to be healthy.”
The question is part of an effort by Ferrucci’s artificial intelligence system to learn how the world works. It might be obvious to you or me why Fernando put his plant in the window. But it is surprisingly difficult for an AI system to grasp.
Ferrucci and his company, Elemental Cognition, hope to fix a huge blind spot in modern AI by teaching machines to acquire and apply the everyday knowledge that lets people communicate, reason, and navigate their surroundings. We use common-sense reasoning so often, and so easily, that we barely notice it.
Ernest Davis, a professor at NYU who has been studying the problem for decades, says common sense is essential for advancing everything from language understanding to robotics. It is “central to most of what we want to do with AI,” he says.
Davis says machines need to master fundamental concepts like time, causality, and social interaction in order to demonstrate real intelligence. “This is the large obstacle that the current approaches are having serious trouble with,” he says.
The latest wave of AI advances, built on a mix of machine learning and big data, has given us gadgets that respond to spoken commands and self-driving cars that recognize objects on the road ahead. They’re amazing, but they have zero common sense. Alexa and Siri can tell you about a species of plant by reciting from Wikipedia, but neither seems to know what happens if you leave one in the dark. A program that’s learned to spot obstacles doesn’t typically understand why it’s more important to avoid people than traffic cones.
Back at Ferrucci’s computer, the researcher clicks an on-screen “yes” button in response to the question about Fernando’s plant. On a server somewhere, an AI program known as CLARA adds that information to a library of facts and notions—a kind of artificial common-sense knowledge. Like an endlessly inquisitive child, CLARA, which stands for Collaborative Learning and Reading Agent, asks Ferrucci another question about the plant story, then another, and another, attempting to “understand” why things unfold the way they do. “Can we ever get machines to actually understand what they read?” he says. “That’s a very hard thing, and that’s ultimately what Elemental Cognition is about.”
Ferrucci has been working on the problem for some time. A decade ago, when he led the development of IBM’s Watson, having a computer answer Jeopardy! questions seemed near impossible. Yet in 2011, Watson crushed two human champions in a widely publicized version of the show. Watson parsed reams of text to find nuggets of trivia suggesting answers to Jeopardy! questions. It was a crowning achievement for AI, but the absence of any real understanding was all too apparent. On live TV, for example, the machine responded to a clue in the category of “US Cities” with “What is Toronto?”
Ferrucci says Watson’s limitations, and the hype around the project, propelled him to try building machines that better understand the world. IBM has since turned Watson into a brand that refers to a bewildering range of technologies, many unrelated to the original machine.
A year after the Jeopardy! match, Ferrucci left to form Elemental Cognition. It has so far been funded by Bridgewater Associates, a hedge fund created by Ray Dalio that manages roughly $160 billion, and three other parties. Elemental Cognition operates on Bridgewater’s campus, in lush woodland overlooking a lake in Westport, Connecticut.
Not long after Watson’s triumph, AI was transformed. Deep learning, a means of teaching computers to recognize faces, transcribe speech, and do other things by feeding them large amounts of data, emerged as a powerful tool, and it has been applied in ever more ways.
Over the past couple of years, deep learning has produced striking progress in language understanding. Feeding a particular kind of artificial neural network large amounts of text can produce a model capable of answering questions or generating text with surprising coherence. Teams at Google, Baidu, Microsoft, and OpenAI have built ever larger and more complex models that are progressively better at handling language.
And yet, these models are still bedeviled by a lack of common sense. For instance, Ferrucci’s team gave an advanced language model the story involving Fernando and Zoey, and asked it to complete the sentence “Zoey moves her plant to a sunny window. Soon …”. Failing to grasp the notion that plants thrive in sunlight, it generated a series of bizarre endings based purely on statistical pattern matching: “she finds something, not pleasant,” “fertilizer is visible in the window,” and “another plant is missing from the bedroom.”
CLARA aims to go further by combining deep-learning techniques with more old-fashioned ways of building knowledge into machines, through explicit logical rules, like the fact that plants have leaves and need light. It uses a statistical method to recognize concepts like nouns and verbs in sentences. It also has a few pieces of what’s known as “core knowledge,” like the fact that events happen in time and cause other things to happen.
Knowledge about specific subjects is crowdsourced from workers on Mechanical Turk and then built into CLARA. This might include, for example, the facts that light causes plants to thrive and that windows let light in. A deep-learning model fed the right data, by contrast, might answer questions about botany correctly—or it might not. There is no guarantee it has absorbed the underlying facts.
It would take a long time to hand-craft every possible piece of common-sense knowledge into the system, as previous efforts to build knowledge engines by hand have sadly demonstrated. So CLARA combines the facts it’s given with deep-learning language models to generate its own common sense. In the plant story, for example, this might allow CLARA to conclude for itself that being in a window helps make plants green.
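The rule-chaining idea at the heart of this approach can be sketched in a few lines. The following is a toy illustration only—it has nothing to do with Elemental Cognition’s actual code, and the facts and rule names are invented for the example—but it shows how a handful of explicit statements can be chained to reach a conclusion no single statement contains, the way CLARA might conclude that a plant on a windowsill turns green:

```python
# Toy forward-chaining inference over hand-supplied "core knowledge".
# (Illustrative only; the rules and wording are invented for this sketch.)

# Each rule is a (premise, conclusion) pair of plain-text facts.
RULES = [
    ("plant is near a window", "plant receives light"),
    ("plant receives light", "plant is healthy"),
    ("plant is healthy", "plant is green"),
]

def infer(initial_facts):
    """Repeatedly apply rules until no new facts can be derived."""
    facts = set(initial_facts)
    changed = True
    while changed:
        changed = False
        for premise, conclusion in RULES:
            if premise in facts and conclusion not in facts:
                facts.add(conclusion)
                changed = True
    return facts

derived = infer({"plant is near a window"})
print("plant is green" in derived)  # True: follows from chaining all three rules
```

No single rule says that being near a window makes a plant green; that conclusion emerges only by chaining the rules together, which is the kind of multi-step connection that pure statistical language models often fail to make.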
CLARA also gathers common sense by interacting with users. When it encounters a contradiction, it asks them which statement is more generally true.
“It’s a very challenging enterprise, but I think it’s an important vision and goal,” says Roger Levy, a professor at MIT who works at the intersection of AI, language, and cognitive science. “Language is not just a set of statistical associations and patterns—it also connects with meaning and reasoning, and our common sense understanding of the world.”
It’s hard to say how much progress Ferrucci has made towards giving AI common sense, in part because Elemental Cognition is unusually secretive. It recently published a paper arguing that most efforts at machine understanding fall short, and should be replaced by ones that ask deeper questions about the meaning of text. But it hasn’t published details of its system or released any code.
Scaling such a complex system beyond simple stories and basic examples will likely prove tricky. Ferrucci seems to be looking for a company with deep pockets and a large number of users to help. If people could be persuaded to help a search engine or a personal assistant build common-sense knowledge, that could accelerate the process. Another possibility Ferrucci suggests is a program that asks students questions about a piece of text they have read, to both check they understand it and build its own knowledge base.
“If there was an institution that wanted to invest, I’m open to having that conversation,” Ferrucci says. “I don’t need money right now, but I would love to work out a partnership or an acquisition or whatever.”
CLARA isn’t the only common-sense AI in town. Yejin Choi, a professor at the University of Washington and a researcher at the Allen Institute for AI, recently led the development of another method for combining deep learning and symbolic logic, known as COMET. This program gets confused less frequently than pure deep-learning language models when conversing or answering questions, but it still gets tripped up sometimes.
Choi says she’d like to see the inner workings of CLARA before passing judgment. “At a high level it makes sense,” she says when given a rough description. “I think they can make some toy examples, but I find it hard to believe one can really make it work for general-purpose common sense.”
Davis at NYU isn’t sure that common-sense AI is ready for its Watson moment. He suspects that fundamental breakthroughs may be needed for machines to learn common sense as effortlessly as humans. For example, he says, it’s unclear how machines could grasp uncertain meanings. “There seems to be something serious we’re missing,” Davis says. “There are aspects of it that we haven’t gotten anywhere near.”