
How Self-Driving Car Makers Measure Their Own Progress

It’s report card season for self-driving cars. On Wednesday, the California Department of Motor Vehicles released reports detailing how much the companies permitted to test autonomous vehicles in the state drove last year, and how often their human safety operators had to take control from the computer. The “disengagement reports” provide a rare glimpse into the workings of companies developing robots on public streets.

But it’s too bad the reports are nearly useless for gauging how close we are to the age of autonomy. First off, the companies use different jargon to explain various disengagements. The reports also cover only California, while most of the big players do the majority of their testing elsewhere: Waymo around Phoenix, Argo in Pittsburgh and Miami, and Aptiv in Las Vegas, to name a few.

More fundamentally, disengagements are a poor way to measure progress. They’re not good for comparing companies, because rivals test in different places (Cruise in complex San Francisco, Waymo in calmer suburbs, and so on). The companies also follow different protocols: Some tell their drivers to take control in school zones or when emergency vehicles are nearby, generating disengagements in spots where the vehicle might have done just fine. Perhaps most damning is that the best way to limit disengagements—racking up miles in easy, well-studied areas—is a bad way to improve an autonomous system. Waymo said Wednesday the reports don't "provide relevant insights" into its self-driving program "or distinguish its performance from others in the self-driving space."

So how do the companies track their progress? Some metrics are straightforward. If your vision system is only detecting 98 percent of pedestrians, your machine-learning algorithm probably needs to study more examples, in the hope of getting beyond 99.99 percent. At least once a month, Matt Johnson-Roberson, CEO of Refraction AI, goes over such stats, along with things like how often the computers crash and how reliably Refraction’s vehicles follow their software’s instructions. Refraction is building a small robot that sticks to the bike lane, making food deliveries in Ann Arbor, Michigan.
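To make the gap between 98 percent and 99.99 percent concrete, here is a minimal Python sketch of that kind of detection stat. The function names and the sample counts are hypothetical, chosen only to mirror the percentages above; nothing here comes from Refraction's actual tooling.

```python
def detection_rate(detected: int, total: int) -> float:
    """Fraction of labeled pedestrians the vision system found."""
    if total <= 0:
        raise ValueError("total must be positive")
    return detected / total


def misses_per_million(rate: float) -> int:
    """How many pedestrians would go undetected out of one million."""
    return round((1.0 - rate) * 1_000_000)


# Hypothetical evaluation set: 10,000 labeled pedestrians, 9,800 detected.
current = detection_rate(9_800, 10_000)
target = 0.9999  # the "beyond 99.99 percent" goal mentioned above

print(f"current rate: {current:.2%}, misses per million: {misses_per_million(current)}")
print(f"target rate:  {target:.2%}, misses per million: {misses_per_million(target)}")
print("needs more training examples" if current < target else "meets target")
```

Framed as misses per million, the two figures differ by a factor of 200, which is why a seemingly high 98 percent still sends the algorithm back to study.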

While the startup and its competitors have their particular ways of measuring progress, most appear to focus less on how many miles they’ve driven than on the range of situations they can navigate safely.

First step: Consider what the vehicle will have to do. The go-anywhere, anytime robocar is likely decades away; most developers are targeting a niche constrained by geography, road type, and driving conditions. Cruise’s cars will have to handle all of San Francisco, which effectively means they have to be able to do anything a human can—unprotected left turns, four-way stops, roundabouts, the crazy steep streets that made the Bullitt car chase so fun. Optimus Ride and Voyage are going after retirement communities and other circumscribed areas, which require fewer capabilities.

You make a list of those capabilities, something like a syllabus, that you need to teach the car. The companies testing today started with fundamentals like writing the code that tells a car to pick out and stay between lane lines. Then you might add changing lanes, merging onto a highway, or slowing for another driver cutting into your lane. Any time you change the software controlling the car, you first try it in computer simulation, to see how it works and identify bugs. Then you typically put it into a vehicle for testing on a private track in controlled conditions. Once it’s proven there, you can move onto public roads. Waymo, for example, has driven 20 million miles in the real world—and more than 10 billion in the virtual one.

As each function improves, “you can start crossing them off the list,” says Don Burnette, who runs the self-driving truck outfit Kodiak Robotics. “How many features do you still have left to implement? How many features have you included? That is a very good indicator of progress for a company”—one that Kodiak uses internally.

At the same time, you make each feature more capable. If you’re working on lane changing, you start with no other vehicles around, focusing on a human-like trajectory and speed. (Again, this work happens first in simulation, then in the real world.) Then you add a few cars to the scene, then more cars, so yours has to decide when it’s safe to move into smaller and smaller gaps. Eventually, you work on creating a gap, the way a human driver nudges another to let him in. It’s the same way you teach a person a new thing, say how to speak French: Begin with “combien coûte une madeleine,” and work your way up to reading Proust.


Once you’ve crossed everything off your list of capabilities, you have a “feature complete” system. The height of that bar (an environment like a major city calls for a nearly endless list of skills) helps explain why so many self-driving outfits are pursuing more limited business models like trucking and shuttle vans. Unsurprisingly, the ever-confident Elon Musk is the rare person to claim victory. “I think we will be ‘feature-complete’ on full self-driving this year,” Musk said in early 2019. “Meaning the car will be able to find you in a parking lot, pick you up, take you all the way to your destination without an intervention this year.” In an earnings call last month, he explained that “feature complete just means it has some chance of going from your home to work with no interventions.”

Still, the gulf between “feature complete” and “mission accomplished” is wide. Take Smart Summon, which Tesla released in September to autonomously guide a car from a parking spot to where its owner is standing. Anecdotal evidence says that mostly it works—except for when the car confuses asphalt and grass, freezes, or pins itself against a garage door.

So once you’ve added a feature to your code base, you have to ensure it works in as many situations as possible. That’s where simulation is crucial, says Chris Urmson, who led Waymo in its early years and is now CEO of Aurora, which is developing self-driving tech for a variety of applications, including trucking. Last year, when Urmson’s team was working on unprotected turns, they first sent out human drivers on fact-finding missions. They were interested in sampling the variety of life: how quickly or slowly human drivers moved through different kinds of intersections, how badly a truck might block a car’s view of oncoming traffic, and so on. They loaded the results into their simulation software, then made variations by “fuzzing” the details—making slight changes to other actors’ positions, speed, and so on. Before trying any actual left turns into traffic, Urmson says, Aurora ran more than 2 million experiments in simulation, continually honing how its system hanged louies.
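The idea behind “fuzzing” can be sketched in a few lines of Python. This is an illustrative toy, not Aurora's software: the `Actor` fields, the jitter ranges, and the example scene are all assumptions made up for the sketch.

```python
import random
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Actor:
    name: str
    position_m: float  # distance along the road, in meters (assumed unit)
    speed_mps: float   # speed in meters per second (assumed unit)


def fuzz_actor(actor: Actor, rng: random.Random,
               pos_jitter_m: float = 2.0, speed_jitter_mps: float = 1.0) -> Actor:
    """Return a copy of the actor with slightly perturbed position and speed."""
    return replace(
        actor,
        position_m=actor.position_m + rng.uniform(-pos_jitter_m, pos_jitter_m),
        # Clamp at zero so a parked vehicle never gets a negative speed.
        speed_mps=max(0.0, actor.speed_mps + rng.uniform(-speed_jitter_mps, speed_jitter_mps)),
    )


def fuzz_scenario(actors: list[Actor], n_variants: int, seed: int = 0) -> list[list[Actor]]:
    """Generate n_variants perturbed copies of one recorded scene."""
    rng = random.Random(seed)  # seeded so a failing variant can be replayed
    return [[fuzz_actor(a, rng) for a in actors] for _ in range(n_variants)]


# Hypothetical recorded scene at an unprotected left turn.
recorded = [
    Actor("oncoming_car", position_m=40.0, speed_mps=12.0),
    Actor("truck_blocking_view", position_m=8.0, speed_mps=0.0),
]

for variant in fuzz_scenario(recorded, n_variants=3):
    print(variant)
```

Each variant would then be fed to the simulator as a separate experiment. Seeding the random generator is a design choice that lets engineers reproduce any variant in which the car misbehaved.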

Then they took their robots to the streets to validate their computer learnings in the real world. Aurora’s safety operators noted unusual situations and moments where the vehicle didn’t behave the way they’d have liked, which typically led to disengaging the autonomous system. Rather than focusing on the number of times they retook control, Aurora’s engineers used those moments as fodder for more simulation, more fuzzing, and more tweaks that improve the car’s skills.

At some point, Urmson and his team will decide their system has flashed its skills in enough scenarios to enter the world without a human behind the wheel. Different developers will pull that trigger at different points, because nobody can agree on the much-fretted-over question: How safe is safe enough? That includes regulators. The federal Department of Transportation has offered only vague guidelines for developing safe systems. Many states have welcomed AV developers without imposing any technical requirements. California stands out: More than 60 companies are permitted to test their tech in the state, but just five have secured permission from the Public Utilities Commission to carry passengers.

Don’t expect that light-handed arrangement to change, says Bryant Walker Smith, a professor at the University of South Carolina School of Law who studies automated vehicle policy. These vehicles run complex software in a complex environment. Regulators and the public won’t have the expertise, resources, or time to fully understand how this all works, he adds. No company is likely to drive the number of miles it would take to offer statistical proof its creation is as capable (or more so) as a human. Which means everybody will have to take a leap of faith, or at least a hop, Walker Smith says. “It’s up to the company developing and deploying that technology to be worthy of our trust.”
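A back-of-the-envelope calculation shows why the mileage needed for statistical proof is out of reach. The sketch below assumes serious crashes arrive as a Poisson process, so the chance of seeing none in n miles at rate r is exp(-r * n). The human benchmark of one fatality per 100 million miles is an illustrative round number assumed for this example, not a figure from the reports or from Walker Smith.

```python
import math


def failure_free_miles(human_rate_per_mile: float, confidence: float) -> float:
    """Miles a fleet must drive with zero events to claim, at the given
    confidence, that its event rate is below the human rate.

    Solves exp(-rate * n) <= 1 - confidence for n.
    """
    if human_rate_per_mile <= 0:
        raise ValueError("rate must be positive")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    return -math.log(1.0 - confidence) / human_rate_per_mile


# Assumed benchmark: 1 fatality per 100 million miles (illustrative only).
assumed_human_rate = 1 / 100_000_000

miles = failure_free_miles(assumed_human_rate, confidence=0.95)
print(f"{miles / 1e6:.0f} million failure-free miles needed")
```

Under these assumptions the answer lands at roughly 300 million miles with not a single fatality, many times the 20 million real-world miles Waymo has logged, and that only shows parity with humans rather than a clear improvement.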

Refraction AI’s robots are unlikely to hurt anyone too badly, since they move between 10 and 12 mph. So the team can look past safety to another metric: the cost of each delivery. Recently, engineers spent about a month working on four-way stops. They got the robot to a point where it “never failed,” Johnson-Roberson says, but only because it was so conservative, waiting seven or eight minutes to make its move. So they decided to avoid the problem altogether, sending the bot on another route or having a human remotely guide it. (Teleoperation is an underappreciated but vital tool for making any self-driving system work.) This works because Refraction’s future doesn’t hinge on mastering the tricky nature of the four-way stop. The only metric that matters is whether it gets University of Michigan students their burgers and fries before they get cold.
