This spring, services from heavy hitters like Google and Facebook seemed glitchy or inaccessible for people worldwide for more than an hour. But it wasn't a hack, or even a glitch at any one organization. It was the latest mishap to stem from design weaknesses in the "Border Gateway Protocol," the internet's foundational, universal routing system. Now, after years of slow progress implementing improvements and safeguards, a coalition of internet infrastructure partners is finally turning a corner in its fight to make BGP more secure.
Today the group known as Mutually Agreed Norms for Routing Security is announcing a task force specifically dedicated to helping "content delivery networks" and other cloud services adopt the filters and cryptographic checks needed to harden BGP. In some ways the step is incremental, given that MANRS has already formed task forces for network operators and what are known as "internet exchange points," the physical hardware infrastructure where internet service providers and CDNs hand off data to each other's networks. But extending that process to the cloud represents tangible progress that has been elusive until now.
"With nearly 600 total participants in MANRS so far, we believe the enthusiasm and hard work of the CDN and cloud providers will encourage other network operators around the globe to improve routing security for us all," says Aftab Siddiqui, the MANRS project lead and a senior manager of internet technology at the Internet Society.
BGP is often likened to a GPS navigation service for the internet, enabling infrastructure players to swiftly and automatically determine routes for sending and receiving data across the complex digital topography. And like your favorite GPS mapping tool, BGP has quirks and flaws that don't usually cause problems but can occasionally land you in major bridge traffic. This happens when entities like internet service providers "advertise a bad route," sending data on a haphazard, ill-advised journey across the internet and often into oblivion. That's when web services start to seem like they're down. And the risks from this BGP insecurity don't end with service disruptions: the weaknesses can also be exploited intentionally by bad actors to reroute data over networks they control for interception. This practice is known as "BGP hijacking" and has been used by attackers around the world, including China, for espionage and data theft.
A handful of prominent CDNs have already been vocal about implementing BGP best practices and safeguards in their own systems and promoting them to others. After the so-called route leak in April, for example, Cloudflare launched a tool called "Is BGP Safe Yet?" to give regular web users insight into whether their internet service provider has implemented cryptographic route checks and filters. And on Wednesday, Google published an update on its efforts with MANRS to overhaul its own BGP infrastructure and convince industry contacts to do the same.
Organizations like Google and Cloudflare are increasingly motivated to back this change for the overall health of the internet, but also because BGP route leaks that result in outages reflect poorly on them regardless of where the issue actually originates. Those sorts of major organizations are key to driving adoption of these types of voluntary, cooperative technical changes, because they have relationships with infrastructure providers around the world.
"I spent 20 years in financial services doing cybersecurity for big banks, but a little over two years ago I joined Google, because you start to see that the societal dependence on this infrastructure is so great," says Royal Hansen, vice president of security engineering for Google Cloud. "My leverage was going to be so much bigger at a Google than it would ever be in one enterprise."
One of the main BGP safeguards MANRS promotes is RPKI, or "Resource Public Key Infrastructure," a public database of cryptographically signed records that attest which networks are authorized to announce which blocks of IP addresses. RPKI adopters publish signed records for the routes they originate and check the database to confirm others' routes, but the system can only eliminate route leaks and outages through universal adoption. If lots of ISPs or other organizations aren't using it, providers will still need to accept unsigned, meaning unvalidated, routes.
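To make "checking the database" a little more concrete, here is a minimal Python sketch of route origin validation as standardized in RFC 6811. It assumes a router already holds a local list of validated Route Origin Authorizations (ROAs), the signed RPKI records. The `ROA` class, the `validate_origin` function, and the sample data are hypothetical illustrations rather than any vendor's implementation; the prefixes and AS numbers come from ranges reserved for documentation. An announcement is "valid" if a covering ROA matches both its origin network and its permitted prefix length, "invalid" if ROAs cover it but none match, and "not-found" if no ROA covers it at all, which is the unsigned case described above.

```python
from __future__ import annotations

import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class ROA:
    """A hypothetical, already-validated Route Origin Authorization."""
    prefix: ipaddress.IPv4Network | ipaddress.IPv6Network
    max_length: int  # longest prefix length the origin may announce
    origin_asn: int  # network (AS) authorized to originate the prefix


def validate_origin(announced: str, origin_asn: int, roas: list[ROA]) -> str:
    """Classify a BGP announcement as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(announced)
    covered = False
    for roa in roas:
        # Skip ROAs for the other IP version; subnet_of would raise TypeError.
        if roa.prefix.version != route.version:
            continue
        if route.subnet_of(roa.prefix):
            covered = True
            # A covering ROA matches only if both the origin AS and the
            # prefix length agree with what was signed.
            if origin_asn == roa.origin_asn and route.prefixlen <= roa.max_length:
                return "valid"
    # Covered but unmatched means the route contradicts a signed record.
    return "invalid" if covered else "not-found"


# Sample data: one ROA authorizing AS64496 to announce 192.0.2.0/24.
ROAS = [ROA(ipaddress.ip_network("192.0.2.0/24"), 24, 64496)]

print(validate_origin("192.0.2.0/24", 64496, ROAS))     # valid
print(validate_origin("192.0.2.0/24", 64511, ROAS))     # invalid: wrong origin
print(validate_origin("192.0.2.128/25", 64496, ROAS))   # invalid: too specific
print(validate_origin("198.51.100.0/24", 64496, ROAS))  # not-found: unsigned
```

A network applying this kind of policy would typically drop "invalid" routes while still accepting "not-found" ones, because rejecting every unsigned route would cut off networks that haven't adopted RPKI yet. Origin validation also only checks which network announces a prefix, not the path traffic takes to reach it.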
Over the last two years, though, RPKI has gained some real momentum, and is now in use by ISPs like AT&T, Telia, NTT, and Cogent. On Monday, the European network service provider RETN announced that it has implemented RPKI. And in November, Google completed RPKI registration for more than 99 percent of its routes.
"There is quite a bit of work that is involved in implementing all of this," says Bikash Koley, vice president of global networking at Google Cloud. "And once you have all of that information you have to create ways of utilizing it to apply to your BGP policies in the form of filters etc. It takes significant engineering work and if you do it wrong then you actually can significantly impact users. So with MANRS we're trying to make this as easy a lift as possible for organizations."
In addition to adopting and promoting route filters and RPKI, Koley says that Google also raises awareness about the importance of BGP security through its "peering portal." Peering is a connection between two networks that lets web traffic flow more efficiently and stably, and it includes the exchange of BGP routing information. Because Google has such extensive global peering relationships, its portal gives the company an opportunity to nudge its peers on how they're doing with BGP best practices. If they aren't applying filtering and other safeguards to the routes they announce to Google, the company will flag that. Early next year, the portal will also show peers their RPKI status. Additionally, at the beginning of 2020 Google started contacting peers about potentially invalid routing information. And the company has been working with MANRS to publish information about how Google implements route filtering, so that small organizations that peer with the company can adopt an established approach rather than start from scratch.
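Google's actual filtering rules aren't spelled out here, but the general idea of filtering the routes a peer announces can be sketched as a per-peer allow-list: accept an announcement only if it falls within address blocks that peer is expected to originate, and flag everything else. The following is a minimal Python sketch under that assumption; the function names, the allow-list format, and the sample prefixes (drawn from ranges reserved for documentation) are hypothetical. Operators often build such lists from routing registry data and combine them with RPKI checks.

```python
from __future__ import annotations

import ipaddress


def is_allowed(announced: str, allowed: dict[str, int]) -> bool:
    """Return True if `announced` falls inside a prefix the peer may announce.

    `allowed` is a hypothetical per-peer allow-list mapping each permitted
    prefix to the longest prefix length accepted for it.
    """
    route = ipaddress.ip_network(announced)
    for prefix, max_len in allowed.items():
        net = ipaddress.ip_network(prefix)
        # Compare only same-version prefixes; subnet_of rejects mixed versions.
        if (net.version == route.version
                and route.subnet_of(net)
                and route.prefixlen <= max_len):
            return True
    return False


def filter_peer_routes(announcements: list[str], allowed: dict[str, int]):
    """Split a peer's announcements into accepted routes and routes to flag."""
    accepted, flagged = [], []
    for announced in announcements:
        (accepted if is_allowed(announced, allowed) else flagged).append(announced)
    return accepted, flagged


# Hypothetical peer that has registered only 203.0.113.0/24.
peer_allow_list = {"203.0.113.0/24": 24}
accepted, flagged = filter_peer_routes(
    ["203.0.113.0/24", "203.0.113.0/25", "198.51.100.0/24"], peer_allow_list
)
print(accepted)  # ['203.0.113.0/24']
print(flagged)   # ['203.0.113.0/25', '198.51.100.0/24']
```

Returning the rejected routes separately, rather than silently dropping them, mirrors the idea of flagging problems back to the peer so it can fix its own announcements.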
Though the push for universal adoption of BGP safeguards is far from over, recent momentum offers real evidence that it will be possible to eliminate or at least drastically reduce the risk of outages and data hijacking.
"I feel that there is a common awareness and urgency that is forming across the internet," Koley says. "And we are definitely moving very fast on this, because we need to."