War story: RPKI is working as intended
To be upfront, this is a story about something that turned out to be no problem at all. But sometimes boring stories deserve to be told. To provide context for this one, we have to go back to February 2008. Back then, through no fault of its own, one of the world’s most popular video-sharing platforms suffered a disastrous multi-hour outage, interrupting millions of video viewings. The impact was so significant that even mainstream media reported extensively on what was essentially an arcane routing incident. Nowadays, we’re hearing less and less about incidents like these, even though the Internet is bigger than ever. Three weeks ago Fastly was the target of a BGP hijack similar to what happened in 2008, but this time barely anyone noticed. Why is that? Something has changed. In this article, I’ll delve into one of the Internet’s most remarkable, yet untold, success stories.
A crash course on how Internet routing works
At its core, the Internet is a backbone spanning hundreds of thousands of interconnected routers, managed by roughly 85,000 organizations, that delivers data to millions of digital destinations. To establish what part of the Internet is attached where (that is, in which direction to send data packets to reach a given Internet destination, an IP address), all these routers exchange messages with each other using an industry-standard protocol called the Border Gateway Protocol (BGP). The totality of this whooshing exchange of routing information is often referred to as the global Internet routing system.
Internet Map by The Opte Project - Originally from the English Wikipedia, CC BY 2.5, https://commons.wikimedia.org/w/index.php?curid=1538544
One of the key factors for routers to decide which of many paths to use for sending data is the Longest Prefix Match (LPM) algorithm. In a nutshell: more detailed information about a destination is preferred over less granular information. Think of entering your destination’s street and city into your car’s navigation system versus entering only the city name. Both approaches will bring you closer to your destination, but of course, being more specific is likely to result in a better route. LPM is so fundamental that the Internet would not work without it.
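To make the "street and city versus city only" idea concrete, here is a minimal Python sketch of a longest-prefix-match lookup. It is an illustration only: the routing table, the next-hop names, and the function name `longest_prefix_match` are hypothetical, the prefixes come from address space reserved for documentation, and real routers use far more efficient data structures than a linear scan.

```python
import ipaddress


def longest_prefix_match(routes, destination):
    """Return (network, next_hop) for the most specific matching route, or None."""
    addr = ipaddress.ip_address(destination)
    best = None
    for prefix, next_hop in routes:
        network = ipaddress.ip_network(prefix)
        # A route is a candidate only if it contains the destination address.
        if addr in network:
            # Among the candidates, the longest prefix (most specific) wins.
            if best is None or network.prefixlen > best[0].prefixlen:
                best = (network, next_hop)
    return best


# Hypothetical routing table (documentation prefixes, made-up next hops).
routes = [
    ("0.0.0.0/0", "upstream-a"),      # default route: "somewhere on the Internet"
    ("198.51.100.0/24", "peer-b"),    # the "city"
    ("198.51.100.128/25", "peer-c"),  # the "street and city"
]

for destination in ("198.51.100.200", "198.51.100.10", "203.0.113.5"):
    network, next_hop = longest_prefix_match(routes, destination)
    print(f"{destination} -> {network} via {next_hop}")
```

The first address falls inside both the /24 and the /25, and the more specific /25 is chosen. That same preference is what lets an unauthorized, more specific announcement attract traffic away from its rightful destination.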
A major contributor to the Internet’s amazing year-to-year growth is that basically anyone can easily connect to it and almost immediately start sending and receiving data. You hook your router up to neighboring routers from other organizations and then use BGP to send a message into the routing system. In doing so, you tell the Internet that your IP addresses are now reachable via a specified “nexthop”. The corollary is that the most obvious vulnerability in the routing system is unauthorized origination of routes to IP addresses. More on that thorny aspect in the next section!
What happened in 2008?
A large nation-state’s incumbent telecommunications operator was instructed to censor a popular video-sharing platform within its national borders. Of the various mechanisms to block access to a particular internet service, BGP is one of the simpler (albeit blunter) ways to blackhole undesired traffic. In the course of normal network operations, not every BGP message is intended or expected to be distributed into the global system. A network operator might intend for some BGP messages to only be distributed to its own routers for its own private purposes, constraining the scope to its own administrative domain.
Unfortunately, due to a configuration mistake, the BGP messages intended to comply with the country’s censorship order were also passed on to adjacent networks outside of the country, who, in turn, distributed them to their adjacent networks, and so on. In the blink of an eye, routers around the world received BGP messages that a specific, more narrowly scoped set of the video platform’s IP addresses was now being served from infrastructure in Pakistan. Remember the LPM algorithm: because these messages were more specific than the platform’s own, routers preferred them. As that wasn’t at all where the video platform was actually attached, Internet data packets ended up being dropped on the floor, globally disrupting this video platform’s online presence. RIPE NCC did a good write-up on the technical details, and The New York Times, CNET, Ars Technica, and NBC News also covered the incident.
Fast forward to 2024
A very similar routing incident happened to Fastly just a few weeks ago, but this time around no headlines were made. While this incident would’ve severely affected Fastly a few years ago, this time the impact was negligible. What gives? While the specific players and motivations differ from the famous 2008 incident, at heart the technical details were the same. In this more recent case, the state incumbent of another large nation generated BGP messages hijacking some of Fastly’s IP address space for the purpose of disrupting Internet traffic. What makes now different from then?
RPKI improves the routing system’s reliability
The big difference between 2008 and 2024 is that nowadays the Internet industry uses a cryptographically verifiable mechanism called RPKI (Resource Public Key Infrastructure) to assess the plausibility of BGP messages in a fully automated fashion. The RPKI is a distributed database through which networks can publish their routing intentions in Route Origin Authorizations (ROAs), in turn enabling other networks to validate BGP messages against this database using a procedure called Route Origin Validation (ROV). By rejecting messages that fail this validation, the RPKI-invalid routes can be kept out of circulation, limiting their ability to cause disruption.
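The validation step can be sketched in a few lines of Python. This is a simplified illustration of the origin validation logic described in RFC 6811, not how any particular router or validator implements it. The names `VRP` and `validate_origin` are hypothetical, the prefixes and AS numbers are taken from ranges reserved for documentation, and the sketch assumes the ROAs have already been cryptographically verified and flattened into (prefix, maximum length, origin AS) tuples.

```python
import ipaddress
from typing import NamedTuple


class VRP(NamedTuple):
    """One validated (prefix, max length, origin AS) tuple derived from a ROA."""
    prefix: str
    max_length: int
    origin_asn: int


def validate_origin(vrps, route_prefix, origin_asn):
    """Classify a BGP route as 'valid', 'invalid', or 'not-found'."""
    route = ipaddress.ip_network(route_prefix)
    covered = False
    for vrp in vrps:
        vrp_net = ipaddress.ip_network(vrp.prefix)
        # Skip entries that do not cover the announced prefix.
        if route.version != vrp_net.version or not route.subnet_of(vrp_net):
            continue
        covered = True
        # A covering entry matches if the origin AS agrees and the
        # announcement is not more specific than the ROA allows.
        if route.prefixlen <= vrp.max_length and origin_asn == vrp.origin_asn:
            return "valid"
    # Covered by at least one ROA but matching none: reject-worthy.
    return "invalid" if covered else "not-found"


# Hypothetical data: AS64496 authorizes exactly 198.51.100.0/24.
vrps = [VRP("198.51.100.0/24", 24, 64496)]

announcements = [
    ("198.51.100.0/24", 64496),    # the legitimate announcement
    ("198.51.100.0/24", 64511),    # same prefix, unauthorized origin
    ("198.51.100.128/25", 64511),  # more-specific hijack
    ("198.51.100.128/25", 64496),  # right origin, but longer than max length
    ("203.0.113.0/24", 64511),     # no ROA covers this prefix
]

for prefix, asn in announcements:
    print(f"{prefix} from AS{asn}: {validate_origin(vrps, prefix, asn)}")
```

In this sketch only the first announcement comes out as valid; the next three are invalid and the last is not-found. A network that drops invalid routes would therefore never install the more specific hijack, no matter how attractive it looks to the Longest Prefix Match algorithm.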
Publishing ROAs is easy! All five RIRs offer RPKI certification services as part of their standard membership services. Since Fastly publishes ROAs for all of its IP addresses, Internet Exchanges and major carriers like NTT, Comcast, AT&T, Cogent, Arelion, and Lumen can automatically ignore problematic BGP messages (like the ones that were hijacking Fastly’s IP space in this latest incident!). Because the industry at large is using RPKI, the only measurable impact on our traffic delivery was towards the disruptor itself; the rest of the world remained oblivious to this incident. A very serious BGP hijack happened and Fastly came out just fine. RPKI works as intended.
RPKI is a mature technology
RPKI’s story started two decades ago when X.509 certificate syntax was extended to support encoding IP addresses and Autonomous System numbers via RFC 3779. (X.509 is the underpinning of web security mechanisms like “https://” that we’re all familiar with.) In the following years, a design for an architecture materialized, imposing order on the unwieldy, ever-growing global routing system (RFC 6480). Then the five RIRs (APNIC, ARIN, RIPE NCC, LACNIC, and AFRINIC) got to work building user-facing systems through which operators can configure ROAs. In 2018 and 2019, open-source projects like rpki-client and Routinator were kicked off to securely bridge the gap between the RIR systems and BGP routers. And finally, in 2020, there was a sharp increase in the adoption of RPKI by the largest ISPs, IXPs, and cloud providers, enabling the RPKI system to be more effective in providing broad benefits to the Internet.
Conclusion
Realizing fundamental changes, like what RPKI did for the Internet, is a matter of extreme patience and perseverance. This is because the Internet, by design, has no centralized or top-down administration. The Internet’s routing system is a voluntary collaboration between close to 100,000 organizations. Change comes from leading by example, educational outreach to peers and business partners, and an iterative engineering approach to resolve any obstacles discovered along the way. Hundreds of engineers and scientists cumulatively dedicated hundreds of years to meticulously performing heart surgery on a running system, embracing RPKI and improving Internet reliability. Even now, work continues in the IETF to further improve the dependability and performance of the RPKI.
It came as no surprise to me when the Executive Branch of the US Government recognized the societal benefits of using RPKI and endorsed the technology to begin to address the vulnerabilities inherent in BGP. Ultimately, the RPKI is a system that helps networks stay in their own lane, allowing everyone to safely zip along the digital highway.
Joel Jaeggli (Fastly), Tony Tauber (Comcast), and Doug Madory (Kentik) contributed to this article.