I was recently at a meeting where BGP RPKI was the topic du jour. While this is a topic that I have visited on occasion over the last few years and something I wanted to spend significant time on, I have found that the time to set aside for it has been hard to come by and sparse, much like the deployment of BGP RPKI. In order to better understand the options available, it’s important to break down the pieces and terminology involved; BGP is daunting enough to those unfamiliar with it, and adding PKI on top of that can make it even more so. So, what are these bits and pieces, and why have we not done more to adopt it?
- BGP – the venerable and foundational protocol that quite literally runs the internet. BGP allows an organization to announce and exchange its IPv4 and IPv6 resources with other organizations.
- Resource Public Key Infrastructure – From ARIN: Using cryptographically verifiable certificates, RPKI allows IP address holders to specify which Autonomous Systems (AS’s) are authorized to originate their IP address prefixes. These statements, known as Route Origin Authorizations (ROAs), allow network operators to make informed routing decisions, and help secure Internet routing in general.
- ROA – Route Origin Authorizations – Who is authorized to originate or source these prefixes?
- ROV – Route Origin Validation – Can we validate, cryptographically, that the AS originating a resource (for example, the prefix covering 8.8.8.8) is authorized to originate or source that prefix?
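To make the ROA and ROV terms above a little more concrete, here is a minimal sketch of the origin validation decision in Python, loosely following the valid / invalid / not-found outcomes described in RFC 6811. It is a simplification and not how any particular router or validator implements it: the `ROA` class, the `validate_origin` function and the sample data are hypothetical names made up for illustration, the prefix and AS numbers come from the documentation ranges, and the cryptographic verification of the ROAs themselves is assumed to have already happened elsewhere.

```python
from dataclasses import dataclass
from ipaddress import ip_network


@dataclass(frozen=True)
class ROA:
    """Hypothetical, already-verified ROA payload (illustration only)."""
    prefix: str      # prefix the address holder has authorized
    max_length: int  # longest prefix length the ROA permits
    asn: int         # AS authorized to originate the prefix


def validate_origin(route_prefix: str, origin_asn: int, roas: list[ROA]) -> str:
    """Classify an announcement as 'valid', 'invalid' or 'not-found'."""
    route = ip_network(route_prefix)
    covered = False
    for roa in roas:
        roa_net = ip_network(roa.prefix)
        # A ROA covers the route when the route sits inside the ROA prefix.
        if route.version != roa_net.version or not route.subnet_of(roa_net):
            continue
        covered = True
        # A covering ROA matches when the origin AS agrees and the
        # announcement is no more specific than max_length allows.
        if roa.asn == origin_asn and route.prefixlen <= roa.max_length:
            return "valid"
    # Covered but never matched: invalid. Not covered at all: not-found.
    return "invalid" if covered else "not-found"


# Sample data using documentation prefixes and AS numbers.
SAMPLE_ROAS = [ROA(prefix="192.0.2.0/24", max_length=24, asn=64496)]

for prefix, asn in [
    ("192.0.2.0/24", 64496),     # matches the ROA
    ("192.0.2.0/24", 64511),     # covered, but wrong origin AS
    ("192.0.2.128/25", 64496),   # covered, but more specific than max_length
    ("198.51.100.0/24", 64496),  # no covering ROA at all
]:
    print(prefix, asn, validate_origin(prefix, asn, SAMPLE_ROAS))
```

Note the third outcome: a route with no covering ROA is neither valid nor invalid, it is simply unknown, which is the state most of the routing table is in while adoption remains low.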
Many of the details of RPKI are actually not even technical. Both with ARIN and with vendors, there exist complications that make the technical pieces look like child’s play. The legality of the licensing and housing of authoritative certificates is complicated and shrouded in legalese due to the nature of what they represent. This proves to be a show stopper for a lot of older organizations that have grandfathered address space and may not have gone through the ARIN agreement process any time in the recent past [note: supposedly this process is required for ARIN IPv6 space, so most of that should be covered. Other regions may have easier, lower hurdles – I’ve heard good things about RIPE’s process – but I have no experience with it]. The legal issues with contracts and signing certificates may also prove to be troublesome to enterprises, most of which are very conservative and slow to adopt anything new ::cough:: IPv6 ::cough::. Certificate generation and maintenance is also considered difficult in many cases; the details can be confusing and the documentation overwhelming or difficult to find.
On top of those non-technical issues, platforms with large install bases are sparsely supported – although this is changing.
There have been a number of recent, high profile route hijacks. If you’re ever curious to see what they were, BGPMon typically has a post-mortem shortly after they occur. This is a big deal. There are a number of ways they could have been prevented, too, one of which is BGP RPKI. Is that worth the overhead? I think anyone that has had to scramble to figure out what was going on during one of these events would argue yes. But even with that, we have minimal adoption. With the significant movement to cloud based work, enterprises, service providers, academic institutions and even savvy end users should ask themselves a few important questions before doing their risk analysis:
- Do your cloud providers do RPKI?
- Does your upstream service provider or peering partner honor invalid routes?
- Does your upstream service provider or peering partner even support prefix or AS-Path filtering?
- How does this affect BGP blackhole routing for security filtering?
- How do you deal with invalid routes? (alert, set preference, drop, etc.)
- How do you handle invalid routes that may be more specific?
- Why have I not deployed BGP RPKI yet?
My perspective:
Unfortunately, the feeling I get is very similar to that of things [that we actually need] such as IPv6 and DNSSEC: tools that make our resources safer, but are often overlooked due to increased operational complexity.
It is really about risk analysis: Is the risk of not having this worth the effort of maintaining it? Since we [as an internetworking ecosystem] have been clear in our actions that necessary things as obvious and straightforward as appropriate prefix filtering and IPv6 deployment are in many cases above and beyond, it should come as no surprise that adding complexity on top of “stuff that just works” isn’t happening on a large scale. Even with repeated proof that changes would alleviate risk, we don’t do it because what we have has been deemed “good enough”* and change is scary and hard.
What do we need? We need tools. The RPKI dashboard that SURFnet has available is a fantastic example of an easy to use tool that makes a ton of information available. We need a very easy way to deploy this. BGP is actually a pretty simple protocol to make work, which is why it hasn’t changed much in the extremely long tenure it has had as the foundation of the internet. We need a very, very straightforward way to get the non-technical pieces done.
Disclaimer:
This is just the tip of the iceberg and I am but a novice with this tech – but my gut feelings are typically right about this type of thing.
* “Good enough” is almost never good enough.
It’s almost always the organisational part that is the hard part.
Look at DNSSEC: the technical part is easy to deploy these days; see PowerDNS for example. Now getting someone in your organisation to deal with DS records and know what to do when migrating a domain… well, that’s a whole process, and that’s the difficult part of the story. I do however believe it could be simpler. At least part of it could be automated. Just look at the APIs that are available at top-level-domains. Problem is: most organisations are not direct members/customers/whatever of the top-level-domains. They have a middle man: the registrar. Registrars need to have an automated process with a customer-API as well, and many do not. And standardization doesn’t seem to exist either.
there’s a cost/benefit planning error, similar to DNSSEC.
if you deploy BGP RPKI in “hard mode” where correct signatures aren’t merely a BGP priority echelon but are *required* for a route to make it into your FIB, then any time somebody somewhere FUBARs their keys or signatures, they will be unreachable to you. your competitors will, on those days, operate more reliable networks.
so (and this again parallels the DNSSEC case) you’re left looking at the risks of deploying vs. not. if you deploy, there will be N FUBARs during which your network will be less reliable than your competitors’. if you do not deploy, there will be M attacks in which someone somewhere won’t receive all of their own traffic but you and your competitors will be equally unable to reach that victim’s network that day. and in any case N is at least an order of magnitude higher than M, and always shall be.
the best case scenario is a significant last-mover advantage, wherein we all jostle each other to be the last one into the burning building, but we all get in there eventually, for reasons never feel.
there never was a strategic global business plan for RPKI|DNSSEC|IPV6 deployment. we’re winging it.
While in general I agree that what is required are intelligent / mature implementations that mitigate the risk and complexity of deployment / use (I think we are only just getting there with some DNSSEC products), I would like to note that the risk of error here is slightly different than DNSSEC.
I have yet to see an RPKI origin validation policy mechanism that is the equivalent of “require valid”. Hard mode in RPKI OV is “ignore invalid”. FUBARing your own ROA doesn’t automatically make your route invalid. That requires another valid ROA, created by you or your parent, that is bound to a different origin (or has a shorter maxlength than your announcement). And if this other covering ROA is for a route that your downstreams receive, even then all you end up with is an alternate route.
Again, fully agreeing that a great deal of focus needs to be given to the robustness, transparency, and security of the underlying trust infrastructure … just pointing out that shooting yourself in the foot is not as easy as, or at least works differently than, in some other technologies.
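The difference between the two policies can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `accept` function, the policy names and the sample routes are hypothetical, the validation states are taken as given inputs rather than computed, and “require-valid” is included purely as the hypothetical mode described above, not as something a real implementation is known to offer.

```python
def accept(state: str, policy: str) -> bool:
    """Decide whether a route with the given validation state is kept."""
    if policy == "ignore-invalid":
        # "Hard mode" as described above: drop only routes proven invalid.
        return state != "invalid"
    if policy == "require-valid":
        # Hypothetical stricter mode: keep only routes proven valid.
        return state == "valid"
    raise ValueError(f"unknown policy: {policy}")


# Hypothetical routes (documentation prefixes) with assumed states.
ROUTES = {
    "192.0.2.0/24": "valid",         # matching ROA exists
    "198.51.100.0/24": "not-found",  # operator lost or deleted its ROA
    "203.0.113.0/24": "invalid",     # covering ROA names another origin
}

for policy in ("ignore-invalid", "require-valid"):
    kept = [prefix for prefix, state in ROUTES.items() if accept(state, policy)]
    print(policy, kept)
```

Under “ignore-invalid”, the operator whose ROA simply disappeared stays reachable because the route falls back to not-found; only the hypothetical “require-valid” mode would cut that network off.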
I have always had a hard time summarizing my thoughts on crypto, but this is a good summary: It’s not the crypto, it’s all the bad practices that end up surrounding it.
I heard about this I think from the main ietf list: http://www.cs.auckland.ac.nz/~pgut001/pubs/crypto_wont_help.pdf
and I *think* this is the same talk: https://www.youtube.com/watch?v=_ahcUuNO4so
There’s truth in the above that would apply to RPKI. Sharon Goldberg has done a lot of work here, and I share that concern about maleficent actors in a trusted root hierarchy system. There are related concerns at the RIR level wrt how one uses ARIN’s HSM and such.
It’s not the crypto.