I want to preface this by saying that I have not seen or worked on the Cumulus Networks system yet. This is a stream-of-consciousness post: my thoughts and opinions based on what I've read publicly.
Recently a new network player has emerged on the scene with a very simple, straightforward idea: take Linux and put it on a switch. While this isn't exactly new (see Juniper with FreeBSD, Arista with Linux, Force10 with NetBSD, or the plethora of other vendors using an open-source OS as the underpinnings of their NOS), the angle Cumulus Networks is taking is a bit more… raw.
Take a standard Cisco or Juniper box. The engineers who work on these devices know a pseudo-programming language. They program the routers using that language and syntax; they're familiar with it, and it is well documented. Heck, aside from JunOS, Enterasys, and Alcatel-Lucent, 90% of the NOSes you see on equipment are arguably derivative of Cisco's venerable IOS. Automation is [unfortunately] not nearly as common as it could be on network gear. Sure, there are a handful of open-source tools, plus overpriced, bloated, and [arguably] poorly functioning commercial tools, available to "automate" and "manage" equipment, but none are that great, especially if you have a large, diverse environment.
Enter the sysadmin. Sysadmins have been automating tasks for as long as I've been around. CFEngine, home-grown tools, and now Puppet and the like are marvelous tools that act as a makeshift "controller" for managing distributed systems. Big-iron cluster admins have had tools like this forever as well. How often are the sysadmins and the network engineers the same people? In a large environment where there is a need for automation, I will wager that the answer is "not very often".
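To make that "makeshift controller" idea concrete, here is a minimal Puppet sketch of the sysadmin approach applied to a Linux switch. This is purely illustrative; the class name, file paths, and template are my own invention, not anything Cumulus ships:

    # Hypothetical Puppet manifest: treat routing config like any other managed file.
    # Class name, paths, and template are illustrative only.
    class switch::quagga {
      package { 'quagga':
        ensure => installed,
      }

      file { '/etc/quagga/ospfd.conf':
        ensure  => file,
        owner   => 'quagga',
        group   => 'quagga',
        mode    => '0640',
        content => template('switch/ospfd.conf.erb'),
        require => Package['quagga'],
      }

      service { 'quagga':
        ensure    => running,
        enable    => true,
        subscribe => File['/etc/quagga/ospfd.conf'],  # restart on config change
      }
    }

Trivial for a sysadmin, second nature even. But notice how little of it looks like anything a traditional network engineer types every day.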
This is my concern with Cumulus Networks. Make no mistake, I love the simplicity and the idea behind it. I think it's wonderfully simple and innovative. It's the "duh" of networking OS choices, but it introduces a lot of issues, many of which aren't really pleasant to solve (and only a handful of which are actually technical). You've got the fractured nature of IT in large environments: server guys often aren't the same as network guys, and frequently they sit in totally different silos. This is problematic due to territoriality, not to mention subject-matter expertise. Am I, as a network engineer of 15+ years, comfortable running *IX systems? Sure. Is everyone? Likely not, and even less likely at a scale like this.
Automation is key, but even running Puppet systems isn't exactly a network engineer's core competency. I think that building those systems will either be cost-prohibitive (in time or salary) or will just be a non-starter because it's "too different." Let me reiterate that I think it's a very cool idea, and I know it'd probably take a week or so for any decent sysadmin or system-savvy network engineer to make it work, but the larger swath is going to recoil. The lack of a single configuration file is also off-putting to many network engineers. Management systems may be able to handle some of these functions, but nothing is easier than near-zero-touch deployment, and a single config file is the poor man's widely accepted way to get there.
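To illustrate the single-file point: on a traditional NOS everything lives in one startup-config you can show, diff, and TFTP around. On a plain Linux box the equivalent state is scattered across files like these (a hypothetical Debian-style layout, not anything Cumulus has published):

    /etc/network/interfaces     # interfaces, bridges, bonds
    /etc/quagga/zebra.conf      # interface addresses as routing sees them
    /etc/quagga/ospfd.conf      # the OSPF process
    /etc/hostname, /etc/resolv.conf, /etc/ntp.conf, ...
    /etc/iptables/rules.v4      # control-plane filters

Back up one of those and you've backed up a fraction of the box.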
They have a Quagga instance (their version provides a "non-modal CLI"), and if their Quagga is anything like the Quagga I've used, it can sort of be used to manage the network, but it is a thin overlay on the system tools that adds only very basic routing protocols. It's encouraging, but at face value it feels like a bolt-on.
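For those who haven't touched it, stock Quagga config reads like a stripped-down IOS. An illustrative ospfd.conf (addresses are documentation-range examples, not from Cumulus material) might look like:

    !
    ! Illustrative Quagga ospfd.conf -- IOS-like syntax, basic feature set
    !
    router ospf
     ospf router-id 192.0.2.1
     network 192.0.2.0/24 area 0.0.0.0
     passive-interface eth0
    !
    line vty
    !

Familiar enough to a network engineer, but it only covers the routing daemons; interfaces, bridging, and the rest still live out in the system tools.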
Then there is the issue of the raw OS. It's just Linux. That statement alone opens up a new threat vector. Do I now need to worry about compromised switches? Can my switch be turned into a bit cannon or a warez distribution site? A Tor relay? Hardening a switch or router has always been important, but this feels different to me, and how many times has an RE-protect ACL or loopback filter been mistakenly forgotten? What happens when there is a kernel exploit? Sure, Puppet and CFEngine can help patch that, but these are also mechanisms and processes that consume resources on the switches and need to be accounted for. How will these extra processes affect transit traffic? What happens when Linux decides it doesn't have enough memory and the OOM killer starts reaping processes?
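The Linux analogue of an RE-protect filter would be something like the iptables sketch below. It is illustrative only; the management subnet is made up, and a real control-plane policy needs far more care:

    # A rough Linux analogue of an RE-protect ACL (illustrative only;
    # 198.51.100.0/24 stands in for a hypothetical management subnet)
    iptables -A INPUT -i lo -j ACCEPT
    iptables -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
    iptables -A INPUT -p tcp --dport 22 -s 198.51.100.0/24 -j ACCEPT  # SSH from mgmt only
    iptables -A INPUT -p 89 -j ACCEPT                                 # OSPF to the control plane
    iptables -A INPUT -p icmp -j ACCEPT
    iptables -P INPUT DROP                                            # everything else stays off the CPU

Easy enough to write, and just as easy to forget on switch number 40 of 200, which is exactly my point.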
Now, I think there are good precursors to this model that have been around for quite some time. I'd really love to see something like pfSense or RouterOS on a merchant-silicon Layer 3 switch. The foundations are already there: they are seasoned, well-documented, feature-rich platforms. RouterOS even has OpenFlow support now, and I'd bet money pfSense is looking at integrating SDN of some sort. Both have central management platforms and are Linux- or BSD-based, so they are very extensible, and at their foundation they are open source. Both should be portable, and most importantly they are more of an intermediate step toward a hybrid networking OS.
I know some of this may sound negative, but these are all things that need to be addressed and answered as we move further into the new wave of networking. I am confident that Cumulus Networks is onto something great with their model. Is it too bold? Too soon? I wish I knew.
2 thoughts on “I want to love cumulus networks…..”
It will also be interesting to see how they deal with state management in the Linux kernel. This has been tried before and struggled to scale due to the high amount of IPC between processes. While it may make sense in some closet-access cases, unless you are built to handle failure, this model will be a challenge to implement and scale. Lastly, merchant silicon is constantly changing; don't underestimate the maintenance needed to keep a consistent working model. Again, I am not saying that this will fail; I like the validation of the need for a highly extensible and programmable OS. Time will tell whether it is as easy as they propose.
When I was looking into a free router/firewall appliance I could deploy as VMs, I looked at pfSense but settled on Vyatta. It has the best of both worlds, IMHO: you get a near-native Linux experience, bash et al., but with the benefit of a single hierarchical config file that can be diff'd, scp'd, and copied between nodes. If you've used JunOS, then you can use Vyatta too. I just wish they'd make the break into switching as well! Though I guess there's nothing to stop you loading up an old server with quad-port NICs and creating loads of bridge interfaces, other than the fact that it's a kludge 🙂
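To give a flavour of why that single file matters, here's an illustrative fragment of Vyatta's config.boot (addresses made up); the whole box lives in one diff-able hierarchy like this:

    interfaces {
        ethernet eth0 {
            address 192.0.2.1/24
            description "uplink"
        }
    }
    protocols {
        static {
            route 0.0.0.0/0 {
                next-hop 192.0.2.254 {
                }
            }
        }
    }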