Networks are insecure, they just are. I once worked with an FBI agent who said that if you wanted a secure network you should turn everything off, smash it, then set it on fire - and I don't disagree. However, there are things we can do to understand our networks that will aid immeasurably in how they are secured initially and how they remain secured over time. At their very core, networks are not secure based simply on the fact that humans are involved in their construction, operation, and use. Recently, ICS-CERT released their annual report detailing the state of critical infrastructure, and it spells out exactly what anyone in the trenches should already know: we don't pay enough attention to this stuff. Their assessment pointed out some pretty glaring holes, most of which we probably already knew about (or should have) and some of which we may need to spend time learning about.

In our never-ending quest for convenient, ubiquitous access to all things, we often forget to think about the little things, the "dirty jobs" of IT. I've said it so many times I'm almost sick of hearing it myself: building a network is easy, running a network is hard. Strip away all of the shiny, hype-machine, marketing-generated razzmatazz surrounding all things "cloud" and "SDN" and "intent-driven". Those things are great and I'm neck deep in a lot of them, but building fancy things on broken foundations is clearly a bad idea, and a lot of what I see in the trenches is just that: broken, neglected, outdated, or just plain done poorly - or done well and never maintained.

When most people think about maintaining a network ecosystem they probably imagine plumbing VLANs, changing peerings, creating LSPs, or adjusting switchports. Realistically, that's a tiny part of operations. In reality, most network issues could either be solved more easily by a better understanding of a complex system or, in many cases, avoided completely by knowing your network. When I say "avoided completely", what I mean is that by knowing the baseline of something you can identify deviations from that baseline, and those deviations can indicate an issue that needs to be addressed (a short sketch of that idea follows the list below). It's no secret that I firmly believe the key to operating anything in IT is complete visibility and baselining of every facet that can be recorded, and I also assert that you cannot secure something until you truly understand not only how it works but how it fails. Anything less is an exercise in risk management.

A gigantic swath of this issue can be alleviated by three things: knowing your network, access control of your network, and attention to your network. So, to that end, here is a brief outline of what I believe is the foundation we should all start from when building network analytics and monitoring - the raw data that lets you know your network:

  • Network flow data (NetFlow, J-Flow, sFlow, IPFIX) - see the collector sketch after this list
  • Centralized logging of *all* devices on the network
  • Traffic statistics (interface counters, error counters, etc.)
  • Configuration management of all network devices
  • Template configurations for all commonly configured items (common firewall filters, RE filtering, apply groups, ZTP, VM templates, etc.)
  • DNS query data
  • Latency and jitter information to/from all key locations in the network, including ingress/egress of the network
  • Routing table information (IGP and EGP)
  • Optical power readings
  • Hardware inventory (linecards, serial numbers, chassis models)
  • Environmental data (heat, power, cooling)
  • Full packet capture at network egress (if possible - ISPs obviously cannot realistically do this)
  • Anomalous policy / signature detection at all strategic locations
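To make the first item concrete: flow export is just UDP datagrams in a well-documented layout, and even a tiny collector is enough to start seeing who talks to whom. Below is a minimal, purely illustrative sketch of a NetFlow v5 listener - the port (2055) is an assumption, so use whatever your exporters are actually pointed at, and a real deployment would use a proper collector such as nfdump or pmacct rather than anything like this:

```python
#!/usr/bin/env python3
"""Minimal NetFlow v5 listener - illustrative only, not a production collector."""
import socket
import struct

# NetFlow v5: 24-byte header followed by 48-byte flow records.
HEADER_FMT = "!HHIIIIBBH"              # version, count, uptime, secs, nsecs, seq, engine type/id, sampling
RECORD_FMT = "!IIIHHIIIIHHBBBBHHBBH"   # srcaddr, dstaddr, nexthop, ifindexes, counters, ports, flags, ...
HEADER_LEN = struct.calcsize(HEADER_FMT)
RECORD_LEN = struct.calcsize(RECORD_FMT)

def ip(n):
    """Render a 32-bit integer as dotted-quad."""
    return socket.inet_ntoa(struct.pack("!I", n))

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
sock.bind(("0.0.0.0", 2055))  # assumed exporter destination port

while True:
    data, (exporter, _) = sock.recvfrom(8192)
    version, count, *_ = struct.unpack(HEADER_FMT, data[:HEADER_LEN])
    if version != 5:
        continue  # this sketch only understands v5
    for i in range(count):
        off = HEADER_LEN + i * RECORD_LEN
        if off + RECORD_LEN > len(data):
            break  # truncated packet, skip the remainder
        rec = struct.unpack(RECORD_FMT, data[off:off + RECORD_LEN])
        src, dst = ip(rec[0]), ip(rec[1])
        pkts, octets = rec[5], rec[6]
        sport, dport, proto = rec[9], rec[10], rec[13]
        print(f"{exporter}: {src}:{sport} -> {dst}:{dport} proto={proto} pkts={pkts} bytes={octets}")
```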

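And to put the "baseline and deviation" idea from earlier into code: once any of these sources is landing in a time series (interface and error counters are the easy example), flagging deviations can start as simply as comparing each new sample against a rolling mean. This is a deliberately naive sketch - real systems use something smarter than a z-score, and the window and threshold values here are placeholders - but it shows the principle:

```python
import statistics
from collections import deque

class Baseline:
    """Naive rolling baseline: flag samples far outside recent history."""
    def __init__(self, window=288, threshold=3.0):  # e.g. 288 five-minute samples ~ one day
        self.window = deque(maxlen=window)
        self.threshold = threshold

    def observe(self, value):
        """Record a sample; return True if it deviates from the learned baseline."""
        anomalous = False
        if len(self.window) >= 30:  # don't alert until there is some history
            mean = statistics.fmean(self.window)
            stdev = statistics.pstdev(self.window) or 1e-9
            anomalous = abs(value - mean) / stdev > self.threshold
        self.window.append(value)
        return anomalous

# Example: per-interval input-error deltas, however you collect them (SNMP, gNMI, ...)
errors_in = Baseline()
for sample in [0, 0, 1, 0, 0, 0, 2, 0, 0, 1] * 5 + [250]:
    if errors_in.observe(sample):
        print(f"deviation from baseline: {sample} input errors this interval")
```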
OK, now you have the firehose. That's actually the easy part. Once you have this data you have to keep it up to date and keep watching it. Given the nature of what it is, you're likely going to have to visit several systems to monitor it, though there are some "meta" packages that can tie a lot of these elements together - LibreNMS, for example, covers a good chunk of them, as can other tools.
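As a rough illustration of what "tying together" can look like, LibreNMS exposes what it collects over a token-authenticated REST API, so pulling its device and port inventory into your own tooling is only a few lines. The hostname and token below are placeholders, and the exact endpoints are worth double-checking against the LibreNMS API documentation for your version:

```python
import requests

LIBRENMS = "https://librenms.example.com"  # placeholder hostname
TOKEN = "changeme"                         # API token from your LibreNMS user settings
headers = {"X-Auth-Token": TOKEN}

# List the devices LibreNMS knows about
devices = requests.get(f"{LIBRENMS}/api/v0/devices", headers=headers, timeout=10).json()
for dev in devices.get("devices", []):
    hostname = dev["hostname"]
    # Per-device ports carry the interface and error counters discussed above
    ports = requests.get(f"{LIBRENMS}/api/v0/devices/{hostname}/ports",
                         headers=headers, timeout=10).json()
    print(hostname, len(ports.get("ports", [])), "ports")
```

From there it's a short hop to feeding those counters into something like the baseline sketch above.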