Full disclosure: I've been employed by several companies that were customers and/or vendors of SolarWinds. However, I have never been employed by SolarWinds and I was not compensated for this post.
On December 13th, digital security firm FireEye published a blog post with the comprehensive title "Highly Evasive Attacker Leverages SolarWinds Supply Chain to Compromise Multiple Global Victims With SUNBURST Backdoor". The post identified a digitally signed component of the Orion software, SolarWinds.Orion.Core.BusinessLayer.dll, that contained a backdoor. Multiple signed updates contained additional malware. Traffic from infected hosts was disguised to resemble normal SolarWinds activity, and it avoided IPs that were part of non-U.S. netblocks or assignments registered to "bulletproof" hosts frequented by criminals.
Orion's compromised distribution platform was then leveraged to infect a wide variety of organizations. According to FireEye, the "victims have included government, consulting, technology, telecom and extractive entities in North America, Europe, Asia and the Middle East".
SolarWinds was founded 21 years ago by brothers David and Donald Yonce around a pair of network monitoring products named "Trace Route" and "Ping Sweep". The programs were just what they sounded like - simple ICMP packets to identify hosts. The novelty was that they ran on Windows with a GUI. That ease of use would remain a mainstay of SolarWinds and an early differentiator from Linux-based products, which tended to offer better functionality but set a higher bar for usability. Over time, SolarWinds began to integrate more advanced monitoring capabilities, for example by using SNMP. SolarWinds offered an easy-to-install, Windows-based way to monitor lots of servers. The company grew quickly and has remained aggressive in acquiring smaller firms to add functionality to its monitoring products.
Before virtualization hit the scene, one of the big value adds for system administrators was applications that allowed for centralized management of server OS and applications. Creating, managing and fixing the often highly customized systems that make that centralized management possible is a large part of the system administrator's job. Monitoring systems like those offered by SolarWinds are a logical place to integrate that kind of functionality; monitoring software was already designed to communicate with large numbers of servers using protocols such as SNMP. Even better, monitoring systems tend to be modular, i.e. they support custom plugins for monitoring custom applications. The same plugin structure is used for management purposes. Eventually, companies began offering software services to replace these custom data center control systems, and it is in this context that Orion should be understood. The SolarWinds product page explains that Orion provides "centralized monitoring and management of your entire IT stack, from infrastructure to application". Furthermore, Orion is marketed as an enterprise-scale application that can monitor or control "400,000 elements on a single Orion Platform instance" (an element is a single network node, interface or volume ... a single server or virtual machine may contain many elements, but even so the software is designed to support environments of 1000+ servers).
Arguably, a system like this violates some of the principal rules of data security best practices. Like many people, one of the first things I do when I install Linux on a new computer is set up a new user account for myself; I then disable the root account (or at least its access to SSH). The idea is that, even when I am the only person using the computer, I am always trying to use the least access necessary to accomplish the task at hand. This isn't an idea specific to user account management or even security best practices. For example, minimizing access also forces users to evaluate their decisions before committing to them. The "Do you REALLY want to do this? Y/N" prompt isn't necessarily part of a user interface for security purposes, but it still minimizes access to resources by making the user jump through a hoop.
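To make the "disable root's access to SSH" habit concrete, here is a minimal sketch of how one might check it in a script. It assumes an OpenSSH-style sshd_config passed in as text; the function names are my own, and the sketch deliberately ignores Match blocks and Include directives, so treat it as an illustration rather than a real audit tool.

```python
def root_login_setting(config_text: str) -> str:
    """Return the first PermitRootLogin value found, or "unset".

    Assumption: sshd honors the first value it obtains for a keyword,
    so the scan stops at the first match. Match blocks and Include
    files are not handled in this sketch.
    """
    for raw_line in config_text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].lower() == "permitrootlogin":
            return parts[1].strip().lower()
    return "unset"


def root_ssh_disabled(config_text: str) -> bool:
    """True only when root login over SSH is explicitly turned off."""
    return root_login_setting(config_text) == "no"


if __name__ == "__main__":
    sample = "# example config\nPort 22\nPermitRootLogin no\n"
    print(root_ssh_disabled(sample))
```

The check is strict on purpose: anything other than an explicit "no" is reported as not disabled, which is the least-access default I would want.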
SolarWinds repeatedly makes it clear that Orion is designed to be a "full stack" solution that offers monitoring and management of all of the OSI layers. But products like Orion, by centralizing so many functions, make it more likely that user error, system failure or security breaches have devastating, systemic consequences. It is being reported that a SolarWinds update server could be accessed using the password "solarwinds123"; breaking into the update servers was a key part of the Orion hack. Assuming this is true, I would bet a dollar that the password was an internal placeholder and some poor technician just forgot to reset it before placing the server in production. But it would also mean that SolarWinds wasn't doing its due diligence in terms of security auditing. Apparently, SolarWinds was warned of the password issue by a researcher, but it also would not have been difficult to set up SolarWinds' own monitoring software to check for insecure password hashes.
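As a rough illustration of what checking for insecure password hashes could look like, here is a minimal sketch. Everything in it is an assumption on my part: the record layout (account name mapped to a salt and a PBKDF2-SHA256 hash), the function names, and the tiny candidate list are hypothetical and say nothing about how SolarWinds actually stored credentials.

```python
import hashlib
import hmac


def hash_password(password: str, salt: bytes, iterations: int = 1000) -> bytes:
    # The iteration count is kept low so the example runs quickly;
    # a real deployment would use a far higher value.
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, iterations)


def weak_candidates(org_name: str):
    """Yield guessable passwords built from the org name and common words."""
    base_words = [org_name.lower(), "password", "admin", "changeme"]
    suffixes = ["", "1", "123", "2020", "!"]
    for word in base_words:
        for suffix in suffixes:
            yield word + suffix
            yield word.capitalize() + suffix


def find_weak_accounts(records: dict, org_name: str, iterations: int = 1000) -> dict:
    """Map each account whose stored hash matches a weak candidate to that candidate.

    records: hypothetical mapping of account name -> (salt, stored_hash).
    """
    weak = {}
    for account, (salt, stored_hash) in records.items():
        for candidate in weak_candidates(org_name):
            guess = hash_password(candidate, salt, iterations)
            if hmac.compare_digest(guess, stored_hash):
                weak[account] = candidate
                break
    return weak


if __name__ == "__main__":
    records = {
        "updates": (b"salt-a", hash_password("solarwinds123", b"salt-a")),
        "ops": (b"salt-b", hash_password("k3#Vq9!tLz", b"salt-b")),
    }
    print(find_weak_accounts(records, "SolarWinds"))
```

A scheduled job along these lines, fed a much larger wordlist, is the kind of check a monitoring platform could plausibly run against its own infrastructure.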
Outside of data center administrators, few people even in the IT industry will have much experience with products like Orion. But that doesn't mean that you or your business is not impacted by this compromise. The organizations that use Orion include a laundry list of US Federal government agencies, among them the Department of Homeland Security and the Centers for Disease Control. There is reason to believe a state actor - most likely Russia - was behind the attack. A fair number of hosting providers and data centers use Orion; Orion may not be hooked into your website, but the bare metal running the VM that hosts your website could very well be hooked into Orion. The list of affected entities I've seen recently numbers around 18,000 - but many of those entities host infrastructure responsible for thousands of other organizations. As a result, it may not be possible to fully appreciate the scale of this attack at the moment.
FireEye's research indicated the compromise has most likely been in place since March, or nearly nine months. This is both an extremely long and an extremely short period of time. It is a long period of time in that it is extremely difficult to keep a compromise of this depth and breadth alive for this long, and an enormous amount of information can be obtained in that period of time. It is a short period of time in that the sophistication needed to keep the operation going also limited the amount of data that could be obtained. Massive increases in network throughput would (hopefully) have been identified sooner by traditional intrusion detection systems. This means it is more likely they were looking for private encryption keys, SSL certificates, things that are relatively small, easy to obfuscate, and that can be re-weaponized to obtain access to additional systems. It is less likely they were scraping patient health data from the CDC.
I wish I could wrap this up with some quick recommendations that could have avoided this situation, but I'm not sure I have any easy solutions for this one. Everything is driving the industry toward further consolidation; admins are responsible for more and more devices per staff member. COVID and the remote working explosion it has caused are driving this. There has been some attention on how working from home has introduced systemic strain to the "last mile" of internet service. All of those miles of beautiful fiber and metro-ethernet that have been run to commercial real estate over the years have been replaced by 1,000 shaky home broadband connections. The success that ISPs have had in adjusting their networks to this unprecedented change in user behavior is one of the few bright spots in the US response to the disease, and the government had nothing to do with it.
But even before the pandemic, there was a general assumption that the ratio of managed devices to administrator staff would simply continue to increase, forever. Why even have a skilled systems administrator on staff at all, when it is possible to have a control panel system installed for a one-time fee that entry-level employees could be trained on the job to use? This sort of centralization is essentially automating the most difficult tasks.
Ultimately, Orion exemplifies the risk in this approach. Unskilled, inexperienced staff can be easily trained for single products, but will not be able to think critically about how that product works in order to improve it, fix it, or secure it. Likewise, increasing the workload of skilled workers increases the risk of careless errors that have real costs. I'm afraid there may not be an "app for that".