As the threat landscape expands, federal agencies are preparing to enhance resiliency in their systems to protect the growing amount of critical data that systems generate.
Keeping systems resilient is about more than security.
“There are so many challenges today with system resilience. Of course, a lot of people tie it to cybersecurity. In some sense, systems resilience is a much larger issue than just cybersecurity. There are a lot of things that go into the development of a system that can help make it resilient,” Ronald Ross, computer scientist at the National Institute of Standards and Technology (NIST), told GovCIO Media & Research in an interview.
NIST defines system resilience as “the ability to quickly adapt and recover from any known or unknown changes to the environment through holistic implementation of risk management, contingency and continuity planning;” however, the rapid growth of the IT landscape has made it especially challenging for agencies to keep up with the pace of change.
The IT landscape is also experiencing a rise in threats that impact resilience like cyberattacks, operational stress, and compromised or vulnerable systems.
“Every piece of software, hardware and firmware adds to the complexity of those systems. The more things you have as part of your system, you're increasing the level of attack surfaces,” Ross said. “The real issue is how do agencies deal with that complexity? How do they manage it? How do they reduce it? What can they do to strengthen those systems?”
Ross explained most federal agencies have a “one-dimensional strategy” when it comes to cybersecurity, meaning they focus on the perimeter of their IT systems. Because of the rise of complex cyberattacks, agencies are now faced with shifting to what Ross calls a “multi-dimension protection strategy,” where agencies assume adversaries are undoubtedly going to infiltrate a system.
“Penetration resistance is the first dimension — you stop them if you can. The second dimension: limit the damage the adversaries can do if they get into your system,” he said.
Guidance, Strategies to Improve Resiliency
Both NIST and the Cybersecurity and Infrastructure Security Agency (CISA) have developed guidance to help agencies and organizations bolster IT resiliency.
A handful of publications from NIST outline standards around engineering trustworthy, resilient and secure systems, including SP 800-160 volumes 1 and 2 and SP 800-53. The guidance addresses the importance of zero trust and virtualization to effectively build out a “multi-dimension protection strategy.”
“The zero trust architecture concepts are shrinking that large external boundary, and they're putting boundaries around smaller groups of resources,” Ross said. “The eventual goal is to limit your ability to move laterally through your system, get to someone else's system and have these very devastating transitive attacks.”
The flagship publication, SP 800-160 volume 1, outlines how organizations can integrate security into the systems engineering process to build security in from the initial design phase, what Ross calls “secure by design.” Volume 2 focuses on cyber resiliency goals, objectives and specific techniques to achieve resiliency.
“[SP 800-53] has a whole assessment guideline to assess the controls that would tie back to see if that is helping your system be more resilient. You can't talk about this in slices or stovepipes. You must look at the problem on a macro level. This just produced a very large set of guidelines that are interconnected. They’re mutually supportive of one another to measure whether systems are resilient,” Ross said. “The ability for you to flush out malicious code very quickly in those segments and those in virtualizations, you're able to refresh that code very rapidly. If there is a cyberattack, and it's malicious code, you can clean it out and bring it back to a known secure state."
CISA leads a program, the "Resilience Planning Program," supporting systems resilience and security for critical infrastructure. Being resilient, CISA advises, involves four components:
- Prepare by planning, acquiring knowledge, and conducting training and exercises.
- Adapt by adjusting to new conditions and circumstances.
- Withstand by remaining unaffected by disruptions.
- Recover by returning systems to normal after a disruption.
When it comes to measuring the efficiency and resiliency of its own systems, CISA CIO Robert Costello said the agency turns to automation to manage its metrics.
“CISA utilizes automation in many systems to manage metrics on these and other critical elements. CISA is also investing in technology for active application monitoring and expanding usage of cloud environments to improve our resiliency,” Costello said.
DOD, 5G and Resiliency
Future military success depends on the Defense Department’s ability to develop and sustain responsive systems to support increasingly complex and dynamic missions, while keeping pace with evolving threats. DOD leadership is managing and mitigating these risks by developing and procuring more resilient systems.
As DOD transitions to 5G, the next generation of mobile networking that promises to transform the military's operations, it has to ensure the systems it is deploying remain secure and resilient within the new technical environment.
For the DOD, the nascent technology will increase the efficiency of logistics operations, enhance the security of both military and commercial systems, improve command and control survivability and bring the ability to make quick decisions.
Director of the Operate Through 5G Initiative Dan Massey measures whether the system is resilient based on several factors, including whether the system is available when needed and whether it performs to the degree expected.
Specifically, metrics such as quality of service (QoS), latency, bandwidth and jitter, among others, measure the network's ability to provide the level of service needed on the battlefield or throughout the supply chain and logistics network.
It is equally important for the system to adapt to inevitable disruptions, whether from adversaries or from normal day-to-day operations, absorb the adverse impacts and quickly bounce back.
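As a rough illustration of how such metrics might be checked against a mission requirement, the sketch below summarizes a handful of latency samples. It is not DOD's method: the function names, the sample values and the thresholds are hypothetical, and jitter is simplified here to the average change between consecutive latency samples.

```python
from statistics import mean


def summarize_latency(samples_ms):
    """Return mean latency and a simple jitter estimate for latency samples.

    Jitter is taken as the mean absolute difference between consecutive
    samples, a simplification chosen for this sketch.
    """
    if len(samples_ms) < 2:
        raise ValueError("need at least two latency samples")
    deltas = [abs(b - a) for a, b in zip(samples_ms, samples_ms[1:])]
    return {"mean_latency_ms": mean(samples_ms), "jitter_ms": mean(deltas)}


def meets_requirement(summary, max_latency_ms, max_jitter_ms):
    """Check a summary against hypothetical application thresholds."""
    return (
        summary["mean_latency_ms"] <= max_latency_ms
        and summary["jitter_ms"] <= max_jitter_ms
    )


# Hypothetical round-trip times in milliseconds.
samples = [20.0, 22.0, 19.0, 25.0, 21.0]
summary = summarize_latency(samples)
print(summary)
print(meets_requirement(summary, max_latency_ms=50, max_jitter_ms=5))
```

The thresholds are passed in rather than fixed because the acceptable level of service differs from one application to the next.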
"Resiliency is reliability. Is it there when I need it to work at the level I want, and how quickly does it recover from what will inevitably be faulted, whether those faults or attacks or glitches or whatever you might call them — is that recovering that reliability? So those are my two big metrics," Massey told GovCIO Media & Research.
Yet what counts as resilient depends on the application, since a system can still be considered resilient even when users experience latency issues.
"In a case where it's not really a latency-sensitive application, I will still consider that to be a resilient network. Even though the latency is double, triple order of magnitude off of what I expect it to be — if latency doesn't matter," Massey said. "We want to start with that reliability, ability to recover, and then we want to tune it for the particular application."
The Pentagon's joint FutureG & 5G Office is also kicking off a new project called Resilient Comms that will provide multi-path solutions to deliver better results and improve reliability.
When disruptions inevitably occur, whether a public 5G network happens to experience issues or electronic warfare happens to take one of the systems down, the multi-path solution will allow quick recovery from failure.
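The multi-path idea can be sketched in a few lines: send the same message over every available path at once and treat it as delivered if at least one path succeeds. This is only an illustration under stated assumptions, not the Resilient Comms design. The path names and the `working_path` and `jammed_path` functions are hypothetical stand-ins for real network links, and a real receiver would also need to discard duplicate copies.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def send_over_all_paths(message, paths):
    """Send a message over every path concurrently.

    `paths` maps a path name to a callable that raises ConnectionError
    when the path is down. Returns the sorted names of paths that delivered.
    """
    delivered = []
    with ThreadPoolExecutor(max_workers=max(1, len(paths))) as pool:
        futures = {pool.submit(send, message): name for name, send in paths.items()}
        for future in as_completed(futures):
            try:
                future.result()
                delivered.append(futures[future])
            except ConnectionError:
                pass  # this path failed; the others may still get through
    return sorted(delivered)


def working_path(message):
    """Hypothetical healthy link: pretend the send succeeded."""
    return None


def jammed_path(message):
    """Hypothetical disrupted link, e.g. an outage or jamming."""
    raise ConnectionError("path unavailable")


# Illustrative path names only.
paths = {
    "corporate": working_path,
    "public_wifi": working_path,
    "commercial_5g": jammed_path,
    "satellite": working_path,
}
delivered = send_over_all_paths("status report", paths)
message_got_through = bool(delivered)
print(delivered, message_got_through)
```

Sending over all paths at once trades bandwidth for reliability: the sender does not need to detect which path failed before the message gets through.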
"Could I build a truly resilient system when I have these multiple paths? Let's send it over our corporate network, and over the public Wi-Fi, and over a commercial 5G network, and over our sat phone. And can I make it so that you and I don't really know or care about that? Behind the scenes, some smart system is taking that message and sending it over all four paths," Massey said. "All I know is the video and audio working good behind the scenes."
Resilient and Intelligent NextG Systems
The National Science Foundation (NSF) launched its Resilient and Intelligent NextG Systems (RINGS) program in 2022 to accelerate research, drive innovation and increase U.S. competitiveness in NextG networking and computing technologies.
The program is a partnership between NSF, the Office of the Under Secretary of Defense for Research and Engineering, NIST and nine industry partners, born out of the recognition that networks have become an essential utility for society and the economy to function. Computing and communication networks have to be robust and resilient to disruption, intentional or otherwise, without sacrificing performance significantly.
“We've come to really rely on networks, networking computing systems, extensively,” Murat Torlak, program director of the Division of Computer Networking Systems at NSF, told GovCIO Media & Research in an interview. “The program essentially seeks innovations to enhance not only resiliency, but also the performance across various aspects of next generation communication, networking and computing systems.”
The program houses 41 projects that seek to enhance network resiliency, defined as a combination of robustness, autonomy, adaptability and security. The program’s goal is to capture resiliency challenges at a fundamental level, then absorb them into what NSF calls “research vectors” to serve as building blocks of any successful network.
“It may take some time for the rest of the community to catch up, but the important thing is that the results will be out there and available for everyone to look at and benefit,” Torlak said.
Systems should be engineered to ensure there is a vital minimum service to support safety and emergency services. Autonomy, adaptability and security are the important aspects of resiliency. The network system should be able to react to any potential disruptions to the network within millisecond timescales, Torlak explained, and AI could help.
"Some of the proposals proposed to use AI to ensure autonomous operations — to self-manage based on perhaps such disruptions in the past — how this happened in the past, what was the best way to handle it? But then the AI algorithms must learn, right? So that's one way that networks can be designed with AI. Adaptability is a key design principle,” Torlak said.
Security is another key operational principle. NSF is integrating continuous search and detect capabilities to identify unauthorized access and provide security assurances.
“Resilient network systems can have detection mechanisms that leverage different challenge models because not every challenge will be the same and can have different recovery strategies for such systems under different challenges ... then develop an operation mechanism that would involve defensive measures [and] dynamic adaptation strategies,” Torlak said.
The RINGS program aims not only to improve network performance, which can be measured via connection speed, latency and coverage, but also to keep the network resilient to expected or unexpected issues. Resiliency metrics, however, are less established than performance metrics. Moving forward, NSF will experiment with new measures of resiliency to test networks.
“RINGS researchers have proposed to use experimental metrics like how fast the network when challenged (if you challenge networks via some disruption) can recover from a degraded service to an acceptable level of service to normal level of service. When you disrupt the network, it must protect itself, ... but then how fast can it come back to a normal level of service from that state? It is not that easy to judge this. It is an important issue to design resilient systems so there could be other innovative metrics that our community may propose, as they make progress in their RINGS projects,” Torlak said.
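One way to make the recovery-speed metric Torlak describes concrete is to time how long a network takes to climb back from a disruption to an acceptable level of service, and then to a normal level. The sketch below is an assumption-laden illustration, not a RINGS metric: the function name, the 0-to-100 service scale, the thresholds and the sample data are all hypothetical, and a disruption is taken to start at the first sample that falls below the acceptable level.

```python
def recovery_times(samples, acceptable, normal):
    """Measure recovery speed from a series of (time_s, service_level) samples.

    Returns a tuple (seconds_to_acceptable, seconds_to_normal), both counted
    from the first sample below `acceptable`. A value is None if that point
    is never reached. Assumes samples are time-ordered and normal >= acceptable.
    """
    disrupted_at = None
    to_acceptable = None
    to_normal = None
    for time_s, level in samples:
        if disrupted_at is None:
            if level < acceptable:
                disrupted_at = time_s  # the disruption starts here
            continue
        if to_acceptable is None and level >= acceptable:
            to_acceptable = time_s - disrupted_at
        if level >= normal:
            to_normal = time_s - disrupted_at
            break
    return to_acceptable, to_normal


# Hypothetical service levels (percent of normal throughput) sampled each second.
samples = [(0, 100), (1, 30), (2, 40), (3, 75), (4, 80), (5, 98)]
print(recovery_times(samples, acceptable=70, normal=95))
```

Reporting the two durations separately reflects the distinction in the quote between a degraded service, an acceptable level of service and a normal one.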