When a Coast Guard cutter is underway, there is an expectation that, if the ship is damaged, the crew will be able to respond appropriately and “fight the ship.” This trained resiliency is well defined in Coast Guard policy and entrenched in the culture of the afloat community. Why, then, does the service not expect the same standardization and level of performance from its information technology (IT) systems? Every Coast Guard member, from the newest nonrate to the most experienced officer, has felt the frustration and helplessness induced by unreliable IT services. These frustrations are reflected in the thousands of operational hours lost to system failures and the hundreds of hours the C5I Service Center must spend restoring these services. By contrast, organizations all over the world are leveraging modern IT best practices not only to reliably enable operations, but also to reveal new correlations in data, empowering leaders to make more accurate and impactful decisions. The difference lies in the way the Coast Guard develops and deploys applications.
An application can be thought of as a collection of services provided to an end user. For example, CGPortal provides several services, such as site login via the common access card (CAC), SharePoint search, the Enterprise Mission Platform Operational Readiness Dashboard (EMPORD), and even Coast Guard medical care scheduling. An architecture with tightly coupled services, in which a change to one service can bring down the entire application, is commonly referred to as a monolithic architecture, and applications built this way are called monolithic applications. The opposite philosophy is to break an application into the simplest services it provides and maintain each of these services independently. This is called microservices architecture, and the independent services are called microservices. While the difference between these architectures may seem subtle, monolithic applications are more prone to prolonged catastrophic failures because of their complexity. Microservices are key to success for organizations that require high application availability and the flexibility to quickly tailor their applications to changing operational requirements.
The Coast Guard’s adherence to monolithic architecture principles has led to expensive outages, a lack of accountability, and an inability to evolve in a rapidly changing landscape. Its IT services should be modernized to minimize the impact of IT outages and bring an operational mind-set to a legacy domain.
The High Cost of Outages
It is well known that private and public organizations have different priorities when it comes to IT systems and services. A commonly heard argument in IT is that because private businesses need their IT services to generate revenue, “of course they would invest in three nines uptime,” that is, 99.9 percent availability, which translates to about 8.77 hours of service outage per year. The implication of this argument is that only private organizations need to prioritize availability, and that public organizations that do not generate revenue only need to “keep the lights on.” While positive revenue makes the math more transparent when it comes to justifying IT expenditures, the effect of “lost revenue” is not limited to private organizations. Consider Coast Guard IT incidents. The USCGC Healy (WAGB-20) had an issue syncing her virtual domain controller, which caused her to turn around while on mission and spend an extra two weeks in port. Using the Coast Guard Standard Reimbursable Rates for 2020, the cost of lost operations can be estimated as in Table 1 below:
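The availability arithmetic above can be checked directly. The minimal Python sketch below assumes a 365.25-day year (8,766 hours) and converts an availability target into the maximum outage time it permits per year; the function name is illustrative, not taken from any Coast Guard tool.

```python
# Average year length, counting leap years: 365.25 days * 24 hours = 8,766 hours.
HOURS_PER_YEAR = 365.25 * 24


def annual_downtime_hours(availability_pct: float) -> float:
    """Maximum outage hours per year permitted by an availability target."""
    return (1 - availability_pct / 100) * HOURS_PER_YEAR


# Each additional "nine" cuts the permitted downtime by a factor of ten.
for target in (99.0, 99.9, 99.99, 99.999):
    print(f"{target}% available -> {annual_downtime_hours(target):.2f} hours down per year")
```

At 99.9 percent the result is about 8.77 hours per year, the "three nines" figure quoted above.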
Given this estimate, the two weeks of lost operational time cost the Coast Guard $8.2 million. Similar calculations can be done to estimate the cost of a Coast Guard–wide Outlook email outage.
Figure 1 shows the estimated cost of a Coast Guard–wide email outage as a function of efficiency loss and time. An efficiency loss of 25 percent indicates that individuals are only able to accomplish 75 percent of the work they would have accomplished were it not for the outage. To demonstrate the severity of this, a 25 percent efficiency loss lasting 12 weeks, assuming members work 40 hours per week, would cost the Coast Guard $246 million. This is more money than the Coast Guard spent on C5I investments over the five years from 2017 to 2021. It is also the same amount the Coast Guard spent on fast response cutter improvements in both 2017 and 2018, the same amount spent on national security cutter improvements in 2018–21, or roughly double the amount spent on aircraft investments each year from 2018 to 2021. Nor is this merely hypothetical; outages of this magnitude do occur.
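The Figure 1 estimate reduces to simple arithmetic: cost equals efficiency loss times weeks times hours per week times the combined hourly cost of the affected workforce. The article does not state that hourly workforce cost, so rather than assume one, the sketch below back-solves the value implied by the $246 million figure and then reuses it to sweep a small grid in the spirit of Figure 1. Both function names are hypothetical.

```python
def outage_cost(efficiency_loss: float, weeks: float, hours_per_week: float,
                workforce_cost_per_hour: float) -> float:
    """Dollar value of work lost during an enterprise-wide outage.

    efficiency_loss: fraction of normal output lost (0.25 means members do 75%).
    workforce_cost_per_hour: combined hourly cost of every affected member.
    """
    effective_lost_hours = efficiency_loss * weeks * hours_per_week
    return effective_lost_hours * workforce_cost_per_hour


def implied_workforce_cost(total_cost: float, efficiency_loss: float,
                           weeks: float, hours_per_week: float) -> float:
    """Back-solve the hourly workforce cost implied by a published estimate."""
    return total_cost / (efficiency_loss * weeks * hours_per_week)


# Inputs taken from the text: $246M, 25% loss, 12 weeks, 40 hours per week.
rate = implied_workforce_cost(246e6, efficiency_loss=0.25, weeks=12, hours_per_week=40)
print(f"Implied workforce cost: ${rate:,.0f} per hour")  # $2,050,000 per hour

# Cost (in millions) across a few efficiency losses and outage durations.
for loss in (0.10, 0.25, 0.50):
    row = [outage_cost(loss, w, 40, rate) / 1e6 for w in (1, 4, 12)]
    print(f"{loss:.0%} loss over 1/4/12 weeks:", ", ".join(f"${c:,.0f}M" for c in row))
```

Because cost is linear in every input, doubling either the efficiency loss or the outage duration doubles the estimate.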
The goal of these examples is to illustrate the significant costs associated with Coast Guard IT outages. It is important to note that this analysis considered only losses related to operating costs. A more thorough examination would also account for procurement costs stemming from corrective maintenance and for the hourly costs of operating Coast Guard assets. It also would capture follow-on effects, such as cascading outages in which the loss of a single application results in the failure of other applications.
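A cascading outage can be modeled as reachability in a dependency graph: when one application fails, every application that depends on it, directly or indirectly, fails too. The sketch below runs a breadth-first search over a hypothetical dependency map. The application names and links are illustrative assumptions, and it assumes every dependency is hard, with no graceful degradation.

```python
from collections import deque

# Hypothetical map: each key lists the applications that depend on it.
DEPENDENTS = {
    "domain_controller": ["email", "cgportal_login"],
    "cgportal_login": ["sharepoint_search", "readiness_dashboard", "medical_scheduling"],
    "email": [],
    "sharepoint_search": [],
    "readiness_dashboard": [],
    "medical_scheduling": [],
}


def cascade(failed: str, dependents: dict[str, list[str]]) -> set[str]:
    """Return every application that is down once `failed` goes down."""
    down = {failed}
    queue = deque([failed])
    while queue:
        app = queue.popleft()
        for dependent in dependents.get(app, []):
            if dependent not in down:  # visit each application only once
                down.add(dependent)
                queue.append(dependent)
    return down


print(sorted(cascade("cgportal_login", DEPENDENTS)))
# ['cgportal_login', 'medical_scheduling', 'readiness_dashboard', 'sharepoint_search']
```

In this toy map, losing the domain controller takes down all six applications, while losing email affects only email.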
Accountability
Another consequence of continued adherence to monolithic architecture principles is a lack of accountability for IT outages. The main drivers are the complexity of monolithic applications and the turnover of skilled technical staff. Organizations that maintain and rely on IT systems for operations routinely experience a never-ending cycle: plan to acquire or develop a new application; stand the system up, often with the help of outside contractors; use the system; let the support contract expire; lose the experienced IT staff who helped set up the system; and then suffer a catastrophic outage. At this point, for monolithic applications, much of the institutional knowledge gained when the system was first implemented is gone. Suddenly the question, “Who knows how to do <insert task>?” is met with the reply, “That was Engineer Y, but he/she no longer works here.”
Systems deployed a decade or more earlier often have little to no documentation and expired support contracts. This problem is all too familiar to members of the C5I community, and it is a pattern seen time and again with complex systems, most notably where the consequences were deadly (e.g., the New Orleans levees during Hurricane Katrina, Deepwater Horizon, the Boeing 737 MAX, and the Space Shuttle Columbia). The common thread is that complexity obscures sources of risk, and the first step to mitigating risk is to identify it. Monolithic systems suffer from the same problem: Their complexity hides sources of risk, which makes it difficult to hold engineers and managers accountable.
Technological Progress
Technology is an ever-shifting landscape, but not all fields of technology progress at the same rate. The rate of technological change is sometimes measured at the national level, through GDP or through productivity metrics such as output per hour worked. In economics, a measure called the “Solow residual” attributes all growth in output that cannot be explained by growth in capital and labor inputs to technological progress. These metrics rely on measures of economic output and capital, which are difficult to apply to a government organization. For the purposes of this article, technological change is defined as the development of new equipment that competing organizations must adopt to continue operating effectively.
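For reference, the Solow residual in its standard growth-accounting form assumes a Cobb–Douglas production function, Y = A K^α L^(1−α), where Y is output, K is capital stock, L is labor, A is the technology (total factor productivity) term, and α is capital's share of income. Technological progress is then whatever output growth remains after subtracting the share-weighted growth of capital and labor:

```latex
\frac{\dot{A}}{A} \;=\; \frac{\dot{Y}}{Y} \;-\; \alpha\,\frac{\dot{K}}{K} \;-\; (1-\alpha)\,\frac{\dot{L}}{L}
```

Every term on the right depends on measured output and capital, which is why the metric transfers poorly to a government organization.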
Organizations in the computer industry must implement new technologies faster than almost any other industry; otherwise, they fail to compete. By contrast, it is not reasonable to expect ship or aircraft designs to follow the same rate of technological change as computers and IT services. This difference matters because new fields, such as cybersecurity and artificial intelligence, also will fall somewhere on this technological-rate-of-change spectrum.
Monolithic architecture principles can conflict with the ability to keep pace with technological change. This is apparent in the field of cybersecurity, in which technological advances occur every three to five years, and current Coast Guard acquisition practices cannot keep pace with demand. In addition, systems designed using monolithic architecture principles have dedicated databases that are not designed to share their contents across applications. The resulting data silos hinder correlation in security investigations, collaboration between teams in large organizations, and data-driven decision making. Any attempt to remedy these faults without changing the underlying architecture can fall victim to unforeseen failure modes caused by the complexity of monolithic systems.
A Shift to Microservices
It has been established that monolithic systems present a level of complexity that encourages IT service outages, and, over time, these outages cost the Coast Guard hundreds of millions of dollars in operating costs. In addition, the complexity of monolithic systems makes it difficult to hold system owners accountable for their performance and creates data silos, reducing organizational flexibility and the ability to leverage data when making decisions. There is, however, a solution: A shift to microservices.
Microservices architecture principles stand opposite to those of monolithic architecture. The basic idea is that every application is composed of many services and that these services can be hosted and maintained independently if the essential inputs to and outputs from each service are properly managed.
Splitting a monolithic system into multiple services allows each service to be built, configured, and hosted in the way that makes the most sense for that service, all without affecting the operations of the other services to which it is connected. Such independence is impossible with a monolithic architecture.
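A minimal sketch of the idea, using Python's `typing.Protocol` as a stand-in for a service contract: each service exposes only its inputs and outputs, and the portal composes services through those contracts alone. In a real deployment each service would run separately and communicate over a network API; here everything runs in one process for brevity. The class and method names (`StubCacLogin`, `KeywordSearch`, `Portal`, `cac_serial`) are hypothetical, loosely echoing the CGPortal example.

```python
from typing import Protocol


class LoginService(Protocol):
    """Contract: a credential goes in, an allow/deny decision comes out."""
    def authenticate(self, cac_serial: str) -> bool: ...


class SearchService(Protocol):
    """Contract: a query goes in, a list of matching titles comes out."""
    def search(self, query: str) -> list[str]: ...


class StubCacLogin:
    """Hypothetical login service backed by an in-memory allow list."""
    def __init__(self, valid: set[str]) -> None:
        self.valid = valid

    def authenticate(self, cac_serial: str) -> bool:
        return cac_serial in self.valid


class KeywordSearch:
    """Hypothetical search service doing case-insensitive substring matching."""
    def __init__(self, docs: list[str]) -> None:
        self.docs = docs

    def search(self, query: str) -> list[str]:
        return [d for d in self.docs if query.lower() in d.lower()]


class Portal:
    """Composes services only through their contracts, never their internals."""
    def __init__(self, login: LoginService, search: SearchService) -> None:
        self.login = login
        self.search_service = search

    def handle_search(self, cac_serial: str, query: str) -> list[str]:
        if not self.login.authenticate(cac_serial):
            raise PermissionError("authentication failed")
        return self.search_service.search(query)


portal = Portal(StubCacLogin({"1234"}), KeywordSearch(["Readiness report", "Medical schedule"]))
print(portal.handle_search("1234", "medical"))  # ['Medical schedule']
```

Because `Portal` depends only on the contracts, the search service could be rewritten or rehosted without touching the login service, as long as `search()` keeps the same input and output.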
Other benefits of shifting to microservices are flexibility and resiliency. With each service maintained separately, overall complexity is reduced, which enables application developers to make improvements more confidently. Small, independent services also lend themselves to containerization and to automated management of those containers, which adds resiliency. Several versions of a service can be deployed at any one time while only one is in production. If the production version crashes for any reason, the service can fail over seamlessly to its last working version. This failover occurs in seconds and would be barely perceptible (if at all) to the end user. In addition, IT service staff can train for and test the resiliency of containerized microservices much as Coast Guard members run damage control drills on a ship. Well-developed tools exist to simulate service outages, allowing teams to develop robust, automated incident response, something impossible to do with a monolithic architecture.
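The failover behavior described above can be sketched as a small piece of routing logic. This is a hypothetical model, not the API of any particular container platform (in practice an orchestrator handles this): it keeps a list of deployed versions, routes traffic to the newest healthy one, and falls back to the last working version when the active one crashes. The `damage_control_drill` function mimics an outage-simulation tool by deliberately crashing the active version and checking that traffic moved.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Deployment:
    """Deployed versions of one service and the version serving traffic."""
    versions: list[str] = field(default_factory=list)       # oldest -> newest
    healthy: dict[str, bool] = field(default_factory=dict)  # version -> status
    active: Optional[str] = None                             # version in production

    def deploy(self, version: str) -> None:
        """Add a new version and promote it to production."""
        self.versions.append(version)
        self.healthy[version] = True
        self.active = version

    def report_crash(self, version: str) -> None:
        """Mark a version unhealthy; fail over if it was serving traffic."""
        self.healthy[version] = False
        if version == self.active:
            self.fail_over()

    def fail_over(self) -> None:
        """Route traffic to the newest version still marked healthy."""
        for version in reversed(self.versions):
            if self.healthy[version]:
                self.active = version
                return
        self.active = None  # no healthy version left: a genuine outage


def damage_control_drill(dep: Deployment) -> bool:
    """Simulated casualty: crash the active version, confirm traffic moved."""
    crashed = dep.active
    if crashed is None:
        return False
    dep.report_crash(crashed)
    return dep.active is not None and dep.active != crashed


service = Deployment()
service.deploy("1.4")
service.deploy("1.5")
print(damage_control_drill(service), service.active)  # True 1.4
```

Running the drill routinely, as a ship runs damage control drills, confirms that a working fallback version actually exists before a real casualty.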
Microservices also reduce the complexity of system ownership. A developer or team of developers can own a service, which makes it straightforward to hold someone accountable for a service failure. If the service does not handle inputs properly or produce properly documented outputs, that is a trackable metric for which the owning developer or team alone is responsible. With dedicated developers, services can also be improved more quickly and can even provide an operational edge to Coast Guard commanders. It becomes possible to deploy a version of an application that meets the needs of operators in the Pacific Area while deploying a different version of the same application to the Atlantic Area. Services can be subscribed to on an as-needed basis, and the improved development-operations cadence reduces the problem of data silos. These improvements are especially relevant in the field of cybersecurity, in which capabilities are directly tied to the rate at which new technologies can be deployed.
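Ownership and area-specific versions can be captured in a simple service catalog. The sketch below assumes a hypothetical catalog in which each service record names one accountable owner and optionally pins a different version for a given area. The team name, version strings, and the `PACAREA` and `LANTAREA` keys are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """Catalog entry: one accountable owner, plus optional per-area versions."""
    owner: str
    default_version: str
    area_versions: dict[str, str] = field(default_factory=dict)

    def version_for(self, area: str) -> str:
        """Version deployed to `area`, falling back to the default."""
        return self.area_versions.get(area, self.default_version)


# Hypothetical catalog entry; names and versions are illustrative only.
CATALOG = {
    "readiness_dashboard": ServiceRecord(
        owner="Readiness Dashboard Team",
        default_version="2.3",
        area_versions={"PACAREA": "2.4-pacific"},
    ),
}


def route(service: str, area: str) -> tuple[str, str]:
    """Return (version to serve, team accountable for it) for a request."""
    record = CATALOG[service]
    return record.version_for(area), record.owner
```

Because every request resolves to exactly one owner, a failure in the Pacific-specific version traces directly to the team responsible for it.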
The Coast Guard is experiencing real costs associated with not adhering to modern IT best practices. A shift to microservice architecture would enable a more forward-leaning posture that is responsive to the needs of operational commanders. Large organizations have been using microservices for more than a decade, and the Coast Guard must do the same.