Most massive corporate IT disasters begin with a minor inconvenience. Six months ago, the CEO was unable to join a Zoom call for four minutes because of a brief ISP blip. This localized annoyance triggered a sweeping, enterprise-wide panic. The directive came down from the Ivory Tower: the enterprise must immediately achieve Five Nines of uptime. We must completely eliminate every Single Point of Failure.
Thus began the "Strategic Resiliency and Business Continuity Initiative."
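For scale, the downtime budget implied by "Five Nines" is simple arithmetic. The following is a minimal sketch, assuming an average 365.25-day year; the function name is illustrative and does not come from any tool mentioned here.

```python
MINUTES_PER_YEAR = 365.25 * 24 * 60  # assumes an average year, leap days included


def downtime_budget_minutes(availability: float) -> float:
    """Allowed downtime per year, in minutes, for a given availability target."""
    return MINUTES_PER_YEAR * (1.0 - availability)


budget = downtime_budget_minutes(0.99999)  # "Five Nines"
ceo_blip = 4.0  # the four-minute Zoom outage from the story

print(f"Five nines allows {budget:.2f} minutes of downtime per year")
print(f"One {ceo_blip:.0f}-minute blip uses {ceo_blip / budget:.0%} of that budget")
```

In other words, the target leaves a little over five minutes of outage for the whole year, and the CEO's missed call alone would have consumed most of it.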
For the next three months, the engineering team was pulled off all actual project work. Instead, we sat in endless steering committee meetings, generating heavily branded Visio diagrams for the board. We drafted a massively expensive plan to build a completely redundant, highly available network architecture. We bought duplicate next-generation Palo Alto firewalls, deployed them in active/passive High Availability pairs, and upgraded all our SD-WAN edge routers. We spared no expense to build a Best of Breed fortress.
But a router cannot route what it cannot reach. So, to ensure true resilience, the purchasing department was tasked with securing Carrier Diversity.
The Checkbox of Carrier Diversity
To the executive team, redundancy is just a line item on a spreadsheet. They mandated that we purchase two separate 10Gbps fiber circuits from two completely different telecommunications vendors.
Vendor A was our primary carrier. Vendor B was brought in as our failover. The procurement team proudly presented the contracts to the C-suite, demonstrating how they negotiated a strict, penalty-backed SLA (Service Level Agreement) with both providers. Management gave each other high-fives. They achieved cross-departmental alignment. They checked the box. We were now a fault-tolerant enterprise.
What the procurement team didn't understand, and what the executives refused to hear, is that corporate telecommunications is essentially a giant shell game.
The Layer 1 Reality Check
You can buy the most expensive, AI-driven, cloud-managed SD-WAN overlay on the market. You can spend $40,000 a month in meeting payroll just to talk about your Business Continuity posture. But at the end of the day, the internet is just a physical piece of glass buried in the dirt. This is the inescapable reality of Layer 1 infrastructure.
When you buy a circuit from Vendor B, they rarely dig their own trench to your building. It is vastly cheaper for them to simply lease the "last mile" of fiber from Vendor A.
So, while the billing dashboard showed two different vendor logos, and the executives slept soundly knowing they had purchased total redundancy, the physical reality was entirely different. Both the primary circuit and the "redundant" failover circuit entered our building through the exact same conduit, sitting right next to each other in the exact same PVC pipe, buried three feet under the street outside our lobby.
We didn't buy Carrier Diversity. We just paid two different companies for the privilege of sharing the exact same vulnerability.
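The check that nobody ran can be sketched in a few lines: list the physical segments each circuit rides on and flag any segment that appears under more than one circuit. This is a rough illustration only. The circuit names, segment names, and the `shared_segments` helper are all hypothetical, and real path data would have to come from the carriers' route maps rather than the billing dashboard.

```python
from collections import defaultdict

# Hypothetical inventory: each circuit mapped to the physical segments it uses.
# Segment names are invented for illustration, not taken from any carrier record.
circuit_paths = {
    "vendor_a_primary": ["lobby_conduit", "street_trench", "vendor_a_pop"],
    "vendor_b_failover": ["lobby_conduit", "street_trench", "vendor_b_pop"],
}


def shared_segments(paths: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return every physical segment used by more than one circuit."""
    users = defaultdict(list)
    for circuit, segments in paths.items():
        for segment in set(segments):  # count each segment once per circuit
            users[segment].append(circuit)
    return {seg: sorted(c) for seg, c in users.items() if len(c) > 1}


risks = shared_segments(circuit_paths)
for segment, circuits in sorted(risks.items()):
    print(f"{segment}: shared by {', '.join(circuits)}")
```

Any segment this reports is a place where one backhoe takes out both circuits, no matter how many logos are on the invoices.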
The Incident
The multi-million-dollar resilience architecture met its ultimate match on a Tuesday afternoon. It wasn't a sophisticated nation-state cyberattack. It wasn't a zero-day vulnerability.
It was a local municipal water authority contractor named Gary, driving a rented John Deere backhoe.
Gary was trying to fix a water main. Gary misread the utility markings on the pavement. At 2:14 PM, Gary dragged a heavy steel bucket through the earth and effortlessly severed the primary fiber line. Milliseconds later, the SD-WAN appliances detected the drop and flawlessly initiated the failover sequence to Vendor B.
But Vendor B’s fiber was in the exact same trench. Gary had severed that one, too.
In a fraction of a second, the entire enterprise went dark. The remote workers were severed from the data center. The SIP trunks collapsed. The automated warehouses came to a grinding halt. The highly available Palo Alto cluster sat in the server rack, perfectly synced, completely healthy, and absolutely useless.
The Root Cause Analysis Theater
The fallout was spectacular. The executive team immediately convened an emergency "War Room" bridge to demand actionable insights. How could this happen? We spent a quarter of a million dollars on redundancy! We have an SLA!
I spent forty-five minutes on a muted microphone listening to a VP of Infrastructure threaten to sue Vendor B for breach of contract. I tried to explain that an SLA is just a financial refund policy, not a magical forcefield that protects glass from heavy construction equipment. If someone digs up the street, a 5% credit on next month's bill doesn't bring the network back online.
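The gap between an SLA credit and the cost of an outage is easy to put in numbers. In the sketch below, only the 5% credit rate comes from the story. The circuit fee, outage duration, and hourly loss are hypothetical placeholders chosen for illustration.

```python
# All dollar figures and durations here are hypothetical placeholders,
# not numbers from the incident. Only the 5% credit rate is from the story.
monthly_circuit_fee = 5_000.00     # assumed monthly price of one circuit
sla_credit_rate = 0.05             # the 5% credit on next month's bill
outage_hours = 6.0                 # assumed time to splice the fiber
loss_per_hour = 50_000.00          # assumed cost of the business being offline


def sla_credit(fee: float, rate: float) -> float:
    """Refund owed by the carrier under the SLA."""
    return fee * rate


def outage_cost(hours: float, hourly_loss: float) -> float:
    """Money lost while the network is down."""
    return hours * hourly_loss


credit = sla_credit(monthly_circuit_fee, sla_credit_rate)
cost = outage_cost(outage_hours, loss_per_hour)

print(f"SLA credit:  ${credit:,.2f}")
print(f"Outage cost: ${cost:,.2f}")
print(f"The credit covers {credit / cost:.3%} of the damage")
```

Swap in your own figures. Unless the carrier's refund is tied to your losses rather than to its invoice, the ratio stays tiny.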
Instead of accepting the physical reality of the telecom monopoly, management decided the only logical path forward was to launch a new, intensive committee. We now have a weekly, two-hour "Vendor Risk Mitigation" meeting. We are burning thousands of dollars in payroll every Friday to aggressively debate telecommunications policy with people who don't know what a demarc extension is.
We didn't fix the redundancy issue. We just scheduled more meetings to talk about it.
The next time your executive team spins up a multi-month, cross-functional task force to discuss cloud resiliency, remember the backhoe. You cannot out-architect physics, and you cannot fix physical infrastructure with a PowerPoint presentation.
Curious exactly how much capital your company is burning in meetings while trying to engineer around a guy with a shovel? Stop guessing. Calculate the exact financial damage of your next Business Continuity planning session with the Corporate Burn Rate Calculator.