Back in November, there was a piece on KrebsOnSecurity about the Cloudflare outage — particularly about the companies that chose to bypass Cloudflare entirely to get their services back online.
I wrote an internal analysis and lessons-learned on it at the time and sent it to my IT peers. Over the past month it’s come up in a few conversations, and this week, while working on some blog posts for January, it surfaced again. There’s an angle here that I think got missed in the initial coverage.
So here’s my read on it, with a month of distance.
(Cloudflare published a detailed post-mortem of the incident, which is worth reading for the technical depth. What follows is not about Cloudflare’s response — which was transparent and thorough — but about what happened downstream when companies and SaaS vendors chose to bypass Cloudflare entirely during the outage.)
The Operational Decision That Made Sense
Operationally, bypassing Cloudflare made sense in the moment. Website’s down, customers are waiting, business is bleeding. Route around the problem and get back online. Nobody’s going to argue with the urgency.
According to reporting from KrebsOnSecurity, there was roughly an eight-hour window when several high-profile sites decided to bypass Cloudflare for the sake of availability. Some companies were able to pivot away temporarily; others couldn’t because their DNS was also hosted by Cloudflare or because the Cloudflare portal itself was unreachable.
But here’s what kept sticking with me: Cloudflare wasn’t just a CDN or performance layer for a lot of these companies. It was a significant part of their defense-in-depth strategy.
And here’s the critical nuance that I think matters: Cloudflare isn’t actually a single layer of defense-in-depth. It’s a concentration of multiple security controls delivered through a single platform.
What Actually Got Removed
When you pulled Cloudflare out of the path — even temporarily — you didn’t just remove “a layer.” You removed several interacting controls at once:
- DDoS mitigation (L3/L4/L7)
- Bot management
- Rate limiting
- WAF rule enforcement
- Request normalization and sanitization
- TLS termination and policy enforcement
- IP reputation filtering
- Geo-based access controls
- Abuse and anomaly detection
If you didn’t have a mature, well-tuned WAF of your own sitting behind Cloudflare — and more importantly, if you didn’t have comparable controls for rate limiting, bot detection, IP reputation, and request scrubbing — you may have just exposed yourself to multiple attack vectors simultaneously. Attack vectors that had been quietly mitigated for years, to the point where you forgot they existed.
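To make that concrete, here is a minimal sketch (Python, purely illustrative) of just one of those controls: an origin-side rate limiter of the kind you would want already running behind the edge, not bolted on mid-incident. The thresholds, the per-IP keying, and the integration point are all assumptions on my part; the point is that each bullet above represents real engineering that has to exist somewhere if the edge disappears.

```python
# Minimal sketch of an origin-side rate limiter, the kind of compensating
# control that should already exist behind the edge. Illustrative only:
# thresholds, keying, and integration points are assumptions.
import time
from collections import defaultdict


class TokenBucket:
    """Per-client token bucket: allows short bursts, enforces a steady rate."""

    def __init__(self, rate_per_sec: float = 5.0, burst: int = 20):
        self.rate = rate_per_sec
        self.burst = burst
        self.tokens = defaultdict(lambda: float(burst))
        self.last_seen = defaultdict(time.monotonic)

    def allow(self, client_ip: str) -> bool:
        now = time.monotonic()
        elapsed = now - self.last_seen[client_ip]
        self.last_seen[client_ip] = now
        # Refill tokens based on elapsed time, capped at the burst size.
        self.tokens[client_ip] = min(self.burst, self.tokens[client_ip] + elapsed * self.rate)
        if self.tokens[client_ip] >= 1.0:
            self.tokens[client_ip] -= 1.0
            return True
        return False  # Caller would return HTTP 429 here.


limiter = TokenBucket(rate_per_sec=5.0, burst=20)
print(limiter.allow("203.0.113.10"))  # True until the bucket is drained
```

And that is one control out of nine on the list above — the others (bot detection, IP reputation, request scrubbing) are each harder to replicate than this.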
As Aaron Turner from IANS Research pointed out to KrebsOnSecurity: “Your developers could have been lazy in the past for SQL injection because Cloudflare stopped that stuff at the edge. Maybe you didn’t have the best security QA for certain things because Cloudflare was the control layer to compensate for that.”
That’s the risk of outsourcing security controls without understanding what you’re outsourcing.
Those controls compound each other. They’re designed to work together. Losing them together is far more dangerous than losing a single, isolated control.
And here’s an important nuance from Cloudflare’s post-mortem: the failure originated in their Bot Management system and caused widespread HTTP 5xx errors across their core CDN and security services. But not all of Cloudflare’s protections failed at the same time. When vendors bypassed Cloudflare entirely to restore service, they weren’t just removing failed protections — they were removing all of Cloudflare’s protections, including DDoS mitigation, WAF rules, and rate limiting that were still functioning.
The Questions That Should Have Been Asked
So the question becomes: did anyone pause to think about that in the moment?
Or did they just act?
Did Security have a say in the decision to bypass? Did they understand they were dropping multiple layers of protection at once? Did they have equivalent controls ready to absorb the gap — not just a WAF, but rate limiting, bot detection, abuse monitoring, and more?
Or was the decision made in a war room where Security wasn’t even present?
The Structural Problem at Smaller SaaS Vendors
Or — and I think this is closer to reality for a lot of SaaS vendors — was there no separation between the person making the operational decision and the person responsible for security?
Because here’s the thing: SaaS vendors come in all shapes and sizes now. DevOps, DevSecOps, small engineering teams where the “senior” engineer is also the security person. Or even smaller vendors with outsourced vCISOs who aren’t involved in real-time operational decisions at all.
When the person responding to an outage is wearing both the operations hat and the security hat, where does their mind default to under pressure?
Can they even think about security and operations simultaneously in that moment? Or does “get the service back online” override everything else because that’s the immediate, visible, measurable problem in front of them?
What Probably Actually Happened
I suspect in a lot of cases, Security wasn’t bypassed because someone made a conscious risk decision. Security was bypassed because the person making the call didn’t have the bandwidth — or the organizational structure — to think about it as a security decision at all. It was purely operational.
And here’s the kicker: a lot of those vendors probably never did an internal RCA on their own actions. The root cause was “Cloudflare outage” — finger pointed, case closed. They never analyzed the downstream implications of their bypass decision. They may not have even realized they collapsed multiple layers of defense-in-depth — those layers had been invisible to them all along.
That’s not malicious. That’s just reality for a lot of smaller SaaS providers operating without dedicated security staff or mature incident response processes.
But it doesn’t change the risk their clients just inherited.
The Pressure Is Real, But So Are the Implications
I’m not being naive here. Business continuity pressure is real. Uptime SLAs are real. Executive pressure during an outage is very real. But bypassing a constellation of critical security controls without understanding the downstream implications is exactly how well-intentioned decisions introduce avoidable risk.
And I suspect in many cases, the decision wasn’t “we understand the risk and we’re accepting it.” It was “get the site back up” — and nobody stopped to think about what they were removing in the process, because there was nobody whose job it was to stop and think about it.
This is a reminder that defense-in-depth only works if every layer is understood, maintained, and included in decision-making during an incident. And when those layers are consolidated into a single vendor platform, it’s easy to forget just how much you’re actually relying on until it’s gone.
The SaaS Vendor Problem
But here’s the part that really stuck with me — and the reason this kept surfacing over the past month.
What about the SaaS providers who also use Cloudflare — especially the ones serving critical industries like Financial Services?
Did they bypass Cloudflare to restore service? Almost certainly some did.
Did they communicate that decision to their clients? Did they explain the risk they were accepting on behalf of those clients?
Or were customers never informed that multiple major security controls — controls they assumed were present because they had been for months or years — were suddenly bypassed during an outage window?
Here’s where it gets even messier: a lot of SaaS vendors don’t explicitly tell their clients they’re using Cloudflare. They just list the security controls in their sales deck or RFP response: “We have DDoS protection. We have WAF enforcement. We have bot mitigation. We have rate limiting. Robust security.”
And they do — because Cloudflare is providing it.
But the client doesn’t necessarily know that. They assume those controls are baked into the vendor’s architecture. They breeze through due diligence because the vendor checked all the boxes. The security questionnaire gets approved. The contract gets signed.
And then one day, the vendor bypasses Cloudflare to restore service during an outage — and suddenly, those security controls the client thought were intrinsic to the platform? Gone. Temporarily, maybe. But gone.
The client had no idea those capabilities were outsourced. They had no idea they were dependent on a third-party service. And they had no idea that “restoring service” meant removing DDoS protection, WAF enforcement, and bot mitigation all at once.
That’s the piece that keeps surfacing in conversations.
Cloudflare themselves were transparent about the November outage — publishing a detailed post-mortem with root cause analysis, timeline, and remediation steps. That’s the kind of communication you’d expect from a mature infrastructure provider.
But that transparency doesn’t extend to what their customers did in response. Did those SaaS vendors communicate to their clients that they bypassed Cloudflare? That they temporarily removed the security controls they’d been relying on for months or years?
Most likely, no.
The Visibility Gap
Financial institutions perform vendor due diligence under the assumption that their SaaS providers’ architectures remain stable unless there’s a formal change, a review cycle, or some kind of communication (an expectation reinforced by regulatory guidance on third-party risk management). If a vendor quietly removes DDoS protection, WAF enforcement, bot mitigation, and rate limiting all at once during an incident, that changes their risk profile immediately. But unless the vendor is transparent enough to say it out loud, the client has no visibility into that decision.
And here’s the longer-term question:
What happens if, in the coming weeks or months, one of these SaaS vendors announces a breach?
What if the root cause turns out to be traceable back to mid-November — to the days or hours where they bypassed Cloudflare and exposed themselves in ways they hadn’t in years?
Will anyone connect the dots?
Or will it be one of those hindsight moments where someone finally realizes: “Oh. That’s how they got in. We dropped DDoS protection, rate limiting, and bot detection all at once to restore uptime, and an attacker walked right through the gap.”
And attackers were watching. As Turner told Krebs: “Let’s say you were an attacker, trying to grind your way into a target, but you felt that Cloudflare was in the way in the past. Then you see through DNS changes that the target has eliminated Cloudflare from their web stack due to the outage. You’re now going to launch a whole bunch of new attacks because the protective layer is no longer in place.”
According to Cloudflare’s timeline, the outage lasted roughly 5.5 hours, with severe impact for about 3 hours. But the window where companies bypassed Cloudflare was reportedly around 8 hours. If a SaaS vendor bypassed Cloudflare around noon UTC and waited until evening to restore it (to be safe), that’s potentially a 6-8 hour window where they were operating without DDoS protection, WAF enforcement, bot mitigation, and rate limiting.
Six to eight hours is a long time for an attacker scanning for newly exposed infrastructure. And this was a highly publicized incident happening in real-time, not a silent configuration change. As Turner told Krebs, attackers tracking specific targets could see through DNS changes the moment Cloudflare was removed — and immediately launch attacks they’d been planning but couldn’t execute while Cloudflare was in the way.
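That DNS signal is trivial to watch, from either side of the fence. Here is a rough sketch of the defender’s version, using only the Python standard library: resolve your own hostnames on a schedule and flag the moment they stop pointing at Cloudflare’s edge. The IP ranges below are a partial, illustrative subset; Cloudflare publishes the authoritative list.

```python
# Sketch: the same DNS visibility Turner describes, from the defender's side.
# Resolves a hostname and checks whether its A records still land inside
# Cloudflare's published IPv4 ranges. The ranges below are a partial,
# illustrative subset; the authoritative list is published by Cloudflare.
import socket
import ipaddress

CLOUDFLARE_RANGES = [ipaddress.ip_network(cidr) for cidr in (
    "104.16.0.0/13", "172.64.0.0/13", "162.158.0.0/15",
    "198.41.128.0/17", "141.101.64.0/18", "108.162.192.0/18",
)]


def behind_cloudflare(hostname: str) -> bool:
    """True if every resolved IPv4 address falls inside a known Cloudflare range."""
    _, _, addresses = socket.gethostbyname_ex(hostname)
    return all(
        any(ipaddress.ip_address(addr) in net for net in CLOUDFLARE_RANGES)
        for addr in addresses
    )


# Run this on a schedule during an incident: a flip from True to False is the
# same signal an attacker watching your DNS would see.
print(behind_cloudflare("www.example.com"))
```

If a defender can write that in twenty lines, so can an attacker who has been waiting for exactly that flip.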
Cloudflare’s post-mortem explicitly states the outage “was not caused, directly or indirectly, by a cyber attack or malicious activity of any kind.” It was a technical failure in their Bot Management system.
But when SaaS vendors bypassed Cloudflare in response to that technical failure, they may have inadvertently created a security exposure that could be exploited by actual attackers. That’s the risk of making operational decisions under pressure without thinking through the security implications.
The Liability Question Nobody’s Asking
And then there’s the question that really interests me: where does the liability land?
If a client opened a support ticket demanding the service be restored immediately, does that constitute implicit approval to drop security controls? If it’s documented in the ticket — “customer requested immediate restoration” — does that shift accountability back to the client?
Or is the vendor still responsible for explicitly communicating the trade-off: “We can restore service, but it means removing DDoS protection, WAF enforcement, bot mitigation, and rate limiting. Do you accept that risk?”
I’m not a lawyer, but I’d be fascinated to see how this plays out in litigation if it ever gets there. How would a plaintiff’s attorney position this? How would defense counsel respond?
Because we’re already seeing a shift in how breach litigation unfolds. It’s not just the company that got breached anymore — it’s the company and their SaaS vendor, named together in the lawsuit. The argument being: the vendor was responsible for securing the data, the client was responsible for choosing and overseeing the vendor, and both failed.
If a breach traced back to a Cloudflare bypass during this outage, you could see arguments going both ways:
Plaintiff’s side:
“The vendor removed critical security controls without informing the client, materially changing the security posture the client relied on during due diligence and contracted for. The vendor failed in their duty to protect the data entrusted to them.”
Defense (vendor) side:
“The client demanded immediate service restoration. We documented the request. Restoring service required architectural changes. The client’s demand implicitly accepted the operational trade-offs necessary to meet their requirement.”
Defense (client) side:
“We requested service restoration, not security degradation. We had no visibility into the vendor’s architecture. We were never informed that ‘restore service’ meant ‘remove DDoS protection and WAF enforcement.’ The vendor should have communicated the security implications before acting.”
I genuinely don’t know how that shakes out. But the ambiguity alone is a problem.
Because right now, most SaaS contracts don’t clearly define who’s accountable when operational decisions during an incident materially change the security posture. And until we see some case law or regulatory guidance, both sides are operating in a gray area that could get very expensive to navigate after a breach.
Two Different Kinds of Defense-in-Depth
This incident highlights something we don’t talk about enough: there are actually two layers of defense-in-depth at play in modern infrastructure.
First, there’s our own defense-in-depth — the controls we design, deploy, maintain, and fully understand. If we remove one of our layers (or several at once), we understand the risk and can control the compensating actions.
Second, there’s the defense-in-depth of the SaaS providers who store or process our customer data. They are effectively an extension of our infrastructure. Their architectural decisions directly impact the security of our data.
The Authority vs. Responsibility Gap
But here’s the problem: we don’t control their day-to-day decisions. We’re not in the room when they make operational trade-offs during an outage. We perform due diligence at onboarding and during periodic reviews, but we don’t have decision authority over what they do in the moment.
So when a SaaS provider modifies or bypasses multiple security controls to restore service, the downstream exposure shifts directly to us. We’re still accountable for the data we’ve placed in their platform, but we have no practical way to influence or halt the decision they made in real time.
That’s the inherent challenge with the SaaS model: we carry the responsibility, but not the decision authority.
And in that space — yes — it becomes an unavoidable form of blind trust, simply because the model offers no other option unless the vendor communicates proactively.
What This Means Practically
If You’re a SaaS Provider
If you provide SaaS services to regulated industries or handle sensitive data on behalf of clients, this outage should be a forcing function for how you think about transparency during incidents.
When you make architectural changes under pressure (bypassing a CDN, turning off a WAF, relaxing rate limits, disabling bot protection — whatever the combination), your clients need to know. Not three months later in a compliance report. Not after a breach investigation uncovers it. In the moment, or as close to it as possible.
Because the risk you’re accepting isn’t just yours. It’s theirs too.
If You’re a Client of SaaS Vendors
If you rely on SaaS vendors to protect critical data, this is a reminder that due diligence can’t stop at onboarding. You need ongoing visibility into how your vendors operate during incidents. You need contract language that requires transparency around security posture changes — especially when multiple controls are bypassed at once. You need to ask the uncomfortable questions about what happens when something breaks and they need to route around their own protections.
Because “we trust our vendor” is not a control. It’s a hope.
And hope is not a strategy.
The Questions You Should Be Asking
Nicole Scott from Replica Cyber called the Cloudflare outage “a free tabletop exercise, whether you meant to run one or not.”
She’s absolutely right. Whether you’re a SaaS provider or a client of SaaS vendors, this outage should prompt some uncomfortable internal questions:
- What was turned off or bypassed (WAF, bot protections, geo blocks), and for how long?
- What emergency DNS or routing changes were made, and who approved them?
- Did people shift work to personal devices, home Wi-Fi, or unsanctioned SaaS providers to get around the outage?
- Did anyone stand up new services, tunnels, or vendor accounts “just for now”?
- Is there a plan to unwind those changes, or are they now permanent workarounds?
- For the next incident, what’s the intentional fallback plan, instead of decentralized improvisation?
If you bypassed Cloudflare during this outage and you can’t answer these questions, you’re not ready for the next one.
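Here is one way to make those answers recoverable: a minimal, hypothetical sketch of a posture-change record, captured at decision time rather than reconstructed from memory weeks later. The field names are assumptions, not a standard; the point is that every bypassed control gets a who/what/when/until-when entry, and a flag for whether clients were told.

```python
# Hypothetical sketch of a posture-change record captured during an incident.
# Field names and structure are illustrative assumptions, not a standard.
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Optional
import json


@dataclass
class PostureChange:
    control: str                       # e.g. "WAF", "bot management", "rate limiting"
    action: str                        # "bypassed", "disabled", "relaxed"
    reason: str
    approved_by: str
    compensating_control: Optional[str] = None
    clients_notified: bool = False
    started_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())
    restored_at: Optional[str] = None  # filled in when the change is unwound


change = PostureChange(
    control="WAF and DDoS mitigation",
    action="bypassed",
    reason="Cloudflare outage; DNS repointed directly to origin",
    approved_by="incident commander",
)
print(json.dumps(asdict(change), indent=2))
```

It doesn’t need to be code — a shared incident log works — but it needs to exist before the next outage, not after.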
Final Thought
I don’t know if any breaches will surface in the coming weeks or months that trace back to this outage window. I genuinely hope they don’t.
But the possibility is real enough that it’s worth thinking through now — not after the fact, when you’re sitting in an incident review trying to figure out how an attacker got in through a gap that didn’t exist in early November.
Defense-in-depth only works if you actually know what the layers are, who controls them, and what happens when several of them disappear at once.
If a SaaS vendor is carrying your DDoS protection, WAF enforcement, bot mitigation, rate limiting, and abuse detection — and you didn’t even know it — that’s not defense-in-depth. That’s someone else’s architecture that you’re depending on without visibility or control.
And when that architecture changes during an outage, you inherit the risk whether you knew about the change or not.
This outage didn’t create that problem. It just made it visible.