
From Safe to Sorry: When Cloudflare "Log" Rules Started Blocking

TL;DR
Monday morning, 9 AM. Checkout conversions down 15%. API errors spiking from 0.2% to 3.8%. Support tickets flooding in. Mobile app logins failing. The on-call engineer starts debugging: database queries look fine, application logs show nothing unusual, network partitions ruled out, no recent deploys. An hour passes. Finally, someone suggests checking the WAF.
There it is. Cloudflare managed rules—deployed October 27 but scheduled for November 3 in the changelog—colliding with override patterns configured months ago and long forgotten. Beta rules with broader matching logic, intended to run in "log" mode, instead blocking production traffic on the company's primary gateway. Single point of failure, now failing.
This is a real incident (customer anonymized), a technical detective story, and a reminder that your WAF integration deserves the same monitoring vigilance you give your database or payment processor. This happens more often than you'd think. It's completely preventable. And it's exactly why we're building Huskeys.
The incident: When "log" becomes "block"
Monday morning chaos
The symptoms appeared suddenly but inexplicably—no deploys over the weekend, no infrastructure changes, no marketing campaigns launched. Everything that usually breaks things looked clean. But something was definitely wrong:
- Checkout conversion rate dropped 15%
- API error rate spiked from 0.2% to 3.8%
- Support tickets increased 4x
- Mobile app login failures jumped to 12%
- Complete downtime: a few core applications were unreachable for hours
The engineering team started where you'd expect: database queries, application logs, network partitions, Redis connections, CDN health. Everything looked fine. Metrics were clean. Traces showed nothing unusual. The application was healthy; traffic just wasn't making it through. It took an hour of head-scratching before someone—newer to the team, less steeped in assumptions—suggested checking the WAF. "Has anyone looked at Cloudflare?"
The scheduled release (that wasn't quite scheduled)
Here's where things get interesting. The Cloudflare changelog showed a scheduled update for November 3, 2025. But the rules actually went live on October 27—a full week early. The changelog date didn't match the deployment date.
This timing discrepancy meant no one was watching for changes. The team wasn't expecting updates for another week.
The release included several new threat detections—beta rules, still being refined:
| Rule ID | Type | Date Added | Default Action | Scheduled Live Date | Actual Deploy |
|---|---|---|---|---|---|
| CVE-2025-54236 | Adobe Commerce RCE | 2025-10-28 | Log | 2025-11-03 | 2025-10-27 |
| CVE-2025-54241 | WordPress Plugin XSS | 2025-10-29 | Log | 2025-11-03 | 2025-10-27 |
| CVE-2025-54248 | Laravel Debug Exposure | 2025-10-30 | Log | 2025-11-03 | 2025-10-27 |
| OWASP-SQLI-0847 | SQL Injection Pattern | 2025-10-31 | Log | 2025-11-03 | 2025-10-27 |
All rules defaulted to "log" mode. Safe, right?
The collision: When beta overrides production
Here's what made this incident particularly painful. Months earlier, the security team had configured some broad override patterns:
```yaml
# Customer overrides configured months ago
overrides:
  - pattern: "CVE-*"
    action: block
    reason: "Block all CVE detections by default"
  - pattern: "OWASP-SQLI-*"
    action: block
    reason: "Block SQL injection attempts"
```
These made sense at the time—be aggressive about known CVEs and SQL injection. But nobody documented them well, and over time, people forgot they existed.
When the new managed rules landed on October 27 (remember, a week before anyone expected them), they matched these patterns:
- CVE-2025-54236 (intended: log) → actual: block
- OWASP-SQLI-0847 (intended: log) → actual: block
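The collision is easy to reproduce in miniature. A minimal sketch of the resolution logic, using the rule IDs and glob-style override patterns from the incident (how a given WAF resolves overrides internally is an assumption here; this just shows why the patterns match):

```python
from fnmatch import fnmatch

# Override patterns configured months earlier (from the incident above)
overrides = [
    {"pattern": "CVE-*", "action": "block"},
    {"pattern": "OWASP-SQLI-*", "action": "block"},
]

# New managed rules, all shipped with a "log" default
new_rules = {
    "CVE-2025-54236": "log",
    "CVE-2025-54241": "log",
    "CVE-2025-54248": "log",
    "OWASP-SQLI-0847": "log",
}

def effective_action(rule_id: str, default: str) -> str:
    """First matching override wins; otherwise keep the vendor default."""
    for override in overrides:
        if fnmatch(rule_id, override["pattern"]):
            return override["action"]
    return default

for rule_id, default in new_rules.items():
    # Every one of these resolves to "block", despite the "log" default
    print(rule_id, default, "->", effective_action(rule_id, default))
```

The point of the sketch: nothing here is buggy. The overrides did exactly what they said. The failure was that nobody re-evaluated the glob patterns when new rule IDs started matching them.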
But here's the critical detail: these were beta rules that interacted unexpectedly with already-tested production overrides. The team had spent weeks carefully tuning their security posture. Then these new beta-quality rules—with broader matching logic than previous versions—landed with no log-first observation period, no gradual rollout, straight to blocking mode because they matched those forgotten override patterns.
The new rules caught legitimate API calls that looked slightly suspicious under the broader detection logic. In "log" mode, this would have been fine—the team could review and tune gradually over days or weeks. In "block" mode, it caused unexpected blocking on their most critical application: the primary gateway serving all downstream business systems. Single point of failure, now experiencing failures.
And when they checked Cloudflare's status page during the incident? No issues reported. Everything showed green on the vendor side while production metrics were spiking red.
The detection problem: Why did this take an hour?
An hour is an eternity when your checkout is broken. So why did it take so long to identify the WAF as the culprit?
The answer reveals a critical gap in how most teams monitor their security infrastructure:
No real-time visibility into rule behavior changes. The team could see that requests were failing, but they couldn't see that WAF rules had suddenly become active and started blocking traffic. There was no alerting system watching for dormant rules suddenly coming alive and affecting production.
No automated changelog tracking. Someone would manually check Cloudflare's changelog periodically, but there was no automated system watching for updates and cross-referencing them against existing override patterns. The team had no idea that new rules matching their override patterns were being deployed—especially not a week early.
No correlation with vendor status. When production metrics started spiking, the natural instinct was to check if the vendor was having issues. But Cloudflare's status page showed green across the board. There was no way to know whether this was a vendor-side change, a legitimate attack being blocked, a false positive wave, or an application issue.
Missing operational integration. The team's existing security workflow was: WAF → SIEM → Slack for alerting. But they hadn't connected their WAF logs to this pipeline yet. When the incident hit, they had no centralized visibility, no historical baseline to compare against, and no automated alerts. Just a growing sense that something, somewhere, had changed.
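Closing the first of these gaps is largely a baselining exercise: compare per-rule block counts against a trailing window and alert when a previously silent rule starts blocking real volume. A hypothetical sketch (the counters, rule IDs, and threshold are illustrative; this is not a Cloudflare API):

```python
def rules_newly_blocking(baseline: dict[str, int], current: dict[str, int],
                         min_blocks: int = 10) -> list[str]:
    """Flag rules that blocked nothing in the baseline window
    but are now blocking a meaningful volume of traffic."""
    alerts = []
    for rule_id, blocks in current.items():
        if blocks >= min_blocks and baseline.get(rule_id, 0) == 0:
            alerts.append(rule_id)
    return sorted(alerts)

# Baseline: last week's block counts per rule ID (illustrative numbers)
baseline = {"OWASP-XSS-0012": 340}
# Current hour: two rules that had never blocked before are suddenly active
current = {"OWASP-XSS-0012": 12, "CVE-2025-54236": 480, "OWASP-SQLI-0847": 95}

print(rules_newly_blocking(baseline, current))
# ["CVE-2025-54236", "OWASP-SQLI-0847"] -- the page-someone-now signal
```

An alert like this, wired to Slack, would have turned an hour of head-scratching into a two-minute diagnosis.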
This wasn't negligence. This was a sophisticated team running production infrastructure at scale. They just hadn't treated their WAF integration as a critical, actively-managed dependency that requires the same monitoring rigor as databases, APIs, and payment processors.
The resolution
Once the team identified the WAF as the culprit, the fix was quick: disable the conflicting override patterns, refine them to be more targeted, and restore service. Twenty minutes later, production was stable.
But the real work came next: building the monitoring infrastructure to prevent this from happening again. They connected WAF logs to their existing SIEM, implemented automated change tracking for vendor updates, and established a testing philosophy where even managed rules go through observation periods before blocking. The tactical fix took 20 minutes; building the operational safeguards took two weeks. It fundamentally changed how they thought about managed security tools: not "set it and forget it," but active integrations requiring continuous monitoring.
The bonus discovery: Dead rules
While investigating, the team uncovered something even more troubling—dead rules.
These are rules that can never fire because earlier rules in the chain always match first:
```yaml
# Rule #42: Block common XSS patterns
- pattern: "<script.*?>"
  action: block

# ... 50 more rules ...

# Rule #93: Log potential XSS for review (DEAD - never fires!)
- pattern: "<script>"
  action: log
  note: "Review these for false positives"
```
Rule #93 was added later to catch and review potential false positives. But Rule #42's broader pattern always matches first, so #93 never sees any traffic. For months, the team thought they were reviewing edge cases—they weren't.
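One pragmatic way to surface dead rules is to replay representative payloads through the chain in order and record which rule fires first; any rule that never wins a match is a shadowing candidate. A sketch under the assumptions that rules are regex-based and first match wins (the payload set is illustrative, and replay coverage is only as good as your samples):

```python
import re

# The two rules from the example above, in chain order
rules = [
    {"id": 42, "pattern": r"<script.*?>", "action": "block"},
    {"id": 93, "pattern": r"<script>", "action": "log"},
]

def first_match(payload: str):
    """Return the ID of the first rule whose pattern matches, else None."""
    for rule in rules:
        if re.search(rule["pattern"], payload):
            return rule["id"]
    return None

# Replay sample payloads and record which rules ever fire first
samples = ["<script>alert(1)</script>", "<script src=x>", "hello world"]
fired = {first_match(p) for p in samples} - {None}
dead = [r["id"] for r in rules if r["id"] not in fired]
print(dead)  # rule 93 never fires: 42's broader pattern always wins
```

Run periodically against a sample of real traffic, this kind of replay catches shadowed rules long before anyone builds a review process on top of data that never arrives.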
This creates two problems:
- False confidence - You think you have coverage you don't actually have
- Intent drift - Your security posture gradually diverges from what you intended
Why WAF integration health matters
Cloudflare's managed rulesets are powerful—they push updates regularly to protect against emerging threats, often weekly. This velocity is a strength: you get cutting-edge threat protection automatically. But it also means your WAF becomes a constantly evolving system, not a static configuration. Your custom overrides create unique interactions that emerge over time as new rules land.
In this incident, the WAF wasn't just protecting one application—it was the primary gateway for all business-critical systems. When the unexpected rule interactions caused blocking, the blast radius was enormous: checkout, mobile auth, APIs, internal tools—everything downstream failed. This is the reality for many production environments: the WAF sits at a critical junction point that touches everything, yet it's often treated as "set it and forget it" rather than the mission-critical integration it is.
Why we invest deeply in Cloudflare integration
From working with production deployments, we've learned that maximizing value from managed WAF providers requires deep integration understanding—not just surface-level API calls, but intimate knowledge of how rulesets evolve, how custom configurations interact with managed updates, and where collision points emerge.
We’re not here to replace Cloudflare—their data layer and performance are exceptional. We’re the missing control plane. Think of it as the new brain for network security that works seamlessly with your existing stack, making everything just work—simple, dynamic, and built for the modern app age.
Takeaways: Three principles for WAF integration health
1. Monitor like any critical dependency. Your WAF provider is like your database or payment processor—you wouldn't run production without monitoring those integrations. You need real-time visibility into how your WAF behaves in production, not just how it's configured. Without continuous monitoring, you're debugging blind when something goes wrong.
2. Build in observation periods. The difference between a quick fix and a prolonged outage is having a buffer between "new rule deployed" and "new rule blocking production." Even vendor-managed updates should operate in observation mode first. This validates that new detections work with your specific traffic patterns before they can cause availability incidents.
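An observation period can be enforced mechanically: for some window after a rule's actual deploy date, downgrade any blocking action to log, regardless of what overrides say. A minimal sketch (the seven-day window and the function shape are assumed policy, not vendor behavior):

```python
from datetime import date, timedelta

OBSERVATION_DAYS = 7  # assumed policy, not a Cloudflare setting

def deploy_action(configured: str, deployed_on: date, today: date) -> str:
    """Force 'log' during the observation window, then honor the config."""
    if configured == "block" and today < deployed_on + timedelta(days=OBSERVATION_DAYS):
        return "log"
    return configured

# The incident's scenario: override says block, rule deployed 2025-10-27
print(deploy_action("block", date(2025, 10, 27), date(2025, 10, 28)))  # log
print(deploy_action("block", date(2025, 10, 27), date(2025, 11, 10)))  # block
```

Had this guardrail existed, the October 27 rules would have spent their first week logging, and the broader matching logic would have shown up as reviewable noise instead of blocked checkouts.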
3. Validate your actual protection. Configuration drift is silent. Dead rules, conflicting overrides, and inadvertent gaps emerge as your configuration and vendor rulesets evolve. Continuous validation ensures what you think is protecting you actually is.
What emerges from this incident: the need for a monitoring layer that acts as a continuous guardian of your WAF integration health. Not replacing your WAF, but ensuring it works the way you intend. That's the space Huskeys occupies.
Summary
This incident is real—a sophisticated team running production infrastructure carefully, with thoughtful testing processes. They still experienced an unexpected interaction between managed rule updates and their existing configuration. Not a vendor problem or a customer problem, but an integration monitoring challenge that emerges naturally when powerful systems evolve independently. That's why we are here.
We think about these problems constantly. If you're running production security infrastructure and want to chat about WAF integration health, let's talk. 🐕
