The Friday Deploy Debate Is Asking the Wrong Question
I've been on enough teams to know how the Friday deploy conversation goes. Someone pushes a change at 3pm on a Friday, something breaks, two engineers spend their Saturday debugging it, and by Monday there's a new rule in the team wiki: "No deploys after noon on Fridays." Everyone nods. Problem solved. Until the next incident happens on a Wednesday and nobody understands why the rule didn't save them.
From experience, most teams arrive at "no Friday deploys" after a painful weekend. It makes sense in the moment. You get burned, you put a fence around the hot stove. But the fence isn't solving the problem, it's just making the stove harder to reach. The stove is still hot.
What the debate is really about
The Friday thing has been going back and forth in the engineering world for years. Charity Majors wrote a famous post essentially saying that if you can't deploy on Fridays, your CI/CD pipeline is broken. And she's not wrong. But the teams who say "never on Fridays" aren't wrong either. They're being honest about where they are. They know their deploys are risky and they know their response capacity on weekends is limited. The rule is a patch for a deeper problem they haven't had time to fix.
And that's the thing. It's never about Friday specifically. It's about conditions. Can someone respond if this deploy goes sideways? Is the system healthy right now? How many users does this change touch?
I think about it like driving. You don't avoid driving on Fridays because the roads are statistically more dangerous. You slow down when it's raining, when visibility is low, when you're tired. The day on the calendar doesn't change the physics of the car. The conditions do.
The conditions that actually matter
After spending time thinking about what makes deploys go wrong, I've noticed it usually comes down to three things, and none of them care about the day of the week.
The first one is response capacity. Not how many people are on your team, but whether the person who understands the change being shipped can respond quickly if things break. An engineer who deploys a database migration and then goes offline is just as risky on a Tuesday as on a Friday. The problem was never the day. It was that nobody was there to catch the fall.
The second is system state. Deploying into a system that's already degraded is like adding more weight to a bridge that's cracking. This is probably the one teams think about the least. If there's an active incident, if error rates are already elevated, any new change is adding variables to a situation that already has too many. Roblox had a 73-hour outage in 2021. During the recovery effort, the team upgraded their infrastructure to more powerful machines, thinking it would help. It made things worse. A well-intentioned change, made during active instability, extended the outage instead of shortening it.
The third is blast radius. A change that reaches 100% of your users instantly is fundamentally different from one rolled out to 1% behind a feature flag. Cloudflare learned this the hard way, multiple times. A WAF rule change in 2019 went global in seconds, consumed 100% CPU on every machine, and took their entire network down. The exact same change, deployed to a single data center first, would have been caught and rolled back before anyone noticed.
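That difference is easy to make concrete. Here's a minimal sketch of a percentage-based rollout, using a stable hash so every user keeps the same bucket as the percentage ramps up. The names (`in_rollout`, `new-checkout`) are illustrative, not any particular feature-flag library:

```python
import hashlib

def in_rollout(user_id: str, feature: str, percent: int) -> bool:
    """Deterministically bucket a user into the 0-99 range.

    The same user always lands in the same bucket for a given feature,
    so ramping percent from 1 to 100 only ever adds users -- nobody
    flaps in and out of the rollout between steps.
    """
    digest = hashlib.sha256(f"{feature}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100
    return bucket < percent

# Ramp schedule: 1% -> 10% -> 50% -> 100%, watching error rates between
# steps. A bad change surfaces at 1% instead of taking everyone down.
for pct in (1, 10, 50, 100):
    enrolled = sum(in_rollout(f"user-{i}", "new-checkout", pct) for i in range(1000))
    print(pct, enrolled)
```

The point isn't the hashing trick; it's that the blast radius becomes a dial you turn deliberately instead of a switch that's always set to "everyone."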
Why teams don't fix the real problem
This is where it gets a little frustrating, and the frustration isn't aimed at the engineers. I call it the "escalator problem." You keep walking up but the escalator is going down. Most teams know they should invest in better deploy pipelines, better monitoring, better rollback strategies. But they're so deep in feature work and fire-fighting that the improvements keep getting pushed to the next sprint. And the next one. And the next one.
It's the same dynamic I see with incident response in general. Teams put fires out constantly but barely get the time to reflect on what didn't work and put the right fixes in place. The "no Friday deploys" rule is a symptom of this cycle. It's the cheapest possible mitigation. Add a rule to the wiki instead of fixing the pipeline. And it sticks around because fixing the pipeline requires time that leadership usually allocates somewhere else.
Condition-based thinking
I think the better conversation isn't about which day to deploy, but about whether you know, in real time, if conditions are safe for a deploy. That's a different question entirely.
Is someone on-call who understands this change? Is the system currently healthy, or is there an active incident? How quickly does this change propagate and how quickly can it be rolled back?
If those three things check out, Friday at 4pm is fine. If any one of them is off, Tuesday at 10am isn't safe either.
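Those three checks can be written down as an actual gate rather than a wiki rule. A minimal sketch; the field names and thresholds are assumptions, stand-ins for whatever your on-call schedule and monitoring actually expose:

```python
from dataclasses import dataclass

@dataclass
class DeployConditions:
    responder_online: bool   # someone who understands this change is reachable
    active_incident: bool    # is an incident currently open?
    error_rate: float        # current error rate, e.g. 0.002 = 0.2%
    rollback_seconds: int    # how fast can this change be reverted?

def safe_to_deploy(c: DeployConditions,
                   error_budget: float = 0.01,
                   max_rollback: int = 300) -> tuple[bool, list[str]]:
    """Return (ok, reasons) -- every failed condition is named, so a
    blocked deploy tells you what to fix instead of what day it is."""
    reasons = []
    if not c.responder_online:
        reasons.append("no responder who understands this change")
    if c.active_incident:
        reasons.append("active incident in progress")
    if c.error_rate > error_budget:
        reasons.append(f"error rate {c.error_rate:.1%} above budget")
    if c.rollback_seconds > max_rollback:
        reasons.append("rollback too slow to limit damage")
    return (not reasons, reasons)

ok, reasons = safe_to_deploy(DeployConditions(
    responder_online=True, active_incident=False,
    error_rate=0.002, rollback_seconds=45))
print(ok)  # True: all three conditions hold, whatever day it is
```

Notice the day of the week never appears in the inputs. If Fridays really are riskier for your team, it will show up through these conditions (thin weekend on-call coverage means `responder_online` is false), not through the calendar itself.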
The teams that deploy 50 times a day on any day of the week aren't reckless. They've invested in systems that make every deploy answerable to those conditions automatically. Canary rollouts, automated rollback, feature flags, real-time observability. The day doesn't matter because the system knows when to stop.
The part we often miss
The Friday deploy debate is really a proxy for a bigger question about engineering discipline. It's the same tension that shows up everywhere. Do we move fast now and pay for it later, or do we invest in doing things properly even if it costs us speed today?
From experience, the practices that feel slow in the moment are the ones that make everything faster six months from now. A deploy pipeline that can roll back in seconds. Monitoring that catches problems before users do. Merge protection that activates automatically when an incident fires instead of depending on someone posting "DON'T MERGE" in Slack and hoping everyone reads it.
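That last idea is mechanically simple. Here's a hedged sketch of an incident-aware merge gate, meant to run as a required CI status check; the incident list is stubbed here, since in practice it would come from whatever incident tracker your team uses (a hypothetical internal API, not shown):

```python
import json

def merges_allowed(active_incidents: list) -> bool:
    """CI gate: merging is allowed only when no incident is open.

    Run as a required status check, this blocks the merge button
    automatically -- no one has to post "DON'T MERGE" in Slack and
    hope everyone reads it before the incident gets worse.
    """
    return len(active_incidents) == 0

# Stubbed API response in place of a live call to an incident tracker:
payload = json.loads('[{"id": "INC-123", "severity": "sev1"}]')
print(merges_allowed(payload))  # False: check fails, merges are blocked
print(merges_allowed([]))       # True: all clear, merges flow again
```

The whole mechanism is one boolean, which is the point: the hard part was never the code, it was deciding to wire the incident state into the pipeline at all.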
The question worth sitting with isn't "should we deploy on Friday." It's whether your team has the systems and the discipline to know when it's safe to deploy, on any day, and when it isn't. And if the honest answer is no, that's okay. But the fix isn't a calendar rule. The fix is building the infrastructure that makes the question irrelevant.