Chapter 7: When the Business Wants It Yesterday
The bearing of a child takes nine months, no matter how many women are assigned.
Fred Brooks, The Mythical Man-Month, 1975
Will Larson has been CTO or head of engineering at Digg, Uber, Calm, Stripe, and Carta. Every time he meets fellow engineering leaders, he finds himself in the same conversation — what he calls the engineering leader’s version of Groundhog Day. "The biggest challenge these leaders are facing is pressure from their CEOs to drive engineering velocity. The message is usually: ship more value, more quickly." The problem, Larson observes, is that "no one has a whole lot of conviction that there is a simple way to do this. There’s no lever like there is in sales."[1]
The conversation recurs because the pressure is structural. It is not a symptom of a bad CEO-CTO relationship, a poorly managed sprint, or a temporary resource shortage. It is a permanent feature of the CTO role. The CEO is accountable to the board for growth. Growth depends on features, fixes, and product improvements that engineering must deliver. The CEO’s instinct — more output, faster — is rational from where they sit. The CTO’s knowledge — that rushing and speed are different things, that shortcuts compound, that the outage caused by cutting corners will cost more than the feature delay it was supposed to prevent — is equally rational but harder to communicate.
This chapter is about how to manage that tension without losing either the relationship or the product. Chapter 6 established that shipping speed is the startup’s primary competitive advantage. This chapter addresses what happens when the business pressure to ship exceeds what the engineering team can safely deliver — and why the CTO who navigates that pressure well is practising the business acumen that Chapter 10 will argue is the difference between the CTOs who survive and the ones who are replaced.
The pressure is constant, but its character changes with the market. During the zero-interest-rate boom of 2020–2021 — the first period in SaaS history when, as Jason Lemkin observed, "efficiency actually didn’t matter" — urgency came from growth.[2] Ship fast to capture market share. Hire aggressively. Run multiple experiments in parallel. The CTO was rewarded for breadth: more features, more teams, more surface area. Then the correction arrived. Sequoia Capital’s "Adapting to Endure" presentation in May 2022 told portfolio companies to "confront reality and act decisively," warning that "the cost of capital has fundamentally increased."[3] Urgency shifted from growth to survival. Ben Brown, CTO of Flock, captured the whiplash: "In the past, we asked for three more teams and the answer was to go off and hire them now. Now we hear there’s no money for it."[4] Paul Graham’s "default alive or default dead" diagnostic — always relevant, suddenly urgent — became the framing every board conversation started with: "If the company is default alive, we can talk about ambitious new things they could do. If it’s default dead, we probably need to talk about how to save it."[5] And by 2025, a third mode emerged: uncertainty urgency, driven by tariffs, geopolitical fragmentation, and AI disruption so rapid that planning horizons compressed to quarters rather than years. Tobi Lütke’s April 2025 Shopify memo — requiring teams to "demonstrate why they cannot get what they want done using AI" before requesting headcount — is the defining artefact of this era.[6] It is neither growth-at-all-costs nor survival-mode austerity. It is optionality: keep the organisation flexible enough to absorb capability shifts that nobody can predict.
The CTO must recognise which mode they are operating in, because the appropriate engineering response differs. Growth urgency rewards breadth — more experiments, more features, more parallel bets. Survival urgency rewards depth — fewer features, done well, that generate revenue now. Uncertainty urgency rewards reversibility — small bets, fast learning, architecture that can pivot without a six-month rewrite. The CTO who applies a growth playbook during a bust will burn cash on experiments that never reach customers. The CTO who applies a survival playbook during a boom will lose the market to competitors who moved faster. And the CTO who does not recognise uncertainty as a distinct mode — who mistakes it for either boom or bust — will optimise for the wrong thing entirely.
Rushing Is Not Speed
The most important distinction in the CTO’s vocabulary is the one that most business conversations collapse: the difference between rushing and speed.
Rushing is skipping safeguards. It is deploying without testing, merging without review, shipping without monitoring. It feels fast. It produces output quickly. And it accumulates the kind of debt that Chapter 5 described — not the strategic debt of a conscious trade-off, but the reckless debt that compounds until the 3 a.m. phone call arrives.
Speed is something else entirely. Speed is high deployment frequency with proper safeguards in place. Speed is the CI/CD pipeline from Chapter 6 that lets the team ship ten times a day because each deployment is small, tested, and reversible. Speed is the discipline of scope reduction — shipping less scope more often — rather than the chaos of shipping everything at once with fingers crossed.
Charity Majors, CTO of Honeycomb, an observability platform, states the counterintuitive principle that a decade of DORA research supports: "Speed is safety in software."[7] The teams that deploy most frequently are not the ones with the most outages. They are the ones with the fewest. The teams that rush — that bypass testing, skip staged rollouts, and deploy large batches of changes infrequently — are the ones that break things.
The DORA team’s data, now spanning more than a decade of research across tens of thousands of engineering professionals, has demonstrated this consistently. Elite-performing teams deploy on demand — often multiple times per day — and have a change failure rate of roughly 5%. Low-performing teams deploy less frequently and fail more often. The 2024 report quantifies the gap: elite performers deploy 182 times more frequently than low performers and recover from failures 2,293 times faster.[8] Nicole Forsgren, who co-founded the DORA research programme, summarises the finding in Accelerate: "High performers understand that they don’t have to trade speed for stability or vice versa, because by building quality in they get both."[8]
Dave Farley, co-author of Continuous Delivery, states the principle more directly: "The real trade-off, over long periods of time, is between better software faster and worse software slower." He draws the manufacturing parallel: "It wasn’t that Toyota built crappy, cheap cars more quickly. They built higher quality cars, more cheaply and more quickly."[9] The DORA research confirmed that the same dynamic applies to software: the teams that invested in quality infrastructure — testing, monitoring, deployment automation — did not sacrifice speed. They achieved speed because of the investment, not despite it.
Martin Fowler frames this as the "tradable quality hypothesis" — the belief, widespread among business leaders, that quality can be reduced to gain speed. Fowler’s argument: "It’s vital to focus on the true value of internal quality — that it’s the enabler to speed. The purpose of internal quality is to go faster."[10] The CTO who accepts the framing that speed and quality trade off has already lost the negotiation. The correct framing — the one supported by a decade of data across tens of thousands of teams — is that they are complements. The CTO’s job is to build the system where that complementarity is visible, and to communicate it in language the business can evaluate.
Facebook’s decade-long cycle illustrates the tension. Mark Zuckerberg’s original motto — "Move fast and break things" — became a cultural touchstone for the startup world.[11] By 2014, Zuckerberg had revised it: "Move fast with stable infrastructure. Because when you build something that you don’t have to fix 10 times, you can move forward on top of what you’ve built."[11] By 2023, facing pressure from Wall Street, Meta returned to aggressive speed language. The cycle reveals a truth that the startup CTO should internalise: the velocity conversation never resolves. It oscillates. The CTO who expects it to end is expecting something that has not happened at any company, at any scale, in the history of the software industry.
| AUTHOR: Your version of this conversation at CorralData — the moment when business pressure met engineering judgment. The healthcare B2B context makes the tension sharper: the cost of moving too fast in a regulated vertical is not just an outage. It is a compliance incident, a customer data exposure, a trust violation that may be irrecoverable. |
When Rushing Creates the Crisis
On the morning of August 1, 2012, Knight Capital Group deployed new trading software to connect with a New York Stock Exchange programme called the Retail Liquidity Program. A technician had manually updated the software on seven of the company’s eight servers. One server was missed. On that server, the old code — a defunct function that had been repurposed but not removed — began executing trades at high volume without the controls that the new code was supposed to provide.[12]
In the 45 minutes after the market opened, before the error was contained, Knight Capital’s router sent more than four million orders into the market while attempting to fill just 212 customer orders. The company traded more than 397 million shares, acquired several billion dollars in unwanted positions, and suffered a loss of more than $460 million — enough to threaten the firm’s survival.[12]
The SEC investigation revealed that an internal system had generated 97 automated error emails before the market opened, referencing the problematic router and identifying the failure. Nobody acted on them. Knight Capital had no documented procedures for deployment verification, no automated checks to confirm that all servers were running the same version, and no incident response protocol that would have caught the discrepancy before it became catastrophic.[12]
The lesson is not "be more careful." The lesson is that the safeguards Knight Capital lacked — deployment verification, automated consistency checks, incident response protocols — are precisely the infrastructure of speed that Chapter 6 described. Knight Capital did not fail because it moved too fast. It failed because it rushed: it deployed a critical change manually, without verification, without rollback capability, and without monitoring that would have caught the error before the market opened.
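The missing safeguard is mechanically trivial. The sketch below, assuming a hypothetical deployment system that can report each server's running version (the hostnames and the function are illustrative, not from any real tooling), shows the kind of fleet-consistency check that would have flagged the forgotten server before the market opened:

```python
from collections import Counter

def check_fleet_consistency(deployed_versions: dict[str, str]) -> list[str]:
    """Return the servers whose version differs from the fleet majority."""
    majority_version, _ = Counter(deployed_versions.values()).most_common(1)[0]
    return [host for host, version in deployed_versions.items()
            if version != majority_version]

# Seven servers updated, one missed: the check flags it automatically.
fleet = {f"server-{i}": "v2.0" for i in range(1, 8)}
fleet["server-8"] = "v1.3"  # the server the technician skipped

stragglers = check_fleet_consistency(fleet)
assert stragglers == ["server-8"]  # deployment halts until the fleet agrees
```

A check like this runs in milliseconds and requires no meeting, no sign-off, and no heroics: it simply refuses to declare the deployment complete while the fleet disagrees with itself.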
On July 19, 2024, CrowdStrike provided the highest-profile illustration of the same principle at a different scale. A content configuration update for CrowdStrike’s Falcon security platform — not a code change, but what the company called "Rapid Response Content" — was pushed to all customers simultaneously. The update contained a parameter count mismatch: the template defined 21 input fields, but the code supplied only 20 values. The mismatch evaded multiple layers of validation because the test cases used wildcards for the 21st field. Within 78 minutes, 8.5 million Windows devices crashed.[13]
CrowdStrike’s code deployment process was rigorous. Code changes went through dogfooding, staged rollouts, and customer-controlled deployment windows. But Rapid Response Content — the configuration updates designed to respond quickly to emerging threats — bypassed all of it. As Adam Meyers, CrowdStrike’s senior vice president, testified before Congress: "The updates were distributed to all customers in one session. We’ve since revised that."[14]
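The class of bug is easy to recreate in miniature. The sketch below is illustrative, not CrowdStrike's actual validator; the field names, the wildcard behaviour, and the validator itself are assumptions. It shows only the shape of the gap: a test that wildcards the final field can pass while a strict field-count check on the shipped content would have failed:

```python
# A template declares 21 input fields; the shipped content supplies 20.
TEMPLATE_FIELD_COUNT = 21

def validate(content_fields: list[str], allow_wildcard: bool = False) -> bool:
    """Hypothetical content check: strict by default, lenient under wildcards."""
    if allow_wildcard and content_fields and content_fields[-1] == "*":
        # Wildcard matching never evaluates the final field -- the path
        # the test cases exercised, which is why the mismatch slipped through.
        return len(content_fields) >= TEMPLATE_FIELD_COUNT - 1
    return len(content_fields) == TEMPLATE_FIELD_COUNT

test_input = ["field"] * 20 + ["*"]       # 21 entries, last one a wildcard
assert validate(test_input, allow_wildcard=True)   # the tests passed

production_input = ["field"] * 20          # only 20 values actually shipped
assert not validate(production_input)      # a strict count check would have caught it
```

The point is not the specific check but the asymmetry: the validation path the tests exercised was not the path production content took.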
The financial impact was staggering. Parametrix estimated that Fortune 500 companies alone suffered more than $5 billion in direct losses. The healthcare and banking sectors were hardest hit, with estimated losses of $1.94 billion and $1.15 billion respectively. Delta Air Lines cancelled 7,000 flights and manually reset 40,000 servers, reporting a $500 million loss over five days.[15] Only 10 to 20% of the losses were estimated to be covered by cybersecurity insurance.[16]
The irony was inescapable: a security company, whose product existed to protect others from system failures, caused one of the largest IT outages in history by bypassing its own deployment safeguards for the sake of speed. CrowdStrike’s testimony contained a sentence that should be posted on the wall of every CTO’s office: "Trust takes years to make and seconds to break."[14]
Equifax provides the third case study, this one a failure of omission rather than of reckless action. In March 2017, a critical vulnerability in Apache Struts was disclosed, and a patch was made available. Equifax’s internal security team ordered all vulnerable systems to be patched within 48 hours. Nobody followed up to verify that the order was carried out. It was not. An SSL certificate used for intrusion detection had expired nine months earlier, blinding the system that would have caught the breach. When the breach was finally discovered in July, 147.9 million Americans had their personal data exposed. The total cost exceeded $1.38 billion.[17]
The common pattern across all three cases is not technical failure. It is the absence of the safeguards that turn rushing into speed: automated verification, staged deployment, monitoring, incident response, and — most fundamentally — a culture that treats these safeguards as non-negotiable rather than optional when the business wants something yesterday.
| AUTHOR: A CorralData near-miss or incident caused by speed pressure — even a minor one — would ground this section. In healthcare B2B, what does the "3 a.m. phone call" look like? What is the cost of an outage to your clients? |
The Non-Negotiables List
The CTO cannot fight every battle with equal intensity. Saying no to everything the business wants is as destructive as saying yes to everything — it erodes trust and positions engineering as an obstacle rather than a partner. The practical skill is knowing which battles are non-negotiable and which are negotiable, and having the language to explain the difference.
Google’s Site Reliability Engineering team provides the most elegant framework for this negotiation: the error budget. The concept is simple: the team defines a service level objective — say, 99.9% uptime — and calculates the remaining 0.1% as a budget that can be "spent" on risk. When the budget is healthy, the product team can push changes aggressively. When the budget is nearly exhausted, the team shifts to reliability work. As the Google SRE book describes it: "This metric removes the politics from negotiations between the SREs and the product developers when deciding how much risk to allow."[18]
The error budget works because it converts a political argument ("we need to slow down" versus "we need to ship faster") into a data-driven one. The budget is either available or it is not. When it is available, the CTO can say yes with confidence. When it is not, the CTO can say no with evidence. The key insight from Google’s implementation: "When the budget is nearly drained, the product developers themselves will push for more testing or slower push velocity. In effect, the product development team becomes self-policing."[18]
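The arithmetic behind the budget is simple enough to sketch. A minimal illustration of the Google SRE framing, with an assumed 30-day window and the 99.9% SLO from above (the numbers and the guardrail function are illustrative):

```python
# A 99.9% SLO leaves 0.1% of the period as spendable unreliability.
SLO = 0.999
PERIOD_MINUTES = 30 * 24 * 60                 # a 30-day window

budget_minutes = (1 - SLO) * PERIOD_MINUTES   # roughly 43.2 minutes of allowed downtime

def deploys_allowed(downtime_minutes: float) -> bool:
    """Guardrail: deployments proceed only while error budget remains."""
    return downtime_minutes < budget_minutes

assert deploys_allowed(12.0)       # healthy budget: ship aggressively
assert not deploys_allowed(45.0)   # budget exhausted: reliability work only
```

The number is either above the line or below it, which is exactly what makes the negotiation data-driven rather than political.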
Most startups cannot implement a formal SRE programme. But the principle scales down to a non-negotiables list — the five to seven commitments the CTO makes to the company and does not compromise regardless of business pressure. The list is not a set of engineering preferences. It is a set of conditions under which the company can ship fast without creating the crises that slow it down.
A non-negotiables list for a startup CTO might include: every deployment is reversible — if the change breaks production, it can be rolled back within minutes, not hours. No change ships without automated tests covering the critical path — not 100% coverage, but coverage of the flows that customers depend on. Monitoring and alerting are in place before a feature reaches production — the team knows when something breaks before the customer does. Security patches are applied within a defined window — not "when we get to it," but within 72 hours for critical vulnerabilities and two weeks for high-severity ones. Data access controls are never bypassed for convenience — in a healthcare or financial context, this is not an engineering preference but a legal obligation.
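Several of these commitments can be enforced as code rather than as policy documents. A hedged sketch of the patch-window check, using the severity windows suggested above; the vulnerability record and the dates are illustrative:

```python
from datetime import datetime, timedelta

# Windows mirror the non-negotiable stated above: 72 hours for critical
# vulnerabilities, two weeks for high-severity ones.
PATCH_WINDOWS = {"critical": timedelta(hours=72), "high": timedelta(weeks=2)}

def overdue(severity: str, disclosed_at: datetime, now: datetime) -> bool:
    """True when a vulnerability has outlived its patching window."""
    return now - disclosed_at > PATCH_WINDOWS[severity]

disclosed = datetime(2024, 3, 1, 9, 0)
assert not overdue("critical", disclosed, disclosed + timedelta(hours=48))
assert overdue("critical", disclosed, disclosed + timedelta(hours=96))
```

A nightly job running a check like this is the difference between Equifax's 48-hour order, which nobody verified, and a commitment the system verifies on its own.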
Charity Majors’ framing applies here: observability is not a feature request. It is "table stakes. It is non-negotiable."[19] She extends the argument to a broader principle: "SLOs are a hedge against micromanagement, because when teams meet their SLOs, the way they spend their time is not important."[20] The CTO who has established non-negotiables and is meeting them has earned the right to manage engineering time without interference. The CTO who has not established them is negotiating from a position of assertion rather than evidence.
The Veracode data from Chapter 12 adds urgency to this list in the AI era. When 45% of AI-generated code contains security vulnerabilities, and the rate does not improve with newer or larger models, the non-negotiable on security review becomes more important, not less.[21] The GitClear finding — that refactoring has collapsed to historic lows while code duplication has surged — means the non-negotiable on code review is protecting the codebase from a new category of structural degradation.[22] The non-negotiables list is not static. It evolves as the threat environment changes, and AI has changed it.
| AUTHOR: CorralData’s specific non-negotiables — what are the five to seven things you will not compromise on? In healthcare B2B, HIPAA compliance, data encryption at rest and in transit, and audit logging are likely on the list. The reader needs to see a working CTO’s actual list, not a theoretical one. |
The Velocity Negotiation
The non-negotiables list defines what the CTO will not move on. The velocity negotiation is about everything else — the scope, the timeline, and the communication framework that allows the CTO and the CEO to align without either one capitulating.
David Subar, a technology leadership consultant who has served as CTO or CPO for multiple companies and advised more than 30 organisations including Disney, Meta, and Fox, identifies the fundamental communication error most CTOs make: talking to the CEO about the wrong thing. "Velocity, for example, is important to Engineering and partly to Product, but is completely the wrong thing to discuss with CEOs." The consequence: "Communicating results using the wrong metrics can lead CEOs to believe that their CTO is poorly focused, feckless, or worse, an impediment to the organization."[23] Subar traces the average CTO tenure of two and a half to three years directly to this communication failure.
Larson draws a sharper distinction that the chapter depends on: "There’s two different problems. There’s how do you improve execution? And then there’s how do you address your CEO who’s telling you to improve execution. I think these should be the same problem, but I find they’re a little bit different."[24] The operational problem — actually shipping faster — is an engineering challenge. The communication problem — managing the CEO’s perception of engineering speed — is a relationship challenge. Chapter 6 addressed the first. This section addresses the second.
Ryan Singer’s Shape Up methodology, developed at Basecamp, provides the most practical framework for scope negotiation. The core principle: fixed time, variable scope. "Estimates start with a design and end with a number," Singer writes. "Appetites start with a number and end with a design."[25] The CTO sets the time box — a six-week cycle, a two-week sprint, whatever cadence the team uses — and the scope is adjusted to fit within it. This is not about cutting corners. Singer’s language is deliberate: the team "hammers" scope to fit the time box, preserving the core value while stripping away the features that do not move the needle for the customer.[25]
The Shape Up model resolves a specific dynamic that creates most CEO-CTO friction: the CEO announces a deadline, the CTO estimates the work, the estimate exceeds the deadline, and the negotiation becomes adversarial. Under Shape Up, the deadline is shared and fixed. The negotiation is about scope, which both parties can discuss rationally. The CTO is not saying "we can’t do it in time." The CTO is saying "in this time, here is the most valuable version of it we can deliver." The conversation shifts from speed to value.
Jeff Bezos’s "disagree and commit" principle, introduced in his 2016 letter to Amazon shareholders, provides the executive-level framework for the cases where the CTO and CEO genuinely disagree about priorities. "If you have conviction on a particular direction even though there’s no consensus, it’s helpful to say, ‘Look, I know we disagree on this but will you gamble with me on it? Disagree and commit?’" Bezos emphasises that the principle runs both ways: "If you’re the boss, you should do this too. I disagree and commit all the time."[26]
For the startup CTO, "disagree and commit" is a tool for the cases that fall between the non-negotiables (which are not up for debate) and the scope adjustments (which are routine). The CEO wants to prioritise a feature the CTO believes is premature. The CTO has made the case, the CEO has listened, and the decision stands. The CTO commits fully to execution — no sandbagging, no passive resistance — while preserving the right to revisit the decision if the data changes.
Gergely Orosz, who was a senior engineering manager at Uber, adds the tactical layer: communicate early and often. "How and when the delay is communicated matters far more than the delay itself."[27] His practice: commit to the business that they will receive regular updates, then provide increasing confidence in dates as milestones are reached. "For every major project, we shipped something different on the launch date than what we agreed to originally. But the stakeholders were never surprised."[27] The principle is the same one Brad Feld articulated for board meetings in Chapter 10: no surprises. The CEO who is warned about a delay three weeks in advance can adjust plans. The CEO who learns about it the day before the deadline has lost trust in the CTO’s judgment.
Larson provides one additional framework for the cases where the CTO must push back. Most velocity disputes, he argues, are really about one of two things: "why is this taking so long when it should take a couple of hours?" or "why can’t you work on this other, more important, project?" These are different problems requiring different responses. The first is a velocity question — and the best response is to identify a reality-based approach that addresses the constraint. The second is a prioritisation question — and the best response is to make the trade-off visible: "we can do this project, but it means delaying that one. Which is more important?"[28] The CTO who conflates the two will fight the wrong battle. The CTO who separates them can address each one with the right tool.
Larson also warns against a common CTO defence mechanism: invoking technical debt and toil as a blanket justification for saying no. "The specter of technical debt and toil have been used to shirk so much responsibility, that simply naming them tends to be unconvincing."[28] The CTO who has done the debt audit from Chapter 5 — who can quantify the cost of debt service in dollars and engineering hours — has a persuasive case. The CTO who waves vaguely at "technical debt" has nothing.
The velocity negotiation is, at its core, a practice run for the board communication that Chapter 10 will describe. The CTO who can negotiate scope with the CEO — using business language, presenting trade-offs rather than ultimatums, offering alternatives rather than refusals — is developing the translation skills that determine whether they survive the transition from builder to executive.
| AUTHOR: A specific velocity negotiation from CorralData — a feature request from the business that required scope reduction or timeline adjustment, and how the conversation went. The reader needs to see the Shape Up / appetite principle applied in a real context. |
Guardrails, Not Roadblocks
The DORA data resolves the chapter’s central tension, but it also introduces a finding that challenges the CTO’s instinct for caution. Forsgren, Humble, and Kim found that external change approval processes — review boards, manager sign-offs, compliance gates — correlated with longer lead times, lower deployment frequency, and slower recovery. The most striking result: external approvals "had no correlation with change fail rate." They did not make deployments safer. They only made them slower. The finding was blunt: external approval processes are "worse than having no change approval process at all."[8]
This does not mean the CTO should abandon oversight. It means the form of oversight matters. The distinction, articulated by the DevSecOps community in the late 2010s, is between gates and guardrails. A gate is a manual checkpoint that blocks progress until a human approves it. A guardrail is an automated constraint that prevents unsafe actions without requiring human intervention. Automated test suites are guardrails. Deployment pipelines that reject builds with failing tests are guardrails. Static analysis tools that flag security vulnerabilities before code reaches review are guardrails. A weekly change advisory board meeting is a gate.[29]
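The distinction can be made concrete. Below is a minimal sketch of a guardrail as an automated predicate; the check names are illustrative stand-ins for a real pipeline's test, static-analysis, and rollback-verification steps, not any particular CI system's API:

```python
from dataclasses import dataclass

@dataclass
class Change:
    tests_passed: bool
    static_analysis_clean: bool
    rollback_plan: bool

def guardrails(change: Change) -> list[str]:
    """Return the reasons a change is blocked; an empty list means it ships."""
    failures = []
    if not change.tests_passed:
        failures.append("failing tests")
    if not change.static_analysis_clean:
        failures.append("unresolved static-analysis findings")
    if not change.rollback_plan:
        failures.append("no automated rollback")
    return failures

good = Change(tests_passed=True, static_analysis_clean=True, rollback_plan=True)
bad = Change(tests_passed=True, static_analysis_clean=False, rollback_plan=True)

assert guardrails(good) == []   # ships automatically, no human in the loop
assert guardrails(bad) == ["unresolved static-analysis findings"]
```

A gate would insert a weekly meeting before either change shipped. The guardrail evaluates every change in seconds, blocks only the unsafe ones, and tells the engineer exactly why.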
Google’s error budget is the most sophisticated guardrail: it automates the speed-versus-safety negotiation by linking deployment permission to measured reliability. When the budget is healthy, deployments proceed. When it is exhausted, they halt — automatically, without politics, without a meeting. Google’s implementation data reveals why this matters: "Changes are a major source of instability, representing roughly 70% of our outages."[30] The answer to that statistic is not fewer changes. It is smaller changes, better tested, with faster rollback — the infrastructure of speed from Chapter 6, now understood as a safety system rather than a productivity tool.
John Allspaw, who built the engineering culture at Etsy that became the model for blameless postmortems, provides the cultural complement to the technical guardrails. His 2012 essay on blameless postmortems established the principle that has now been adopted by Google, Netflix, Amazon, and hundreds of startups: "A funny thing happens when engineers make mistakes and feel safe when giving details about it: they are not only willing to be held accountable, they are also enthusiastic in helping the rest of the company avoid the same error in the future."[31] Allspaw’s debriefing facilitation guide expands the point: "A ‘post-mortem’ debriefing should be considered first and foremost a learning opportunity, not a fixing one."[32]
The cultural guardrail is as important as the technical one. The CTO who punishes the engineer who caused an outage will never learn the truth about what went wrong. The CTO who creates a blameless environment will learn everything — and the team will build the guardrails that prevent recurrence, because they feel safe enough to be honest about the failure.
The principle connects the chapter’s argument into a single framework. Speed is not the enemy of safety. Rushing is. The non-negotiables list defines the boundaries. The error budget automates the negotiation. The guardrails replace the gates. The blameless postmortem turns every failure into a system improvement. And the velocity negotiation — the conversation with the CEO, the scope discussion, the communication rhythm — ensures that the business understands what the engineering team is doing and why.
| AUTHOR: How CorralData handles incidents and postmortems — is there a formal blameless process? Has there been a moment where an incident led to a system improvement that would not have happened without the failure? The reader needs to see the guardrails-not-gates principle in action at a real company, not just at Google. |
The pressure to ship faster will not end. It did not end at Digg, or Uber, or Calm, or Stripe, or Carta — every company where Larson has had the same conversation. It did not end at Facebook, which cycled through "break things," "stable infrastructure," and back to "break things" in the span of a decade. It will not end at your company. The pressure is structural because the business always needs more than engineering can deliver, and the gap between what is wanted and what is possible is the space the CTO occupies.
The CTO who manages that space well — who can distinguish rushing from speed, who has a non-negotiables list and the language to defend it, who can negotiate scope without becoming adversarial, and who builds the guardrails that make fast deployment safe deployment — is not just protecting the codebase. They are protecting their own tenure.
Chapter 9 provides the measurement language that makes this negotiation concrete. The velocity the business demands cannot be discussed in the abstract. It needs metrics — the right ones, presented in the right language, at the right cadence. The DORA data, the SPACE framework, and the stage-appropriate measurement systems of Chapter 9 are the tools that transform the velocity conversation from "we need to go faster" into "here is what we are shipping, here is how fast, and here is what it costs."