Chapter 6: Shipping Is the Strategy

A ship in port is safe, but that’s not what ships are built for.

— Grace Hopper

Monzo is a bank. Not a fintech startup that calls itself a bank — a regulated UK bank, authorised by the Prudential Regulation Authority, holding customer deposits, issuing debit cards, processing millions of transactions. Banks deploy software cautiously. They run change advisory boards, schedule release windows, require multiple sign-offs. Their regulators expect them to.

Monzo deploys to production over 100 times a day.[1]

Any engineer can ship with a single command using a bespoke tool called Shipper, which handles rolling deployments in Kubernetes and runs Cassandra migrations behind the scenes. The change management process: one code review from an engineer on the owning team, merge to main, deploy. For a bank, the process is — in Monzo’s own words — "surprisingly light on human touch points."[1] The architecture underneath: more than 1,600 microservices written in Go, running on Cassandra and Kubernetes on AWS, managed through a monorepo with automated CI checks.[2]

Between mid-2018 and early 2020, under CTO Meri Williams, Monzo’s engineering and data team grew from roughly 50 people to more than 250. The customer base went from under one million to more than four million.[3] Through that growth, deployment frequency per engineer went up, not down — and incidents went down.[1]

The question Williams faced is the question every startup CTO faces when growth arrives: how do you keep shipping at the same velocity when the team quintuples and the surface area of the product explodes? Frederick Brooks proved in 1975 that adding people to a late project makes it later.[4] The conventional answer is that you cannot sustain speed through scaling — that coordination costs will eat the gains from every new hire.

Monzo’s answer was that velocity can survive scaling if you invest in the right infrastructure. Not just CI/CD pipelines and deployment tools, but team structures, scope discipline, and a culture that treats small, frequent, reversible changes as the default. This chapter maps that infrastructure.

Speed Is Learning, Not Just Deployment

Most advice about shipping speed focuses on the deployment pipeline — continuous integration, deployment frequency, mean time to recovery. These matter. But they describe the mechanism, not the strategy. The strategic argument for speed is about learning.

A startup is a series of hypotheses. The product hypothesis, the market hypothesis, the pricing hypothesis, the retention hypothesis. Each deployment is an experiment. The faster you run experiments, the faster you learn which hypotheses are wrong, and the faster you can redirect investment toward the ones that might be right. The CTO who frames velocity as "we deploy more often" is describing plumbing. The CTO who frames velocity as "we learn faster than our competitors" is describing competitive advantage — in terms a board can understand.

Monzo’s engineering team stated the connection explicitly: "Our success relies on us rapidly shipping new features to customers. This tight feedback loop helps us quickly validate our ideas. We can double down on the ideas that are working, and fail fast if we need to."[1] Paul Adams, VP of Product at Intercom, a customer-messaging platform, framed it even more directly: shipping "is primarily about learning. No product team can fully predict how their users will behave or react."[5]

The Slack team discovered this by necessity. When they launched their MVP, they had no signup flows, no import tools, no email integration. They manually invited new teams and generated join links. What they learned from shipping without those features was more valuable than the features themselves: teams much larger than their own wanted to use the product. Rdio, a company of more than 300 people, tried Slack — revealing performance bugs, the need for a mute function, and the fact that users were inventing channel-naming conventions the team had never imagined. Johnny Rodgers and Ali Rayl, two of Slack’s earliest employees, reflected on this: "None of us was smart enough to anticipate all the ways that people would use our multiplayer software. The best way to ensure we didn’t drift away from those needs and expectations was to get people using it."[6] None of those discoveries would have surfaced in a planning document. They surfaced because Slack shipped an incomplete product to real users.

This framing has a direct business translation that most CTOs miss. Every week of delay in shipping a feature is a week of delay in learning whether that feature drives retention, revenue, or neither. A startup burning $200,000 per month that takes eight weeks to ship a feature instead of four has spent an additional $200,000 — not on the feature, but on the delay in learning whether the feature matters. Speed is not recklessness. It is capital efficiency. The CTO who cannot make this argument in business terms has already lost the velocity negotiation before it starts.
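The arithmetic behind that claim can be made explicit. A small illustrative function, using the hypothetical figures from the paragraph above:

```python
def cost_of_delay(monthly_burn: float, planned_weeks: int, actual_weeks: int) -> float:
    """Capital spent on the *delay* in learning, not on the feature itself.

    Approximates a month as four weeks; inputs are illustrative, not a
    prescribed budgeting model.
    """
    weeks_of_delay = actual_weeks - planned_weeks
    return monthly_burn * (weeks_of_delay / 4)

# The example from the text: $200k/month burn, 8 weeks instead of 4.
print(cost_of_delay(200_000, planned_weeks=4, actual_weeks=8))  # → 200000.0
```

The point of writing it down is that the cost scales with burn rate and delay, not with the size of the feature: the same four-week slip costs twice as much at a $400k burn.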

This framing applies even — especially — in contexts where the feedback loop is slow. Healthcare B2B, enterprise sales, regulated industries: these all have longer cycles between shipping and learning. But the principle holds. A healthcare SaaS company that takes six months to ship a feature instead of three has not just delayed a launch. It has delayed six months of customer usage data, compliance feedback, and integration testing with real health systems. The iteration cycle is longer, but the cost of delay is higher, not lower.

AUTHOR: A specific CorralData moment belongs here — a feature that shipped fast and produced a measurable business outcome, or one that shipped slowly and the delay cost the team something concrete. The reader needs to see this principle in a regulated, enterprise-sales context, not just consumer tech.

Will Larson, drawing on his CTO survey data, identified the biggest challenge startup CTOs face right now: their CEOs demanding increased engineering velocity. The pressure is real and universal. The CTO who has already reframed velocity as learning speed has a language for responding to that pressure. The CTO who has not will default to the only response the CEO can see — adding engineers, which Brooks proved half a century ago does not work.

The "move fast and break things" framing gets this wrong — not because speed is bad, but because the phrase conflates speed with carelessness. The companies that actually ship fastest are usually the ones with the most discipline, not the least. Kellan Elliott-McCrea, CTO of Etsy, an online marketplace, through its 2015 IPO, put it precisely: "The goal is NOT to be careful. The goal is to be confident."[7] Etsy’s engineering practices, he wrote, were "a spectrum of tools for increasing our confidence in our ability to make change."[7] Confidence and caution are different things. Caution tries to avoid risk. Confidence builds the systems that make risk manageable.

The Infrastructure of Speed

Confidence is not a feeling. It is an infrastructure. Five investments separate teams that ship fast from teams that talk about shipping fast. None of them is a technology choice. Each is a discipline — a commitment the CTO makes about how the team will work.

Continuous integration and delivery. The table stakes. Every commit builds, every build is tested, every passing build can deploy. The CTO’s real investment is not in the pipeline tool but in the discipline: if CI is red for more than an hour, everything stops until it is green. The cost of a broken pipeline is not the pipeline itself — it is every engineer whose workflow is blocked until it is fixed.

The Shopify data makes the case for treating CI speed as a first-class engineering investment. Before Shopify’s Test Infrastructure team tackled the problem, 68% of CI time was spent on overhead before any test ran. CI at the 95th percentile took 45 minutes. Docker container start time was 90 seconds — sometimes spiking past two minutes. After the team invested in instrumentation, Docker I/O optimisation, test selection, and parallel dependency building: CI p95 dropped to 18 minutes, Docker start time to 25 seconds, test stability went from 88% to 97%.[8] Christian Bruckmayer, who worked on that team, described the motivation plainly: "Our developers were frustrated."[8] Frustrated developers do not deploy. They batch changes, wait, and ship large, risky releases — the opposite of what velocity requires.

The same pattern played out at GitHub. CI averaged 45 minutes; after merging, CI ran again; in a perfect scenario, an engineer waited roughly two hours from checking in code to seeing the change live. The team identified two integration-testing jobs as the bottleneck and introduced what they called "deferred compliance" — long-running tests that no longer block deployment but must pass within 72 hours or all deploys halt.[9] CI dropped to 15 minutes. Three times faster.
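GitHub has not published the implementation, but the deferred-compliance rule itself is simple enough to sketch. A hypothetical check, assuming the system records when the deferred test suite first started failing:

```python
from datetime import datetime, timedelta

# Assumed policy window from the text: deferred tests must pass within 72 hours.
COMPLIANCE_WINDOW = timedelta(hours=72)

def deploys_allowed(failing_since, now):
    """Deferred compliance: deploys proceed even while slow tests are red,
    but halt once a failure has persisted past the compliance window."""
    if failing_since is None:  # deferred suite is green
        return True
    return now - failing_since <= COMPLIANCE_WINDOW

now = datetime(2026, 1, 10, 12, 0)
assert deploys_allowed(None, now)                           # green: ship
assert deploys_allowed(now - timedelta(hours=24), now)      # red for 1 day: still ship
assert not deploys_allowed(now - timedelta(hours=96), now)  # red for 4 days: halt
```

The design choice worth noting is that the halt is global: letting a stale failure block every team's deploys is what keeps "deferred" from decaying into "ignored."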

The Etsy transformation is the canonical narrative, and the one most relevant to startup CTOs because it starts where startups start: small and painful. In 2008, Etsy had roughly 35 employees, half of them engineers. They deployed twice per week. Deploys took hours, required a minimum of three developers plus an ops engineer on standby, and the pattern was predictable: deploy, site goes down. By 2012, the company was making 25 deploys per day; 196 different people deployed to production that year. Production push time: 70 to 150 seconds. By 2013–2014, Etsy was shipping more than 50 times per day.[10] Elliott-McCrea, looking back in 2015, wrote: "Five years ago, continuous deployment was still a heretical idea. The idea you could do it with over 250 engineers was, to me at least, literally unimaginable."[11]

Intercom’s founding leadership made a similar bet before they had a product to deploy. Rich Archbold, then Senior Director of Engineering, described the decision: "In the first six months of Intercom’s creation as a company our CTO and VP of engineering decided we actually needed to have a world class CICD system."[12] They built it before they built most of the product. That decision — to invest in deployment infrastructure before there was much to deploy — paid compound returns for years.

Feature flags. Feature flags decouple deployment from release. Code ships to production without being visible to users, then turns on incrementally — by user segment, by percentage, by geography. This eliminates the most common source of deployment fear: the big-bang release where everything goes live at once. GitHub uses feature flags to ship code continuously, with features gated behind flags and gradually rolled out to expanding user populations.[13] The deployment risk and the feature risk become separate problems. The cost is a flag management system and the discipline to clean up old flags; the return is the ability to deploy at any time without the stakes of a full release. Feature flags are the single most underinvested infrastructure at startup scale, likely because they feel like overhead rather than capability. They are capability.
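The core mechanic behind a percentage rollout is small. A minimal sketch, not any vendor's actual implementation: hash a stable user ID into a bucket, so each user gets a consistent answer as the rollout percentage grows.

```python
import hashlib

def flag_enabled(flag_name: str, user_id: str, rollout_percent: int) -> bool:
    """Deterministic bucketing: the same user always lands in the same
    bucket, so raising rollout_percent only ever adds users, never
    flips a user who already has the feature back off."""
    digest = hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # 0..99
    return bucket < rollout_percent

# Gradual rollout: a user enabled at 10% stays enabled at 50%.
user = "user-42"
assert flag_enabled("new-composer", user, 100)      # everyone at 100%
assert not flag_enabled("new-composer", user, 0)    # no one at 0%
if flag_enabled("new-composer", user, 10):
    assert flag_enabled("new-composer", user, 50)   # rollout is monotonic
```

Production flag systems add targeting by segment and geography, audit trails, and kill switches, but the deterministic-bucketing core is why a rollout can move from 1% to 100% without any user seeing the feature flicker on and off.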

Trunk-based development. Long-lived feature branches are the enemy of velocity. Every hour a branch lives, it drifts further from main, and every hour of drift is a merge conflict waiting to happen. Trunk-based development — committing to main frequently, using feature flags to hide incomplete work — reduces merge conflicts, accelerates code review, and forces smaller, more reviewable changes.

Monzo’s engineering principles codify this as culture: "Make changes small, make them often. The key idea of change management at Monzo is that small, incremental changes that we can easily undo have a lower risk than large, irreversible deployments."[14] Their Shipper tool and monorepo enforce the discipline structurally. And their platform team — which Matt Heath and Suhail Patel described at QCon London in 2020 — exists to abstract away the complexity of Kubernetes and Cassandra so that product engineers never need to think about it.[2] The explicit mission: "Engineers shouldn’t be expected to know complex things like Kubernetes and Cassandra."[2] When the infrastructure is invisible, the deployment is trivial, and the change is small, engineers ship more often — not because they are told to, but because the friction is gone.

Automated testing. The fastest teams are usually the ones writing the most tests. This is counterintuitive if you were raised on "move fast and break things," but the logic is straightforward: manual testing does not scale. A team of five engineers can get away with manual QA. A team of 25 cannot — the test matrix grows faster than the team. Automated tests are not a quality investment. They are a speed investment. Every hour spent writing tests saves multiples of that hour in manual regression, debugging, and production incidents. The CTO who skips testing to ship faster will ship faster for one quarter. After that, the team spends more time fixing bugs than building features, and velocity collapses.

Etsy’s CI cluster ran more than 14,000 test-suite runs per day.[10] That was not quality overhead. That was the infrastructure that made 50 deploys per day safe. Shopify’s test stability improvement from 88% to 97% was worth a dedicated team because flaky tests erode trust in the entire pipeline — and when engineers stop trusting the pipeline, they stop using it.[8]

Scope discipline. The hardest investment because it is behavioural, not technical. Scope discipline means every feature has a minimum version — the smallest build that tests the hypothesis — and everything beyond that is deferred until the hypothesis is validated. The cost is the discomfort of shipping something that feels incomplete. The return is learning a month earlier whether the thing was worth building. This one matters enough to warrant its own section.

These five investments are not equally urgent at every stage. CI/CD is Stage 1 infrastructure — if you do not have automated builds and deploys before you hire your fifth engineer, you will regret it by the tenth. Testing and feature flags become urgent at Stage 2, when the team grows past what one person can coordinate. Trunk-based development and scope discipline are habits that should start at Stage 1 but are enforced by pain at Stage 2. The CTO who tries to introduce all five simultaneously will overwhelm the team. Sequence them: CI/CD first, testing second, feature flags third, trunk-based development and scope discipline as ongoing cultural work that begins on day one but becomes structurally enforced as the team grows.

Watch for the common failure mode: investing in the pipeline but not in the discipline. A CI/CD system that is routinely red and routinely ignored is worse than no system at all, because it teaches the team that build failures do not matter. The pipeline becomes decoration — a tool that exists but does not constrain. The same failure occurs with feature flags that are never cleaned up (the codebase fills with dead branches), with trunk-based development that is nominally practised but routinely violated for "just this one feature," and with testing that covers the easy paths but skips the edge cases that cause production incidents. The infrastructure only works if the team treats it as load-bearing. The CTO’s job is to enforce that treatment, especially when the pressure to "just ship it" makes cutting the discipline feel reasonable.

AUTHOR: An honest self-assessment of the CorralData pipeline belongs here — which of these five investments has the team made, which are still incomplete, where are the bottlenecks? The reader will trust a CTO who admits what they have not yet built.

Monzo’s deployment blog captured the underlying logic: "Successful startups move quickly. It’s how they can compete with companies who have 1000x the resources. But as companies grow they get slow. The cost of failure increases, so arduous change management processes are introduced." Monzo went the other direction: "optimise the developer workflow for rapid delivery, and this leads to a reduction in risk too." The mechanism, stated as an engineering principle: "less friction encourages smaller changes, and smaller changes are less risky."[1]

Cutting Scope Without Cutting Corners

Every sprint, every planning meeting, the CTO faces the same negotiation: what can we cut to ship sooner? The wrong answer is to cut testing, security, or observability. Those are corners. The right answer is to cut functionality — to ship the smallest version that tests the business hypothesis and defer everything else.

Scope and quality are not on the same axis. Reducing scope means building less. Cutting corners means building poorly. A feature with reduced scope can still be well-tested, well-monitored, and well-documented. A feature with full scope and cut corners will generate production incidents, customer complaints, and technical debt that compounds against future velocity. The CTO’s job is to hold the line on quality while negotiating scope relentlessly — and the language matters. "We’re cutting features" sounds like failure. "We’re shipping the version that tests the hypothesis fastest" sounds like strategy. Both describe the same action.

Ryan Singer, who shaped Basecamp’s product development process, uses language stronger than cutting. "People often talk about ‘cutting’ scope. We use an even stronger word — hammering — to reflect the power and force it takes to repeatedly bang the scope so it fits in the time box."[15] The metaphor is right. Scope does not yield to politeness. It yields to force.


The clearest evidence comes from companies that shipped deliberately incomplete features and learned more than they would have from shipping complete ones.

When Intercom redesigned their reply composer, they removed the markdown preview feature entirely. Their reasoning: most users type quickly and hit reply; a simpler UI serves them better. Within two days, they learned that users who relied on markdown needed preview to check rendering. A few days after the initial ship, they restored it.[5] The cost of the reduced scope: a few days of user friction for a subset of customers. The benefit: they shipped sooner and learned precisely which part of the feature mattered. In a separate instance, an Intercom team chose not to build a many-to-many data relationship that seemed architecturally necessary — the kind of decision that keeps a backend engineer awake at night. The result: customers used the feature without difficulty. The complexity was never built. The team wrote: "something we only discovered by starting small and iterating quickly."[16] That unbuilt data relationship is the purest example of scope discipline: a piece of architecture that felt mandatory, turned out to be optional, and was discovered to be unnecessary only because the team shipped without it.

At Basecamp, a customer asked for a calendar. Rather than building one — Singer estimated six months — the team called the customer and asked what she actually needed a calendar for. She described driving to the office to check a chalkboard wall calendar for free meeting-room slots. The need was not "a calendar." It was "seeing which days have events so I can schedule around them." They designed a Dot Grid — a minimal month view showing dots on days that had events — and shipped it in a six-week cycle.[15] The scope hammer turned a six-month project into a six-week feature. In another case, a customer requested file-archiving permissions. Digging into the actual problem, Singer’s team discovered that someone had archived a file without knowing it would disappear for everyone. The fix: a warning message on the archive action. One day of work instead of six weeks.[15]

Linear, a project-management tool, shipped the first versions of Cycles and Projects — now major product features — in two weeks. They shipped to themselves and private beta users in the first week and began collecting feedback immediately. Authentication: Google login only, because it was the fastest path to shipping. They knew they would need email and other login methods eventually, but authentication was not the hypothesis they were testing.[17]

Each of these examples follows the same pattern: define the hypothesis, build the minimum that tests it, ship, and watch. The calendar hypothesis was not "do users want a calendar?" It was "do users need to see which days have events?" The authentication hypothesis was not "will users accept Google-only login?" It was "is the core product valuable enough to use at all?" When the scope question starts with the hypothesis rather than the feature, the minimum version becomes obvious — and it is always smaller than the team’s first instinct.

Three questions turn scope negotiation from a fight into a framework. What hypothesis does this feature test? What is the minimum functionality that tests it? What can be deferred without invalidating the test? The Basecamp calendar answered these: the hypothesis was "users need to see availability," the minimum was dots on a grid, and full event details could wait. The Intercom reply composer answered them: the hypothesis was "users want a faster reply flow," the minimum was a composer without preview, and preview could be restored if needed. The framework is not a process to follow mechanically. It is a lens that reveals which parts of a feature are load-bearing and which are assumptions.

The language the CTO uses in this negotiation matters as much as the framework itself. The CTO who can reframe scope reduction as speed-to-learning rather than compromise has a tool that works in every room — with engineers who want to build the complete version, with product managers who promised a customer the full feature, and with a CEO who wants to know why the roadmap shows less than what was planned.

Scope creep is the most common velocity killer at startups. It rarely arrives as a dramatic demand. It arrives as a series of small additions, each seemingly reasonable — "can we just add…" — that collectively push the delivery date by weeks or months. The CTO who cannot hold the scope line will never ship fast, regardless of how good the CI/CD pipeline is.

AUTHOR: A specific CorralData scope discipline example belongs here — a feature scoped down, shipped, learned from, and then expanded or killed based on what the data showed. The more specific — customer type, feature, what was cut, what happened — the better.

How Shipping Speed Survives Scaling

Scope discipline keeps a small team fast. The harder problem is whether velocity survives growth — when the team doubles, the codebase triples, and the number of things that can go wrong grows faster than either.

The mathematics are not encouraging. Brooks’s communication overhead formula is precise: n people create n(n-1)/2 communication channels.[4] Two people means one channel. Five means 10. Ten means 45. Fifty means 1,225. Brooks’s original observation was about late projects — "adding manpower to a late software project makes it later" — but the principle applies to organisational velocity broadly. His reasoning: "Men and months are interchangeable commodities only when a task can be partitioned among many workers with no communication among them… This is true of reaping wheat or picking cotton; it is not even approximately true of systems programming."[4] Modern research confirms it. A 2012 study of 951 software projects found that teams of nine or more were significantly less productive than smaller ones.[18] An analysis of 491 projects found that teams of three to seven had the best overall performance, and that the extreme non-linear effort increase did not kick in until teams approached nine people.[19]
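Brooks’s formula is worth running at the team sizes in this chapter:

```python
def channels(n: int) -> int:
    """Brooks: n people create n(n-1)/2 pairwise communication channels."""
    return n * (n - 1) // 2

# The team sizes from the text, plus Monzo's 250-engineer peak.
for team in (2, 5, 10, 50, 250):
    print(team, channels(team))
```

At 250 engineers the pairwise channel count is 31,125, which is why the answer at that scale is never "communicate better" but "restructure so most pairs never need a channel at all."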

Jeff Bezos, characteristically, treated this as a design constraint rather than a law of nature. In early 2002, he reorganised Amazon into "two-pizza teams" — no team large enough to require more than two pizzas to feed. The reasoning was not about pizza. It was about communication channels. Bezos’s philosophy, as recorded by Brad Stone: "Communication is a sign of dysfunction. It means people aren’t working together in a close, organic way. We should be trying to figure out a way for teams to communicate less with each other, not more."[20] At the Pragmatic Engineer Summit in February 2026, an engineering leader at a traditional company that sells physical goods reported that those two-pizza teams (six to ten people) were already becoming one-pizza teams (three to four people), driven by AI tooling.[21] The shrinking of the unit of independent shipping is the current frontier of this problem.

The Monzo scaling period — 50 to 250+ engineers in roughly 18 months — is the most thoroughly documented test of whether velocity can survive that kind of growth at a startup. Williams made several structural decisions that directly addressed the communication overhead problem.

In 2019, Monzo introduced a "Collectives and Squads" system. Squads were small and cross-functional: a product manager, backend engineers, mobile and web engineers, designers, user researchers. They stuck to the two-pizza rule as a baseline for squad size. Collectives grouped related squads, loosely modelled on Spotify’s tribe structure.[22] A critical organisational decision: team leads and engineering managers were separate roles. When an engineer moved teams, their team lead changed but their manager stayed the same — decoupling career development from project assignment and removing one reason engineers resist moving where they are needed.[23]

The platform investment was equally deliberate. Williams described an "engineering excellence team" — essentially a developer-tools team — whose mission was making deployment as easy, seamless, and low-risk as possible for both new joiners and existing engineers.[24] This is the team that built and maintained Shipper. Suhail Patel, presenting at QCon in 2020, stated the philosophy: "We think all the complexities about scaling infrastructure, making sure that servers are provisioned and databases are available, should be dealt with by a specific team, so that engineers who are working on the product can focus on building a great bank."[2] One hundred deploys per day is not the product of 250 engineers individually deciding to deploy. It is the product of a platform team whose job is to make deploying trivially easy and reversibly safe.

Williams observed the communication challenges at each growth threshold with precision. At 50 people, the complaints shift: "You start hearing more of: ‘I don’t know how to progress anymore, I need a career path.’" At 100, trust between teams can no longer rely on personal relationships: "You have to build processes and mechanics to build trust between teams." At 150 — Dunbar’s number — "you can’t maintain a relationship with each of them individually."[25] Monzo’s response at the 50-person threshold was to open-source their engineering progression framework on GitHub — a transparent document showing engineers exactly how to advance.[26] This was not an HR exercise. It was a velocity investment: when engineers can answer "how do I get promoted?" by reading a document instead of scheduling a meeting, one fewer bottleneck exists at the management layer.

Velocity under growth was not frictionless. Two major incidents during Williams’ tenure tested the infrastructure. On May 30, 2019, roughly a quarter of bank transfers into Monzo failed or were delayed for nine hours — the root cause was a third-party Faster Payments Gateway corrupting payment messages.[27] On July 29, 2019, a Cassandra scaling outage hit while the team was expanding from 21 to 27 servers: new nodes unexpectedly assumed ownership of data partitions before streaming data to serve them. The engineering team’s post-mortem included a line that deserves attention: "we’ve confirmed that something we thought was impossible, had in fact happened."[28] The on-call system, designed for 50 engineers, started creaking past 100.[29]

These incidents matter because they demonstrate that the question is not whether problems occur — they will — but whether the infrastructure recovers quickly. Both incidents were documented in detailed public post-mortems. Both led to specific improvements. And the business continuity work Williams led through the end of 2019 paid off in a way no one anticipated: when COVID hit and Monzo moved from a heavily in-person culture to fully distributed overnight, the transition caused — in Williams’ words — "basically no problems."[3]

AUTHOR: A velocity-under-growth observation from Ready Set Rocket or CorralData belongs here — how did shipping cadence change between 7 people and 50 people, or between early CorralData and the current state? What slowed things down? What made the biggest difference?

The velocity challenge changes shape at each stage of the CTO’s evolution. At the Coder stage, velocity is a function of individual speed and scope discipline — the CTO is the primary shipper. At the Manager stage, velocity becomes a function of team coordination and infrastructure investment — the CTO enables other people to ship. At the Director stage, velocity becomes a function of organisational design and autonomous teams — the CTO designs the system that ships. Monzo’s story is a Director-stage story: Williams did not ship code. She built the organisation that shipped code. Brockman’s Tuesday burnout — stacking all his one-on-ones on a single day and losing the rest of the week to recovery — is what happens when a CTO at the Manager stage tries to maintain velocity through personal effort rather than structural investment.

Speed and Quality Are Complements, Not Trade-offs

The conventional wisdom frames speed and quality as opposite ends of a dial: turn one up, the other goes down. A decade of data from DORA — the largest and longest-running research programme on software delivery performance, surveying more than 39,000 professionals cumulatively since 2014 — says the opposite.[30]

The 2019 Accelerate State of DevOps Report, based on roughly 1,000 respondents, found:

| Metric | Elite performers | Low performers | Difference |
| --- | --- | --- | --- |
| Deployment frequency | Multiple times per day | Monthly to every 6 months | 208× |
| Lead time for changes | Less than 1 day | 1 to 6 months | 106× |
| Time to restore service | Less than 1 hour | 1 week to 1 month | 2,604× |
| Change failure rate | 0–15% | 46–60% | 7× lower |

Source: DORA Team, 2019 Accelerate State of DevOps Report.[30]

Elite performers did not trade reliability for speed. They deployed 208 times more frequently and failed seven times less often. They recovered from incidents 2,604 times faster. The DORA team’s official guidance, updated in January 2026, states it directly: "DORA’s research has repeatedly demonstrated that speed and stability are not tradeoffs."[31] Dave Farley, cited by DORA, frames the actual trade-off: "the real trade-off, over long periods of time, is between better software faster and worse software slower."[31]

The mechanism is the one this chapter has been describing. Smaller changes are easier to review, easier to test, and easier to roll back. Frequent deploys mean each deploy carries less risk. Automated tests catch regressions before they reach production. Feature flags allow instant rollback without a full redeploy. The investments that increase speed — CI/CD, trunk-based development, testing, feature flags — are the same investments that increase reliability. Monzo’s deployment blog stated the relationship in a single sentence: "less friction encourages smaller changes, and smaller changes are less risky."[1]

The DORA findings held across industries. The 2019 report found no evidence that industry type affected performance, "suggesting that organizations of all types and sizes, including highly regulated industries such as financial services and government, can achieve high levels of performance."[30] Enterprise organisations with more than 5,000 employees were actually lower performers than smaller ones. No startup-specific cohort (fewer than 50 employees) exists in the data, but the trend runs in small teams’ favour, not against them.

A limitation worth naming: DORA’s metrics are self-reported by survey respondents, not measured from production systems. The research team defends its methodology, and the research is the largest dataset of its kind, but self-reported data carries self-selection bias — the professionals who fill out DevOps surveys are likely more engaged with DevOps practices than average. The 2024 report revealed an anomaly: medium performers reported a lower change failure rate (10%) than high performers (20%), prompting the DORA team to introduce a complementary "rework rate" metric.[32] The data is strong but not beyond challenge. What makes the finding credible is not any single year’s survey but the consistency across a decade: speed and stability have correlated positively in every report since 2014.
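Two of DORA’s four metrics can, in principle, be measured from a team’s own deploy log rather than surveyed. A minimal sketch, assuming a hypothetical record format in which each deploy carries a timestamp and a flag marking whether it caused a failure in production:

```python
# Derive deployment frequency and change failure rate from a deploy log.
# The record format is a hypothetical example, not any vendor's schema.
from datetime import datetime

deploys = [
    {"at": datetime(2019, 5, 1, 9, 0),  "failed": False},
    {"at": datetime(2019, 5, 1, 14, 0), "failed": True},
    {"at": datetime(2019, 5, 2, 10, 0), "failed": False},
    {"at": datetime(2019, 5, 2, 16, 0), "failed": False},
]

def deploys_per_day(log):
    # Deployment frequency: deploys divided by distinct active days.
    days = {d["at"].date() for d in log}
    return len(log) / len(days)

def change_failure_rate(log):
    # Share of deploys that caused a failure requiring remediation.
    return sum(d["failed"] for d in log) / len(log)

print(deploys_per_day(deploys))      # 2.0 deploys per active day
print(change_failure_rate(deploys))  # 0.25 -> one deploy in four failed
```

Instrumenting this directly — rather than asking engineers to estimate it once a year — is one way a CTO can sidestep the self-reporting limitation for their own organisation.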

The AI-assisted development context makes this more urgent, not less. AI coding assistants generate code faster, but approximately 45% of AI-generated code contains security vulnerabilities.[33] GitClear’s analysis of 211 million lines of code changes between 2020 and 2024 found that refactoring dropped from 25% of changed lines to under 10%, while code duplication increased roughly fourfold.[34] The CTO who uses AI tools to ship faster without proportionally increasing testing and review is building faster on a weakening foundation. The quality infrastructure described in this chapter is not a constraint on AI-assisted velocity. It is the prerequisite for AI-assisted velocity to be safe.

Charity Majors, co-founder of Honeycomb, an observability platform, provides the punchline: "Velocity of deploys and lowered error rates are not in tension with each other, they actually reinforce each other. When one gets better, the other does too."[35]

The speed-versus-quality debate is not a debate about engineering values. It is a debate about infrastructure investment. The CTO who frames quality as something to add later — "we can write tests after we ship" — is making a hidden financing decision with compounding interest. Chapter 5 established that technical debt is a financing instrument. Skipping quality infrastructure is borrowing at predatory rates.

The harder challenge is not building this infrastructure. It is explaining what it produces. "We’re investing in testing infrastructure" sounds slow. "We’re building the system that lets us deploy ten times a day without breaking the product" sounds fast. Both describe the same investment. The language determines whether the CEO sees it as prudent engineering or gold-plating.


Shipping velocity is the strategy, not a tactical preference. The fastest team to learn wins — not because speed is inherently virtuous, but because speed is the mechanism by which a startup converts capital into knowledge. Etsy went from deploying twice per week, with crashes, to 50 times per day, with 70-second push times and 14,000 daily test runs. Monzo deployed more than 100 times per day — at a regulated bank — while growing from 50 to 250 engineers. Both invested in discipline, not recklessness. Both got faster and more reliable at the same time.

But the CTO does not operate in a vacuum. The business wants features yesterday. The CEO sees a competitor’s launch and demands a response by Friday. The board asks why the roadmap slipped. Building the infrastructure for sustainable velocity is half the problem. The other half is holding the line when the pressure to ship faster — now, at any cost — is constant.


1. Sewell, W. (2022, May 15). How we deploy to production over 100 times a day. Monzo Engineering Blog. https://monzo.com/blog/2022/05/16/how-we-deploy-to-production-over-100-times-a-day — Metrics cover 2021–2022; the practices described originated during the 2018–2020 scaling period.
2. Heath, M., & Patel, S. (2020). Modern banking in 1500 microservices. QCon London 2020. Transcript at InfoQ. https://www.infoq.com/presentations/monzo-microservices/
3. Targett, E. (2023, March 7). The big interview: Pleo CTO and Monzo veteran Meri Williams. The Stack. https://www.thestack.technology/the-big-interview-meri-williams-cto-pleo/
4. Brooks, F. P., Jr. (1975/1995). The Mythical Man-Month: Essays on Software Engineering (Anniversary ed.). Addison-Wesley.
5. Adams, P. (n.d.). Shipping is the beginning of a process. Intercom Blog. https://www.intercom.com/blog/shipping-is-the-beginning/
6. Rodgers, J., & Rayl, A. (2024, February 19). Building the Slack MVP. Building Slack. https://buildingslack.com/building-the-slack-mvp/
7. Elliott-McCrea, K. (2013, August 8). Paths to production confidence, part 1 of n. Laughing Meme. https://laughingmeme.org/2013/08/08/paths-to-production-confidence-part-1-of-n/
8. Bruckmayer, C. (2021, February 24). Keeping developers happy with a fast CI. Shopify Engineering Blog. https://shopify.engineering/faster-shopify-ci
9. GitHub Engineering. (n.d.). Making GitHub CI workflow 3x faster. The GitHub Blog. https://github.blog/engineering/infrastructure/making-github-ci-workflow-3x-faster/
10. Snyder, R. (2013, March). Continuous deployment at Etsy: A tale of two approaches [Presentation slides]. https://www.slideshare.net/beamrider9/continuous-deployment-at-etsy-a-tale-of-two-approaches — See also Schauenberg, D. (2014). How Etsy deploys more than 50 times a day. QCon London. Reported in InfoQ. https://www.infoq.com/news/2014/03/etsy-deploy-50-times-a-day/
11. Elliott-McCrea, K. (2015, August 31). Five years, building a culture, and handing it off. Laughing Meme. https://laughingmeme.org/2015/08/31/five-years-building-a-culture-and-handing-it-off/
12. Archbold, R. (n.d.). Speed of deployment [Podcast interview]. O11ycast, Episode 12. Heavybit. https://www.heavybit.com/library/podcasts/o11ycast/ep-12-speed-of-deployment-with-rich-archbold-of-intercom
13. Gimeno, A. (2021, April 27). How we ship code faster and safer with feature flags. The GitHub Blog. https://github.blog/engineering/infrastructure/ship-code-faster-safer-feature-flags/
14. Monzo Engineering Blog. (2018, June 29). Engineering principles at Monzo. https://monzo.com/blog/2018/06/29/engineering-principles
15. Singer, R. (2019). Shape Up: Stop Running in Circles and Ship Work that Matters. 37signals/Basecamp. Chapter 3: "Set Boundaries" (https://basecamp.com/shapeup/1.2-chapter-03) and Chapter 15: "Decide When to Stop" (https://basecamp.com/shapeup/3.5-chapter-14).
16. Intercom Engineering Team. (n.d.). Intercom’s product principles: Building in small steps to deliver maximum customer value. Intercom Blog. https://www.intercom.com/blog/intercom-product-principles-build-in-small-steps/
17. Saarinen, K. (n.d.). Building at the early stage. Linear App Blog / Medium. https://medium.com/linear-app/building-at-the-early-stage-e79e696341db
18. Rodríguez-García, D., Sicilia, M. Á., García-Barriocanal, E., & Harrison, R. (2012). Empirical findings on team size and productivity in software development. Journal of Systems and Software, 85, 562–570. https://doi.org/10.1016/j.jss.2011.09.009
19. QSM Associates. (n.d.). Team size can be the key to a successful software project. https://www.qsm.com/team-size-can-be-key-successful-software-project — Industry consultancy analysis of 491 projects. Not peer-reviewed.
20. Stone, B. (2013). The Everything Store: Jeff Bezos and the Age of Amazon. Little, Brown and Company. See also AWS Executive Insights (Slater, D.). Powering innovation and speed with Amazon’s two-pizza teams. https://aws.amazon.com/executive-insights/content/amazon-two-pizza-team/
21. Orosz, G. (2026, February 24). The future of software engineering with AI: Six predictions. The Pragmatic Engineer. https://newsletter.pragmaticengineer.com/p/the-future-of-software-engineering-with-ai — The engineering leader is unnamed; the claim is reported secondhand.
22. Lait, S. (n.d.). Engineering culture: The secret behind Monzo’s developer magnet. Level-up Engineering Podcast / Coding Sans. https://codingsans.com/blog/engineering-culture-monzo
23. Monzo Engineering Blog. (2018, June 27). Organising teams and managing engineers. https://monzo.com/blog/2018/06/27/engineering-management-at-monzo
24. Williams, M. (n.d.). Interview. Console DevTools Podcast, Season 3, Episode 10. https://console.dev/podcast/s03e10-engineering-leadership-meri-williams
25. Williams, M. (2018). Scaling your team and culture without wrecking everything. Turing Fest 2018. Write-up by Lucy Fuggle. https://turingfest.com/blog/meri-williams-scaling-engineering-teams/
26. Monzo Engineering Blog. (2018, June 25). We’ve published our engineering progression framework. https://monzo.com/blog/2018/06/25/monzos-transparent-engineering-progression-framework — Framework available at https://github.com/monzo/progression-framework
27. Monzo Engineering Blog. (2019, June 20). Why bank transfers failed on 30th May 2019. https://monzo.com/blog/2019/06/20/why-bank-transfers-failed-on-30th-may-2019
28. Monzo Engineering Blog. (2019, August 9). We had issues with Monzo on 29th July. Here’s what happened. https://monzo.com/blog/2019/09/08/why-monzo-wasnt-working-on-july-29th
29. Monzo Engineering Blog. (2022, February). Scaling our on-call process. https://monzo.com/blog/2022-02-24/scaling-our-on-call-process
30. DORA Team. (2019). 2019 Accelerate State of DevOps Report. Google Cloud / DORA. https://dora.dev/research/2019/dora-report/2019-dora-accelerate-state-of-devops-report.pdf
31. Harvey, N. (2026, January 5). DORA’s software delivery performance metrics [Guide]. dora.dev. https://dora.dev/guides/dora-metrics/ — Dave Farley quote from Modern Software Engineering (2021), p. 154, as cited by DORA.
32. Stephens, R. (2024, November 26). DORA Report 2024 – A look at throughput and stability. RedMonk. https://redmonk.com/rstephens/2024/11/26/dora2024/
33. Veracode data as reported in the startup CTO landscape research; see Chapter 12 for full treatment.
34. GitClear analysis of 211 million lines of code changes (2020–2024); see Chapter 12 for full treatment.
35. Majors, C. (2019, May 1). Friday deploy freezes are exactly like murdering puppies. charity.wtf. https://charity.wtf/2019/05/01/friday-deploy-freezes-are-exactly-like-murdering-puppies/