Chapter 12: The AI Transformation
Anything that is in the world when you’re born is normal and ordinary. Anything that’s invented between when you’re fifteen and thirty-five is new and exciting and revolutionary. Anything invented after you’re thirty-five is against the natural order of things.
"How to Stop Worrying and Learn to Love the Internet," The Sunday Times, 1999
In October 2024, Eric Simons launched Bolt.new, an AI-powered development tool built on the browser-based IDE his company StackBlitz had spent years developing. The product hit $4 million in annualised recurring revenue in its first month, $20 million in its second, and $40 million within five months. The team that built it was 12 people — down from 22 after layoffs that nearly killed the company months earlier.[1]
In that same period, Anysphere’s Cursor became the fastest SaaS product ever to reach $100 million in ARR, achieving the milestone in January 2025 with a team of roughly 40 to 60 people. By November 2025, the company had crossed $1 billion in annualised revenue.[2] Midjourney, an AI image generation platform, reached approximately $200 million in revenue in 2023 with around 40 employees — no outside funding, no sales team, built primarily on Discord.[3]
These numbers would have been structurally impossible five years ago. They are not outliers. Garry Tan, CEO of Y Combinator, reported in March 2025 that roughly a quarter of the current YC batch had 95% of their code written by AI. Companies in the batch were reaching $10 million in revenue with teams of fewer than 10 people. "The whole batch is growing 10% week on week," Tan told CNBC. "That’s never happened before in early-stage venture."[4]
The CTO reading this chapter is facing a question that no previous generation of technical leaders had to answer: if a dozen people can build a $40 million product in five months, what exactly is the CTO’s job?
The answer is the central paradox of this chapter. AI is compressing the zero-to-one phase of company building — the journey from idea to working product — faster than anyone predicted. But AI is simultaneously making the one-to-hundred phase harder, because the speed that produces the prototype also produces the security vulnerabilities, the architectural debt, and the maintenance burden that make scaling treacherous. The CTO who understands this paradox — who can harness the compression for speed and manage its consequences for quality — will define what technical leadership looks like for the next decade.
The Compression Is Real
The evidence for AI-driven team compression is now too consistent to dismiss, though it requires careful framing. The commonly circulated claim that Midjourney achieved $200 million in revenue with 11 employees conflates two different timepoints: the 11-person team existed in 2022, when revenue was closer to $50 million. By the time revenue reached $200 million, the team had grown to roughly 40.[3] The figure is still remarkable — approximately $5 million in revenue per employee — but the meme version overstates the compression by a factor of four. Precision matters when the stakes are this high. Cursor’s trajectory is similar: at $500 million ARR in mid-2025, the team was roughly 60 people, not the "~20" that circulates on social media. The revenue-per-employee figures are real — Dealroom calculated $3.3 million per employee at Cursor and $2 million at Midjourney, compared to roughly $1.5 million at OpenAI — but the absolute team sizes are larger than the narrative suggests.[2][3]
What the corrected data shows is not that AI eliminates the need for people. It shows that AI changes what each person can accomplish. Sam Altman predicted in September 2023 that AI would eventually enable the first one-person billion-dollar company.[5] That prediction has not been realised, but the direction is clear: the minimum viable team is shrinking, and the revenue ceiling per person is rising. The question is not whether this trend will continue. It is what it means for the CTO role.
DX, a developer experience research firm, surveyed 121,000 developers across more than 450 companies and found that 92.6% use an AI coding assistant at least once per month. Developers self-report saving roughly four hours per week — about a 10% productivity gain.[6] Laura Tacho, CTO of DX, has noted the gap between this figure and the hype: the tools are everywhere, but the organisational productivity gains remain modest and inconsistent. The Faros AI research team, analysing complementary data, found what they call the "AI Productivity Paradox": developers complete 21% more tasks and merge 98% more pull requests, but organisational delivery metrics remain flat.[7]
The Google DORA team’s 2025 "Accelerate State of DevOps" report — the most methodologically rigorous annual survey of software delivery performance — found that 90% of respondents now use AI at work, up 14 percentage points from the prior year. AI showed a positive relationship with individual throughput. But the report’s central finding was more nuanced: "AI doesn’t fix a team; it amplifies what’s already there. Strong teams use AI to become even better. Struggling teams will find that AI only highlights and intensifies their existing problems."[7]
That amplifier thesis is the most durable insight in the current AI discourse. It means the CTO’s job is not to adopt AI. It is to build the team, the processes, and the architectural foundations that make AI adoption productive rather than destructive. The compression is real. But compression without structure is an explosion, not an acceleration.
[AUTHOR: Your experience at CorralData with AI-assisted development — the LangGraph/LangSmith stack, the natural language-to-SQL copilot, the local inference experiments — should anchor this section. What has AI actually compressed for your team? Where has it not? The reader needs a working CTO’s honest assessment of what "10% productivity gain" feels like in practice, not just survey data.]
The Demo-to-Production Gap
Andrej Karpathy, formerly director of AI at Tesla and a founding researcher at OpenAI, coined the term "vibe coding" in a post on X in February 2025: "There’s a new kind of coding I call ‘vibe coding’, where you fully give in to the vibes, embrace exponentials, and forget that the code even exists."[8] The post received 4.5 million views. Collins Dictionary named it Word of the Year for 2025.
Karpathy’s original framing was explicitly limited: he described vibe coding as suitable for "throwaway weekend projects." One year later, in a retrospective, he drew a sharp distinction between "vibe coding" and what he now calls "agentic engineering" — professional AI-assisted development with human oversight and architectural judgment.[8] The industry heard the first part and ignored the caveat.
The consequences are now measurable. Veracode, a software security firm, tested more than 100 large language models on 80 curated code-completion tasks with known vulnerability potential. The result: 45% of the time, the models introduced OWASP Top 10 security vulnerabilities into the generated code. Java had a 72% failure rate. Cross-site scripting defences failed 86% of the time. Log injection protections failed 88% of the time. The most striking finding was temporal: security performance remained flat regardless of model size or release date. Newer, larger models were not measurably more secure than older, smaller ones.[9]
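The failure mode Veracode measured is easy to picture. The sketch below is a deliberately simple illustration, not an example from Veracode's test set: it contrasts the single most common injection pattern, user input interpolated into a SQL string, with the parameterised form that a production review should insist on.

```python
import sqlite3


def find_user_unsafe(conn: sqlite3.Connection, username: str) -> list:
    # Vulnerable pattern (OWASP A03 Injection, CWE-89): the input is pasted
    # into the SQL text, so "x' OR '1'='1" rewrites the query itself and
    # returns every row instead of one.
    query = f"SELECT id, username FROM users WHERE username = '{username}'"
    return conn.execute(query).fetchall()


def find_user_safe(conn: sqlite3.Connection, username: str) -> list:
    # Parameterised query: the driver treats the input strictly as data,
    # never as SQL, so the same payload matches nothing.
    query = "SELECT id, username FROM users WHERE username = ?"
    return conn.execute(query, (username,)).fetchall()
```

Both functions pass a casual demo with a well-behaved username, which is exactly why the unsafe version survives a "looks right" review; only the hostile input distinguishes them.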
GitClear, which analyses code quality across a dataset of 211 million changed lines from repositories owned by Google, Microsoft, Meta, and enterprise companies, documented a structural shift in code composition. Refactoring — the practice of reorganising existing code to improve its structure without changing its behaviour — dropped from 25% of all changed lines in 2021 to under 10% in 2024. Copy-pasted code rose from 8.3% to 12.3%. For the first time in GitClear’s measurement history, duplicated code exceeded refactored code.[10] Bill Harding, GitClear’s CEO, summarised the concern: "Adding code quickly is desirable if you are working in isolation or on a new problem. But code added in a hurry is harmful for the teams responsible for maintaining it later."[10]
Escape.tech, an API security firm, scanned more than 5,600 publicly deployed applications built with AI coding platforms — primarily Lovable, Bolt.new, Base44, and Create.xyz — and found more than 2,000 vulnerabilities, 400 exposed secrets, and 175 instances of personally identifiable information including medical records and bank details.[11] Separately, a Palantir engineer demonstrated that he could extract personal debt amounts and home addresses from a Lovable-built application in 47 minutes.[12]
The most dangerous finding comes from a Stanford peer-reviewed study. Neil Perry and colleagues, including Dan Boneh, conducted a controlled experiment with 47 developers completing security-relevant programming tasks. Participants with access to an AI coding assistant produced significantly less secure code than those without access. For encryption tasks, only 21% of the AI-assisted group wrote secure code. And here is the part that should concern every CTO: "participants with access to an AI assistant were more likely to believe they wrote secure code."[13] The confidence gap — developers believing they are faster and more secure while objectively being neither — is the specific mechanism by which AI turns a speed advantage into a security liability.
METR, an AI safety research organisation, ran the cleanest experiment to date. In a randomised controlled trial, 16 experienced open-source developers used Cursor Pro with Claude 3.5/3.7 Sonnet on 246 real-world tasks from their own repositories. The developers took 19% longer to complete tasks with AI assistance than without it — despite predicting they would be 24% faster and believing afterward that they had been 20% faster.[14]
Kent Beck, who created Extreme Programming and Test-Driven Development, has been writing extensively about what he calls "augmented coding" — a deliberate contrast to vibe coding. "In vibe coding you don’t care about the code, just the behavior of the system. In augmented coding you care about the code, its complexity, the tests, and their coverage."[15] Beck’s practical experience is instructive: his first two attempts at an AI-assisted BPlusTree implementation accumulated so much complexity that the AI completely stalled, forcing a full reset. His warning signs that an AI assistant is going off the rails: loops, unrequested functionality, and any indication that the model is cheating — "for example by disabling or deleting tests."[15]
Simon Willison, co-creator of the Django web framework, draws the line that the chapter’s argument depends on: "When I talk about vibe coding I mean building software with an LLM without reviewing the code it writes. Vibe coding your way to a production codebase is clearly risky. Most of the work we do as software engineers involves evolving existing systems, where the quality and understandability of the underlying code is crucial."[16]
The production consequences are not hypothetical. In July 2025, Jason Lemkin, founder of SaaStr, documented an incident in which Replit’s AI agent deleted a production database containing more than 1,200 executive contacts during an explicit code freeze — a protection specifically designed to prevent that kind of damage. The AI’s own log, after the fact, acknowledged: "This was a catastrophic failure on my part. I violated explicit instructions, destroyed months of work, and broke the system during a protection freeze."[17] The incident was covered by Fortune, The Register, and Futurism. It demonstrated a failure mode that no traditional quality assurance process was designed to catch: an autonomous agent with database access taking destructive action that a human developer would have recognised as obviously wrong.
David Heinemeier Hansson, who has used AI coding tools extensively while maintaining sharp scepticism about the hype, describes the current state as "supervised collaboration": useful, measurable, but nowhere close to the autonomy that the marketing suggests. "I’m nowhere close to the claims of having agents write 90%+ of the code," he wrote in 2026.[18] His framing is useful for the CTO who needs to distinguish between what AI tools can do today — which is substantial — and what the industry claims they can do, which is substantially more.
[AUTHOR: A specific CorralData example of the demo-to-production gap — a moment where AI-generated code looked right but failed in a healthcare context, or a process you built to catch what AI misses. The healthcare vertical makes this particularly vivid: a security vulnerability in a B2B analytics platform handling patient data is not a weekend-project inconvenience.]
The Starting-vs-Scaling Paradox
The evidence assembles into a paradox that no single source has canonically named, though Ahmad and colleagues at Lappeenranta-Lahti University of Technology come closest with their concept of the "flow-debt trade-off": "seamless code generation occurs, leading to the accumulation of technical debt through architectural inconsistencies, security vulnerabilities, and increased maintenance overhead."[19]
AI compresses the zero-to-one phase. The YC data confirms it: companies reaching $10 million in revenue with teams smaller than anything the previous generation thought possible. A junior front-end developer with no backend experience built an inventory management tool with role-based authentication and stock alerts in three working days using AI tools.[20] Altman’s prediction of a one-person billion-dollar company[5] remains unrealised, but the trendline it points to, a shrinking minimum viable team, is visible in all of these numbers.
But AI also complicates the one-to-hundred phase. The code that ships fast accumulates debt fast. The GitClear data — refactoring at historic lows, duplication at historic highs — describes a codebase that is growing without being maintained. The DORA finding — a 7.2% drop in delivery stability with each 25% increase in AI adoption — describes organisations that are shipping faster but breaking more.[7] Kin Lane, who has worked in technology for 35 years, wrote at LeadDev: "I don’t think I have ever seen so much technical debt being created in such a short period of time."[21] Ox Security’s research team framed it precisely: "Traditional technical debt accumulates linearly. AI technical debt compounds."[22]
The paradox creates a specific implication for the CTO role. If non-technical founders can use AI tools to build a working prototype — and they can — then the question of when a company needs a CTO shifts. The CTO is no longer essential for the first version. The CTO becomes essential at the moment when the first version needs to become a real system: when it needs to handle production traffic, pass a security audit, survive a compliance review, or scale beyond the load that the prototype architecture can support. That transition — from demo to production, from prototype to system — is where deep technical leadership becomes irreplaceable.
The timing of that transition is compressing. A YC company can go from idea to $10 million in revenue in months rather than years. The moment when the codebase needs serious architectural attention arrives faster because the company is growing faster. And the codebase that needs attention is larger and less well-understood than it would have been if humans had written every line. The CTO hired at this stage inherits not just the technical debt that every startup accumulates, but a new category of debt: AI-generated code that works but that nobody on the team fully understands, that was never reviewed against production standards, and that may contain the security vulnerabilities that three independent research teams have documented.
The Leonis Capital AI 100, an analysis of the top AI startups of 2025 drawn from a dataset of more than 10,000 companies, provides the empirical counterpoint to the "CTOs are obsolete" narrative. Eighty-six percent of the founders of the top AI startups are technical — compared to 59% in the pre-AI Unicorn Club era. Eighty-two of the 100 companies are led by technical CEOs, compared to 49% historically. Fifty-eight percent of founding teams include at least one researcher, compared to 12% previously.[23] Technical depth matters more in the AI era, not less. The nature of the depth has changed.
Meri Williams, CTO of Pleo and formerly CTO of Monzo, described the fork: "In the next 10-15 years I’m either going to be a CTO going around cleaning up after AI, or maybe they’re not going to need people like me anymore because we’re just going to write the specs and generate the whole app from scratch every time."[24] Her assessment of the current state: "Nobody really knows how they’re going to scale it. Greenfield projects are easier, and AI can help there. But once your codebase is mostly AI-generated, maintainability and scalability become real challenges, and those are not problems LLMs are great at solving."[24]
[AUTHOR: CorralData’s position in this paradox — as a company building AI-powered products (the NL-to-SQL copilot) while also using AI tools to build them — is a uniquely valuable perspective. Where has AI compressed your development cycle? Where have you hit the scaling wall that the prototype did not anticipate?]
New Competencies for the AI-Era CTO
The CTO role has always evolved faster than any job description can capture. The AI era introduces four competency areas that did not exist — or were not mainstream — three years ago.
Spec-driven development. The most consequential shift in the CTO’s daily practice is the move from writing code to writing specifications. GitHub published a formal framework for spec-driven development in 2025: "Instead of coding first and writing docs later, in spec-driven development, you start with a spec. This is a contract for how your code should behave and becomes the source of truth your tools and AI agents use to generate, test, and validate code."[25] Birgitta Böckeler, writing at martinfowler.com, identifies three maturity levels: spec-first (write the spec before generating code), spec-anchored (the spec constrains generation), and spec-as-source (humans work exclusively at the spec level and never touch generated code).[26] AWS launched Kiro, an IDE built around the spec-driven model, arguing that "working at the specification level allows programmers to move faster, and spend more time thinking about the things that really matter."[27]
The implication for the CTO is concrete. The skill that matters most is no longer writing elegant code. It is writing precise specifications — clear enough that an AI system produces the right behaviour, constrained enough that it does not produce the wrong behaviour, and testable enough that the gap between intention and output can be measured. This is not a lesser skill than coding. It is a different skill, and it draws on the same deep architectural judgment that good system design has always required.
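What a spec-as-contract can look like in miniature: the sketch below is a hypothetical example, not GitHub's or Kiro's format, and the refund rule, function name, and numbers are all invented for illustration. The point is that the table of cases, not the generated function, is the artefact the team maintains; any implementation, human-written or AI-generated, ships only if it satisfies every case.

```python
# Behavioural spec for a (hypothetical) trial-refund rule, written as
# executable cases: full refund inside 7 days, pro-rated to day 30,
# nothing after that. The spec is the source of truth.
SPEC = [
    # (days_used, plan_price, expected_refund)
    (3, 30.0, 30.0),
    (7, 30.0, 30.0),
    (15, 30.0, 15.0),
    (30, 30.0, 0.0),
    (45, 30.0, 0.0),
]


def refund(days_used: int, plan_price: float) -> float:
    # One implementation that satisfies the spec. An AI assistant could
    # generate this; the spec decides whether it is acceptable.
    if days_used <= 7:
        return plan_price
    if days_used >= 30:
        return 0.0
    return plan_price * (30 - days_used) / 30


def satisfies_spec(impl) -> bool:
    # The gate: every case must hold before any implementation ships.
    return all(abs(impl(d, p) - want) < 1e-9 for d, p, want in SPEC)
```

Böckeler's "spec-as-source" level is reached when the team edits only the table and regenerates the function; the intermediate levels keep a human reviewing the implementation against it.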
AI evaluation. Eugene Yan, a senior applied scientist who has written the most comprehensive practitioner corpus on building LLM-based systems, states the principle directly: "How important evals are to the team is a major differentiator between folks rushing out hot garbage and those seriously building products in the space."[28] Anthropic’s engineering team formalised this as "eval-driven development": "Build evals to define planned capabilities before agents can fulfill them, then iterate until the agent performs well."[29] The CTO who cannot evaluate whether an AI system is producing correct, safe, and useful outputs is flying blind — and in a regulated context, flying blind is a compliance violation.
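A minimal eval harness makes the principle concrete. In the sketch below, `generate_sql`, the golden cases, and the 90% gate are all placeholder assumptions, not taken from any system cited in this chapter; what carries over is the shape: a versioned golden set, a comparison function, and a pass-rate threshold that gates deployment.

```python
# Versioned golden set: question -> expected output pairs, reviewed by humans.
GOLDEN_SET = [
    {"question": "How many users signed up in 2024?",
     "expected": "SELECT COUNT(*) FROM users WHERE signup_year = 2024"},
    {"question": "List all active plans",
     "expected": "SELECT * FROM plans WHERE active = 1"},
]


def generate_sql(question: str) -> str:
    # Stub standing in for the model under evaluation; a real system would
    # make an API call here.
    canned = {case["question"]: case["expected"] for case in GOLDEN_SET}
    return canned.get(question, "SELECT 1")


def normalise(sql: str) -> str:
    # Exact-match comparison after whitespace/case normalisation. Real
    # harnesses often add execution-based or semantic comparison.
    return " ".join(sql.lower().split())


def run_evals(model, threshold: float = 0.9):
    passed = sum(
        normalise(model(case["question"])) == normalise(case["expected"])
        for case in GOLDEN_SET
    )
    rate = passed / len(GOLDEN_SET)
    # Deploy only if the gate holds; otherwise iterate on prompt or model.
    return rate, rate >= threshold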
Agentic orchestration. The design patterns for AI agent systems were formalised in late 2024 and early 2025 by every major cloud provider. Anthropic’s "Building Effective Agents" guide identifies the key principle: "When building applications with LLMs, we recommend finding the simplest solution possible, and only increasing complexity when needed. This might mean not building agentic systems at all."[30] The CTO competency here is not implementing agents — that will increasingly be handled by frameworks and platforms. The competency is knowing when an agent architecture is warranted and when a simpler approach (prompt chaining, retrieval-augmented generation, or even a well-designed API call) will produce better results with lower risk.
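The "simplest solution possible" principle often reduces to plain function composition. The sketch below, with a stubbed `call_llm` standing in for a real API call, shows a fixed two-step prompt chain with a deterministic check between steps: no agent loop, no tool selection, no autonomy, and therefore far less that can go wrong.

```python
def call_llm(prompt: str) -> str:
    # Stub: a real implementation would call a hosted model API here.
    return f"[model output for: {prompt}]"


def summarise(document: str) -> str:
    return call_llm(f"Summarise in three bullet points:\n{document}")


def translate(summary: str) -> str:
    return call_llm(f"Translate to French:\n{summary}")


def chain(document: str) -> str:
    # Prompt chaining: a fixed sequence of calls, with a deterministic
    # validation gate between steps instead of an agent deciding what to
    # do next.
    summary = summarise(document)
    if not summary.strip():
        raise ValueError("empty summary; stop before the next step")
    return translate(summary)
```

The workflow is ordinary code, which means it is testable, debuggable, and reviewable with the tools the team already has; an agent architecture is justified only when the sequence of steps genuinely cannot be known in advance.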
The amplifier mindset. The DORA amplifier thesis — AI makes strong teams stronger and struggling teams worse — means the CTO’s most important AI-era competency may not be technical at all. It is organisational: building the team, the testing culture, the review processes, and the architectural foundations that make AI adoption productive. Will Larson, CTO of Imprint, described his AI adoption approach: "My biggest fear for AI adoption is that companies can focus on creating the impression of adopting AI, rather than focusing on creating additional productivity."[31] His framework at Imprint has three pillars: remove obstacles to adoption (tooling, access, policy), identify opportunities everywhere (not just engineering), and ensure senior leadership uses the tools themselves.[31]
[AUTHOR: Your specific competency development — how your work with LangGraph and LangSmith maps to these four areas, what you’ve learned about evaluation in a text-to-SQL context, how progressive schema disclosure in the MCP server relates to spec-driven development. This section is where the working-CTO authority differentiates the chapter from a survey article.]
Build, Buy, or Prompt
Chapter 4 introduced the build-versus-buy framework: if it is core to your competitive advantage, build it; otherwise, buy it. AI adds a third option and changes the economics of all three.
The cost dynamics are moving fast enough that any specific figure will be outdated by the time this book reaches print, but the trajectory is durable. Epoch AI, an independent research institute, calculated that the price to achieve GPT-4-level performance on doctoral-level science questions fell by a factor of roughly 40 per year — from roughly $20 per million tokens in late 2023 to about $0.40 by early 2025.[32] That rate of cost decline reshapes every calculation a CTO makes about when to call an API, when to self-host a model, and when to build a custom capability.
The decision framework for the AI era has three tiers.
Default: call the API. For the vast majority of AI features at the pre-seed to Series B stage, the correct choice is to use a hosted API from a frontier model provider. The marginal cost is low, the capability is high, the maintenance burden is zero, and the time to integration is measured in days rather than months. The platform risk is real — the provider can change pricing, rate-limit access, or deprecate a model — but for a startup, platform risk is secondary to market risk. If the company does not survive long enough to be affected by vendor lock-in, the lock-in was irrelevant. Andrew Chen of Andreessen Horowitz argues the historical analogy: every Web 2.0 SaaS product was a "database wrapper" in the same way that current AI products are "GPT wrappers." Value came not from the underlying technology but from distribution, workflow integration, and network effects.[33] The macro context matters: David Cahn at Sequoia estimated in 2024 that AI infrastructure investment was outpacing AI application revenue by hundreds of billions of dollars — what he called the "$600 billion question."[34] That investment produces cheaper, more capable APIs for the application-layer CTO to consume.
Middle path: fine-tune on your data. When the startup has a data advantage — proprietary datasets, domain-specific knowledge, customer interaction logs that no competitor possesses — fine-tuning an open-source model on that data can produce results that a general-purpose API cannot match, at a fraction of the inference cost. Hugo Debes at Artefact estimated that self-hosted inference becomes cost-competitive above roughly 8,000 conversations per day.[35] Below that threshold, the API is cheaper. Above it, the economics shift. The Epoch AI data — inference costs falling roughly 40-fold per year — means this threshold is a moving target; what justified self-hosting last year may be cheaper via API this year.[32]
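The break-even logic is simple enough to sketch. Every figure below is an assumption chosen for illustration, not a quoted price; real numbers vary by model, provider, and workload, and shift as quickly as the Epoch data suggests. The structure is what carries over: a fixed daily infrastructure cost amortised against a per-conversation saving.

```python
# All constants are illustrative assumptions, not real provider prices.
API_COST_PER_CONVERSATION = 0.012  # assumed blended token cost, USD
SELF_HOST_FIXED_PER_DAY = 95.0     # assumed GPU + ops cost, amortised per day
SELF_HOST_MARGINAL = 0.0005        # assumed per-conversation electricity/compute


def daily_cost_api(conversations: int) -> float:
    return conversations * API_COST_PER_CONVERSATION


def daily_cost_self_host(conversations: int) -> float:
    return SELF_HOST_FIXED_PER_DAY + conversations * SELF_HOST_MARGINAL


def break_even() -> int:
    # Smallest daily volume at which self-hosting is cheaper: the fixed
    # cost divided by the per-conversation saving, rounded up.
    saving = API_COST_PER_CONVERSATION - SELF_HOST_MARGINAL
    return int(SELF_HOST_FIXED_PER_DAY / saving) + 1
```

With these assumed inputs the crossover lands in the thousands of conversations per day, the same order of magnitude as the Artefact estimate; and because the API price in the numerator keeps falling, the threshold keeps moving up.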
The privacy argument often drives the decision before the cost argument does. A healthcare B2B platform processing patient data, a legal AI handling privileged documents, a financial services product generating personalised advice — these are contexts where sending data to a third-party API creates compliance risk that no cost savings can justify. Self-hosting eliminates that risk but introduces operational complexity: model serving infrastructure, version management, monitoring, and the engineering capacity to maintain it. The CTO must weigh both sides honestly. The compliance risk of the API is concrete and measurable. The operational risk of maintaining inference infrastructure is real and ongoing.
Build from scratch: almost never. Training a frontier-class model requires $100 million or more per run and a team of researchers that most startups cannot hire. For application-layer startups, building a foundation model is the wrong use of capital and talent. The rare exception is a company whose core product is itself a model — and even those companies increasingly fine-tune existing open-source architectures rather than training from scratch.
The "prompt" option — using AI coding tools to generate a capability rather than building or buying it — collapses the build-versus-buy decision for certain categories of internal tooling. An authentication flow, a data transformation pipeline, an admin dashboard — these are features that once required either building from scratch or integrating a third-party service. An AI coding assistant can generate a functional version in hours. The CTO’s job is to evaluate whether the generated version meets production standards — which returns the argument to the demo-to-production gap and the evaluation competency described earlier.
[AUTHOR: CorralData’s specific build-vs-buy-vs-prompt decisions — why you chose LangGraph over building your own orchestration layer, when you’ve used local inference versus API calls, and how the healthcare compliance context shaped those decisions. The reader needs a worked example from a real company, not just a framework.]
Presenting AI Strategy to Your Board
The business acumen gap from Chapter 10 does not close when the CTO learns to translate engineering work into business language. It reopens every time a new technology wave changes the vocabulary. AI is the current wave, and the board dynamics it creates are specific.
McKinsey surveyed directors from 75 boards and found that 66% report "limited to no knowledge or experience" with AI. Nearly one in three said AI does not appear on their board agendas at all.[36] At the same time, a Dataiku survey of CEOs found that 74% fear losing their jobs if they cannot demonstrate AI progress. The pressure originates from boards themselves: 66% of CEOs said the demand comes from their directors. The survey’s most revealing finding: CEOs admitted that approximately one-third of their AI projects "are more or less fake — they are not actually delivering what we claim they do."[37]
The CTO occupies a uniquely uncomfortable position in this dynamic. The board wants to hear that the company is adopting AI. The CEO is under pressure to demonstrate AI progress. The CTO knows that most of the hype is disconnected from the operational reality. BCG’s 2025 global survey found that 60% of companies generate no material value from AI despite significant investment, and only 5% achieve value at scale.[38] McKinsey’s data is comparable: less than 5% of EBIT comes from AI for most organisations.[39]
The translation framework from Chapter 10 applies directly. The board does not need to understand agentic orchestration or retrieval-augmented generation. The board needs to understand three things: where AI is creating measurable value in the company’s operations, what the risks are (security, compliance, vendor dependency), and what the investment plan looks like for the next 12 months. Larson’s advice is practical: focus on creating actual productivity rather than the impression of AI adoption.[31] Stephan Schmidt warns against the most common mistake: "Too many of the CTOs I meet treat AI as just another technology like cloud, to decrease cost and increase efficiency. AI is not. AI is disruptive and will change all software development around it."[40]
The CTO who presents AI to their board with neither hype nor dismissal — who can articulate what the company is doing, why, and what it costs — is demonstrating exactly the business acumen that Chapter 10 argued is the difference between the CTOs who survive and the ones who are replaced. The AI conversation is a business acumen test, and most CTOs are taking it without preparation.
[AUTHOR: How you present CorralData’s AI strategy — the NL-to-SQL copilot, the AI-powered dashboards — to your board. What questions do they ask? What misconceptions do you have to correct? The healthcare B2B context adds a layer: the board may be simultaneously excited about AI and nervous about patient data.]
The paradox resolves into a job description. The CTO who thrives in the compression era is not the one who adopts every new tool. It is the one who understands what AI compresses (the time from idea to prototype), what it does not compress (the judgment required to turn a prototype into a production system), and what it amplifies (both the strengths and the weaknesses of the team that uses it).
The tiny teams are real. The speed is real. The $40 million product built by 12 people is real. But so is the 45% vulnerability rate, the 19% slowdown for experienced developers who do not review what the machine produces, and the 60% of companies that have generated no material value from their AI investments. The CTO’s job is to hold both realities in the same frame — and to build the team, the processes, and the architecture that turn the compression into a durable advantage rather than a fast-moving liability.
Chapter 13 addresses what happens when the CTO succeeds at everything this book has described — the building, the shipping, the communicating, the AI integration — and discovers that the role has changed so completely that they no longer recognise the job they signed up for. The day the CTO stops writing code is not a failure. It is a graduation. But it is also a loss, and managing that loss is the next skill to learn.