Chapter 4: Your First Technical Decisions Are Business Decisions
It seems that perfection is attained not when there is nothing more to add, but when there is nothing more to take away.
Antoine de Saint-Exupéry, Terre des hommes, 1939
In October 2010, Instagram launched with two engineers and a single server in Los Angeles. The stack was Django, PostgreSQL, Redis, and nginx — technology so established that no one at a conference would ask a follow-up question about any of it. Mike Krieger, Instagram’s co-founder and technical leader, had a phrase for the philosophy behind these choices: "Do the simple thing first."[1]
Eight years later, Instagram had more than 450 engineers, one billion monthly active users, and had migrated from Amazon’s cloud to Facebook’s data centres. The engineering team had replaced the task queue (Gearman to Celery and RabbitMQ), added Cassandra alongside PostgreSQL, rebuilt the search system four times, and completed a migration from Python 2 to Python 3 that produced a 12% CPU savings and 30% memory improvement.[2] What they had not replaced was the fundamental architecture. Django and PostgreSQL — the boring choices from day one — were still running the product. Krieger’s description of the journey: "The users are still in the same car they were in at the beginning of the journey, but we’ve swapped out every single part without them noticing."[3]
The Instagram stack was not an engineering decision. It was a business decision. Krieger did not choose Django because it was the best web framework. He chose it because it was well-understood, operationally quiet, and let a two-person team focus on the product rather than the infrastructure. Every hour not spent debugging an unfamiliar database was an hour spent building the feature that would determine whether the company survived. The stack choice was a bet on what mattered most — speed to learning — and it paid off for eight years.
This chapter is about making those bets well. Stack selection, MVP architecture, build-versus-buy — these feel like engineering decisions, and they are. But they are business decisions first, and the CTO who treats them as purely technical will make choices that are elegant, well-reasoned, and wrong for the company.
Every Stack Choice Is a Bet
Dan McKinley, an engineer at Etsy during the period when the company scaled from a struggling marketplace to an IPO-track business, published an essay in 2015 called "Choose Boring Technology." It became the most widely cited piece of engineering strategy advice in the startup world, and for good reason: it named something that experienced engineers knew intuitively but had never formalised.[4]
McKinley’s central argument is that every organisation has a finite capacity for novelty — what he calls "innovation tokens." A startup might have three. Spend one on an unfamiliar database, another on a new programming language, a third on a deployment tool that has existed for less than a year, and you have used your entire budget for the unexpected before writing a line of product code. "If you choose to write your own database," McKinley writes, "oh god, you’re in trouble."[4]
The cost is not the technology itself. It is the unknown unknowns. A known unknown — what happens when this database hits 100% CPU — can be researched. An unknown unknown — a garbage-collection pause triggered by a write pattern nobody anticipated — cannot be researched because nobody knows to ask the question. For mature technology, the set of unknown unknowns is small and shrinking. For new technology, it is large and growing. McKinley’s formulation: "‘Boring’ should not be conflated with ‘bad.’ There is technology out there that is both boring and bad. You should not use any of that. But there are many choices of technology that are boring and good, or at least good enough."[4]
McKinley credits this philosophy to Kellan Elliott-McCrea, who was CTO of Etsy during the scaling period that Chapter 6 describes. Elliott-McCrea’s influence on Etsy’s engineering culture was the origin of the "boring technology" tradition — the recognition that operational predictability is an engineering asset, not an engineering compromise.[5] Etsy’s early years provided the cautionary tale. The company hired Python programmers and then searched for something to give them to do in Python, which led to a pointless middleware layer that took years to remove. Meanwhile, search latency was running at two minutes at the 90th percentile.[4] The technology choice created the problem. The business paid the cost.
Krieger’s Instagram embodied McKinley’s framework before McKinley wrote it. The philosophy — choose technology whose failure modes you understand, spend your innovation budget on the product rather than the infrastructure — produced a specific set of decisions. PostgreSQL for the core data store, because its behaviour under load was well-documented and its scaling patterns (connection pooling via PgBouncer, eventual sharding) were proven. Redis for caching and real-time operations, because it was fast, simple, and the team understood it. Django for the web layer, because it favoured pragmatism and let the team hire easily. Krieger articulated the reasoning on the Instagram engineering blog: "A large part of how we’ve been able to scale Instagram with very few engineers is by choosing simple, easy-to-understand solutions that we trust."[6]
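The scaling pattern named here — connection pooling via PgBouncer — is worth seeing concretely. A minimal sketch of how a Django application points at a pooler rather than at PostgreSQL directly; the hostnames, credentials, and database names are illustrative placeholders, not Instagram’s actual configuration:

```python
# Django settings sketch: route database traffic through PgBouncer
# instead of connecting to PostgreSQL directly. All values below are
# illustrative placeholders.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        "NAME": "appdb",
        "USER": "app",
        "PASSWORD": "change-me",
        "HOST": "127.0.0.1",  # PgBouncer, not PostgreSQL itself
        "PORT": "6432",       # PgBouncer's default listen port
        # With PgBouncer doing the pooling, let Django close its own
        # connections rather than holding them open per worker.
        "CONN_MAX_AGE": 0,
    }
}
```

The application code never changes; the pooler sits between the web workers and the database, which is precisely the kind of operationally quiet, well-documented scaling step that a boring stack makes available.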
The phrase "solutions that we trust" is doing the important work in that sentence. Trust, in an engineering context, means you know how the technology will break. You have seen its failure modes. You can diagnose a problem at 3 a.m. without reading the source code. That trust is not a property of the technology. It is a property of the team’s experience with the technology. A brilliant database that nobody on your team has run in production is not trustworthy. A mediocre database that your team has operated for three years is.
Krieger also warned about a subtler trap: coupling your hiring strategy to your technology stack. He recalled advice from Kevin Rose, founder of Digg: "one of the biggest mistakes they made early on was recruiting engineers who were too finely matched to the technology that they were using." The problem: "if people tie their own job security to, say, staying on PHP, you’ll end up making the wrong technology choices."[7] The technology should serve the business. When the team’s identity becomes inseparable from the technology, the technology starts dictating the business — and the CTO loses the ability to make the pragmatic changes that scaling will eventually demand.
This is where the business lens sharpens the engineering decision. The CTO choosing a stack at the founding stage is not optimising for architectural elegance. They are optimising for the speed at which the team can build, ship, and learn — because every week spent debugging infrastructure is a week not spent discovering whether the product has a market. Krieger, looking back, described the discipline: "The goal is not to set up Nagios or Munin. The goal is to ship software so that you can get people using it."[7]
Benjamin Jordan, CTO of Big Run Studios, a mobile game developer, offers a useful provocation against taking this framework too seriously. In a deliberately self-undermining essay, Jordan argues that CTO expertise on technology selection is largely survivorship bias: "I assume I know what I’m talking about simply because of a small modicum of success." His claim: "a dartboard is all you need to determine a starting point for your tech, because we are forced to consider that, just perhaps, initial tech choices don’t matter at all."[8] He points to Google running Java on mobile hardware, Zynga building an empire on Flash, and Facebook pushing PHP beyond any reasonable limit. People figure things out. Almost any software can evolve into any other if the team is strong enough.
Jordan’s argument is not quite what it appears. He is not saying technology choices are meaningless. He is saying that the workflow — iteration speed, learning culture, willingness to replace what does not work — matters more than the starting point. "If you add five seconds to your iterative cycle, you’ve just shot yourself in the foot. Think about multiplying that five seconds by the hundred times per day that you need to do it."[8] This is actually McKinley’s point stated from the other direction: boring technology is valuable not because it is boring but because it minimises friction in the iteration cycle. The starting technology matters less than most CTOs believe. The speed at which the team can iterate on that technology matters more than almost anyone admits.
| AUTHOR: The CorralData stack choice belongs here — FastAPI, LangGraph, LangSmith, and the decision to build on Python in a healthcare B2B context. What drove the choice? Was it the team’s familiarity, the library availability, the hiring pool? Was there a moment where a more technically ambitious choice was available and the author chose the pragmatic one? The reader needs to see this philosophy applied in the author’s own context. |
The MVP Is Supposed to Be Wrong
The Instagram stack survived because it was boring. But the Instagram product survived because it was incomplete. The first version of Instagram did not support video — even though the technology existed — because uploads were slow, they failed more often than they succeeded, and the experience did not meet Krieger’s standard. "We chose not to in the first version," he told Fast Company in 2018, "because we couldn’t deliver a great experience."[9] The decision to exclude video was a product decision, not an engineering one. It was also a business decision: every feature that delays launch is a feature that delays learning.
Reid Hoffman, co-founder of LinkedIn, coined the formulation that has become startup scripture: "If you’re not embarrassed by the first version of your product, you’ve launched too late."[10] The phrase is older than most people realise — Hoffman says he first used it "more than a decade" before his 2017 essay, placing it somewhere around 2005–2007. His explanation of the logic is more nuanced than the soundbite suggests. Three themes drive it: the importance of speed, the certainty that your assumptions about customers are wrong, and the cost of delaying the feedback loop. "If you’re willing to be embarrassed," Hoffman writes, "you gain speed and flexibility."[10] Eric Ries, who developed the Lean Startup methodology, offered Hoffman a corollary: "No matter how long you wait to release your first version, you will be embarrassed by it."[10]
The MVP paradox is that building the wrong thing — deliberately, knowingly — is often the correct business decision. The first version of Dropbox was not a product. It was a three-minute screencast. Drew Houston, Dropbox’s founder, could not demonstrate a working prototype because the product required solving hard technical problems in file synchronisation that had not been solved yet. Instead, he recorded a video showing what the product would do. The video was targeted at the Digg community and was full of inside jokes and Easter eggs visible only to that audience. Overnight, the beta waiting list went from 5,000 to 75,000.[11] Houston did not build the product to test the hypothesis. He built a video. The video validated demand. Then he built the product.
Stripe’s founding had the same structure. Patrick and John Collison built an API that reduced payment integration from weeks of work to seven lines of code. But the backend was not automated. When someone signed up, Patrick would call a friend who would manually create a merchant account. The product that felt instantaneous to the developer was, behind the scenes, a manual process held together by phone calls.[12] Paul Graham, who funded Stripe through Y Combinator, gave this technique a name: the "Collison installation." When anyone agreed to try Stripe, the Collisons would say "Right then, give me your laptop" and set the user up on the spot.[13] The technical sophistication of the product was not the point. The speed of getting it into someone’s hands was.
Nick Swinmurn tested the Zappos hypothesis — that people would buy shoes online — without building a supply chain. He photographed shoes at local stores, posted the images on a website, and when someone ordered, he bought the shoes at full price and shipped them himself.[14] The unit economics were terrible. The learning was immediate: people would pay for shoes they had not tried on.
Martin Fowler, who has spent decades advising organisations on software architecture, makes the strongest architectural case for deliberate technical incorrectness. In his 2015 essay "MonolithFirst," he observes that "almost all the successful microservice stories have started with a monolith that got too big and was broken up. Almost all the cases where I’ve heard of a system that was built as a microservice system from scratch, it has ended up in serious trouble." His recommendation: "Don’t be afraid of building a monolith that you will discard."[15] The monolith is architecturally wrong for the company you hope to become. It is architecturally right for the company you are now — because it lets you ship faster, learn faster, and postpone the expensive architectural decisions until you have the information to make them well.
Emmett Shear’s experience building Twitch illustrates what happens when the MVP reveals something the founders did not expect. Justin.tv, the predecessor product, had more than 20 million monthly active users but was not growing. A venture capitalist visited the office and delivered a blunt assessment: “You guys aren’t growing, and on the Internet, not growing is dying. So your business is totally f***ed.”[16] The gaming section of Justin.tv — roughly 400,000 monthly users, about 2% of the total — was the only part of the product Shear was interested in running. He spun it out as Twitch. Within months, it was growing 30–40% per month. "Product/market fit," Shear observed, "is a little like falling in love. If you have to ask, don’t worry, it will be clear."[16] The pivot was possible because the existing infrastructure could be repurposed. The MVP for Twitch was not a new product. It was a re-framing of an existing one.
The common thread across these examples is that the MVP’s job is not to demonstrate engineering capability. Its job is to produce learning. The learning that Dropbox’s video produced (75,000 signups) was worth more than any prototype could have delivered in the same timeframe. The learning that Stripe’s manual backend produced (developers will integrate payments if it takes seven lines of code) was worth more than an automated system that took six months longer to build. The CTO who delays launch to build the "right" architecture is spending the company’s money on a hypothesis about infrastructure when the company’s survival depends on a hypothesis about the market.
| AUTHOR: A CorralData MVP decision belongs here — a moment where the team shipped something technically incomplete to learn from customers. Healthcare B2B has specific constraints (compliance, security, data handling) that complicate the MVP philosophy. The reader needs to see how "build the wrong thing first" works in a regulated context where certain things cannot be wrong. |
Build What’s Core, Buy Everything Else
Joel Spolsky, co-founder of Stack Overflow and Fog Creek Software, published two essays in the early 2000s that remain the clearest framework for the build-versus-buy decision. The first, "Things You Should Never Do" (2000), argues that rewriting working software from scratch is "the single worst strategic mistake that any software company can make." Old code, Spolsky writes, has been used, tested, and debugged. The odd formatting and unexplained conditionals are not signs of sloppiness — "those are bug fixes." Throwing that code away means throwing away years of accumulated knowledge and giving competitors a two-to-three-year head start.[17]
The second essay, "In Defense of Not-Invented-Here Syndrome" (2001), provides the bright-line rule: "If it’s a core business function — do it yourself, no matter what." The corollary: everything that is not core should be bought, rented, or borrowed. "Pick your core business competencies and goals, and do those in house. If you’re a software company, writing excellent code is how you’re going to succeed. Go ahead and outsource the company cafeteria."[18] Spolsky’s example: if you are building a game and cool 3D effects are your competitive advantage, write your own 3D engine. If the plot is your advantage, use someone else’s engine and focus on the story.
The question the CTO must answer is not "can we build this?" The question is "is building this the best use of our engineering capacity?" Krieger described the discipline at early Instagram: "‘We could figure out how to do our own push notifications. But Urban Airship is right here.’ Put pride aside and keep your eye on your real goal."[7] Every hour an engineer spends building authentication, analytics, or deployment infrastructure is an hour not spent on the product that differentiates the company. Peter Reinhardt, co-founder and CEO of Segment, a customer data platform, quantified the risk: "Say you have three engineers and one of them is working on building your analytics tool. Your ability to develop your product is now cut by 30% so you need 30% more runway."[19]
The decision changes as the company grows, and the CTO must recalibrate the framework at each stage. At the Coder stage — a team of one to five — the default should be to buy almost everything that is not the core product. Authentication, analytics, monitoring, email delivery, payment processing: none of these differentiate your product, and each of them will consume engineering weeks that you do not have. Krieger described the temptation to over-build as "yak shaving" — spending hours setting up Nagios when a simpler alerting tool could be running in minutes. "Finally I was like, ‘I’ve got to get back to building the product.’"[7]
At the Manager stage, some infrastructure investments begin to pay for themselves. A CI/CD pipeline, an internal tool for a repeated workflow, a monitoring stack tailored to your specific failure modes — these are worth building because the team will use them hundreds of times. But the test remains the same: is this core? At the Director stage, the calculus shifts again. The company may have enough scale that owning infrastructure becomes a competitive advantage rather than a distraction. The classic example is Dropbox, which saved nearly $75 million over two years by migrating from AWS to custom infrastructure — but Dropbox’s product is storage infrastructure. Owning it was the core competency, not a diversion from it.[20]
David Heinemeier Hansson, creator of Ruby on Rails and co-founder of 37signals, provides the most detailed recent case study of a build-versus-buy reversal. In 2022, 37signals was paying more than $3.2 million per year for cloud services. DHH’s assessment: "Renting computers is mostly a bad deal for medium-sized companies like ours with stable growth."[21] The company invested approximately $600,000 in Dell servers and migrated off the cloud entirely. Post-migration infrastructure costs dropped to roughly $360,000 per year — an annual savings of nearly $2.9 million, with projected savings of more than $7 million over five years.[22]
The 37signals exit was the right decision for 37signals. It is not the right decision for most startups, and DHH himself identified the conditions. The cloud excels at two ends of the spectrum: very simple, low-traffic applications where operational simplicity genuinely saves money, and highly irregular workloads where burst capacity matters. For medium-sized companies with predictable, stable loads, the economics favour owning hardware.[21] Most pre-Series B startups are not in that position. They are uncertain about whether the product will exist in six months, and the cloud premium is an insurance policy against the need to think about infrastructure at all. The CTO who optimises hosting costs before finding product-market fit is solving the wrong problem.
There is a subtlety that Spolsky’s bright-line rule does not capture: the line between "core" and "not core" moves. A feature that is not core at founding may become core at scale. Analytics was not core for Segment’s customers — until it was, and Reinhardt built a company around that realisation. Conversely, a feature that feels core at founding — a custom deployment pipeline, a bespoke data pipeline — may turn out to be commodity infrastructure that a third party can maintain better than you can. The CTO must revisit the build-versus-buy decision at every stage, not just at founding. Chapter 12 will revisit it again in the AI context, where a third option — prompt — is collapsing the cost of building certain categories of software to near zero.
The build-versus-buy framework introduced here will recur throughout the book. For now, the principle is simple. Ask: is this where our engineering capacity creates the most value? If the answer is yes, build it. If the answer is no, buy it, integrate it, and move on. The CTO’s job is not to build everything. It is to build the right things.
| AUTHOR: A CorralData build-versus-buy decision — something the team chose to build that could have been bought, or something they bought that a more technically ambitious CTO might have built. The LangGraph/LangSmith choice is a natural candidate: why use LangChain’s tooling rather than building a custom orchestration layer? What was the business reasoning? |
The Reversibility Test
The decisions described so far — stack selection, MVP scope, build versus buy — share a property that most engineering discussions ignore: some of them can be undone and some cannot. A practical framework for early technical decisions must account for this difference, because the cost of a wrong decision depends entirely on how hard it is to reverse.
Jeff Bezos introduced the clearest formulation of this principle in his 2015 letter to Amazon shareholders. He distinguished between two types of decisions. Type 1 decisions are "consequential and irreversible or nearly irreversible — one-way doors." These "must be made methodically, carefully, slowly, with great deliberation and consultation." Type 2 decisions are "changeable, reversible — two-way doors." These "can and should be made quickly by high judgment individuals or small groups."[23]
Bezos’s concern was not about technology. It was about organisational speed. "As organizations get larger," he wrote, "there seems to be a tendency to use the heavy-weight Type 1 decision-making process on most decisions, including many Type 2 decisions. The end result of this is slowness, unthoughtful risk aversion, failure to experiment sufficiently, and consequently diminished invention."[23] He added a footnote that is often overlooked: "Any companies that habitually use the light-weight Type 2 decision-making process to make Type 1 decisions go extinct before they get large."[23] The framework cuts both ways. Treating reversible decisions as irreversible kills speed. Treating irreversible decisions as reversible kills the company.
Kent Beck, writing about his experience at Facebook in 2015, applied the framework directly to software engineering. He identified irreversibility as one of four sources of system complexity — alongside states, interdependencies, and uncertainty. His formulation: "When the effects of decisions can’t be predicted and they can’t be easily undone, decisions grow prohibitively expensive." The insight for engineers: "Irreversibility is absurd in the Fordist assembly line world but it’s at least possible for computer systems." Software decisions can often be made reversible by design.[24]
Beck’s distinction is practical: "Feature changes are irreversible, structure changes are reversible." A feature shipped to users creates expectations, generates data, and establishes contracts that are expensive to break. A structural change to the codebase — refactoring a module, extracting a service, reorganising a directory — can be undone without affecting the user. The CTO who treats structural decisions with the same gravity as feature decisions is wasting deliberation on two-way doors.[24]
Martin Fowler reached the same conclusion years earlier, from a different direction. In his essay "Is Design Dead?" he argued that designers "need to think about how they can avoid irreversibility in their decisions. Rather than trying to get the right decision now, look for a way to either put off the decision until later — when you’ll have more information — or make the decision in such a way that you’ll be able to reverse it later without too much difficulty."[25] Mary and Tom Poppendieck, drawing on lean manufacturing principles, formalised this as the "last responsible moment": schedule irreversible decisions for as late as possible, and use the intervening time to learn.[26]
Applied to the decisions this chapter covers, the reversibility test produces clear guidance.
Your database is close to a one-way door. Migrating from one relational database to another is possible but expensive. Migrating from a relational database to a document store — or vice versa — is a multi-quarter project that will consume engineering capacity and introduce risk. Choose your primary database carefully. This is a Type 1 decision.
Your programming language is a heavy two-way door. Rewriting a codebase in a new language is painful but not impossible — Instagram migrated from Python 2 to Python 3 while serving hundreds of millions of users. But the hiring pool, the available libraries, and the team’s accumulated expertise are all tied to the language. Introducing a new language is reversible in theory and expensive in practice. Treat it as a Type 1 decision at founding; treat additions of new languages as a careful Type 2 decision later.
Your API framework is a two-way door. Swapping a web framework — Django for FastAPI, Express for Koa — is a contained change that affects the team’s development workflow but not the product’s external behaviour. This is a Type 2 decision. Make it quickly.
Your cloud provider can be made reversible by design. If you build on Kubernetes with minimal use of vendor-specific services, moving from AWS to GCP is a logistics problem, not an architecture problem. If you build on AWS Lambda with DynamoDB and SQS, you have made the cloud choice nearly irreversible. The decision is not which cloud to use. The decision is how tightly to couple to it.
Your third-party integrations are two-way doors — as long as you put an abstraction layer between your code and the vendor’s API. Without that layer, every vendor dependency is a one-way door. With it, swapping vendors is a configuration change.
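The abstraction layer can be as thin as a single interface. A minimal sketch in Python — the vendor client classes and all names here are hypothetical stand-ins for real SDKs, not actual libraries:

```python
from abc import ABC, abstractmethod

# Hypothetical vendor SDKs -- illustrative stand-ins for real
# push-notification clients. Their divergent method names are the point:
# each vendor's API looks different.
class VendorAClient:
    def push(self, device_token: str, body: str) -> bool:
        return True  # a real SDK call would go here

class VendorBClient:
    def send_notification(self, token: str, message: str) -> bool:
        return True

# The abstraction layer: the one interface product code depends on.
class NotificationProvider(ABC):
    @abstractmethod
    def notify(self, device_token: str, message: str) -> bool: ...

class VendorAProvider(NotificationProvider):
    def __init__(self) -> None:
        self._client = VendorAClient()

    def notify(self, device_token: str, message: str) -> bool:
        return self._client.push(device_token, message)

class VendorBProvider(NotificationProvider):
    def __init__(self) -> None:
        self._client = VendorBClient()

    def notify(self, device_token: str, message: str) -> bool:
        return self._client.send_notification(device_token, message)

PROVIDERS = {"vendor_a": VendorAProvider, "vendor_b": VendorBProvider}

def get_provider(name: str) -> NotificationProvider:
    # Swapping vendors is now a configuration change, not a rewrite.
    return PROVIDERS[name]()
```

Product code only ever calls `get_provider(config_value).notify(...)`. Replacing the vendor means writing one new adapter class and changing one configuration value — the two-way door stays open.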
The practical framework for the CTO at the Coder stage: make Type 1 decisions (database, primary language, data model) carefully and early. Make Type 2 decisions (framework, tooling, third-party services) quickly and cheaply. Design for reversibility wherever possible — abstractions over vendor APIs, infrastructure as code, configuration over hardcoding. And when you are unsure whether a decision is Type 1 or Type 2, default to treating it as Type 2 and shipping. The cost of a wrong reversible decision is low. The cost of deliberating a reversible decision as if it were irreversible is the time you did not spend learning.
| AUTHOR: A specific CorralData example of the reversibility test in action — a decision that felt irreversible but turned out not to be, or a decision that was treated as reversible and later turned out to have lasting consequences. The healthcare B2B context adds a layer: compliance and data architecture decisions may be less reversible than they appear. |
Technology Evaluation Is Resource Allocation
The previous sections describe the founding CTO’s first decisions. But the decisions do not stop. Every quarter brings new frameworks, new infrastructure services, new AI tools, and new arguments from the team about why the current stack is inadequate. The CTO who cannot evaluate new technology systematically will either adopt everything — and drown in operational complexity — or adopt nothing — and fall behind. Both failure modes are common. Both are avoidable.
McKinley’s "boring technology" thesis, already cited in this chapter, contains a metaphor that deserves its own weight: the innovation token. "Let’s say every company gets about three innovation tokens," McKinley writes. "You can spend these however you want, but the supply is fixed for a long while."[4] If you spend a token on a novel database, you cannot also spend one on a novel deployment pipeline and a novel frontend framework without exhausting the team’s capacity to absorb operational novelty. The question is not whether the new technology is better. The question is whether it is better enough to justify spending one of your finite tokens — and whether the problem it solves is the problem most worth solving with scarce innovation budget.
Kellan Elliott-McCrea, whose tenure as CTO of Etsy gave rise to the philosophy McKinley codified, published the most actionable evaluation framework available: eight questions to ask before introducing any new technology.[27] The first two are the ones most CTOs skip: "What problem are we trying to solve?" and "How could we solve the problem with our current tech stack?" If the answer to the second question is "we could, it would just be ugly" — that is almost always the right path. The ugliness is operational familiarity. The beautiful alternative is operational novelty, and operational novelty is where outages come from.
Elliott-McCrea’s fifth question deserves particular attention: "Will this solution kill and eat the solution that it replaces?"[27] If the answer is no — if the old system and the new system will coexist indefinitely — then the new technology has not simplified the stack. It has doubled it. Every technology the CTO adds without removing its predecessor increases the surface area the team must understand, monitor, and debug at three in the morning. The CTO who introduces Kafka alongside RabbitMQ, or Redis alongside Memcached, has not made a technology decision. They have made a complexity decision — and the complexity will compound.
The practical defence is the timebox. Before committing to any new technology, run a spike — a one-to-two-week experiment, scoped to a single use case, producing a decision rather than production code. The spike answers three questions: does this technology solve the problem we think it solves? What operational cost does it introduce? And can the team adopt it without dedicated training? If the spike fails on any of these, the CTO has spent a fortnight rather than a quarter. If it succeeds, the CTO has evidence rather than enthusiasm.
Vision Is Not a Document
The technology decisions described in this chapter — stack, data model, build versus buy, reversibility, security posture — are individual bets. Technical vision is the coherent thesis that connects them.
Will Larson, whose engineering strategy writing is the most rigorous in the practitioner literature, defines three layers: design specifications (individual decisions), engineering strategy (the principles that guide those decisions), and technical vision (where the technology and organisation should be in two to three years).[28] His method for producing all three is deliberately bottom-up: "Write five design documents, and pull the similarities out. That’s your engineering strategy. Write five engineering strategies, and forecast their implications two years into the future. That’s your engineering vision."[29] The vision is not handed down from the CTO’s imagination. It is synthesised from the decisions the team has already made — which means it describes reality rather than aspiration, and the team recognises themselves in it.
For the seed-stage CTO, the vision may be three sentences long: what the system must do in two years, what technical constraints it must operate within (HIPAA, real-time latency, multi-tenancy), and what the team must be capable of. It does not need to be a slide deck. It needs to be specific enough that when an engineer proposes adding a new technology, both the engineer and the CTO can point to the vision and ask: does this move us closer or further away?
Bryan Cantrill, CTO of Oxide Computer and co-creator of DTrace, offers the complementary frame. Where Larson builds vision from decisions, Cantrill builds it from values. His argument: every technology reflects the values of its creators — composability, debuggability, performance, simplicity — and the CTO’s job is to articulate which values the team’s technology should embody, then evaluate every technology choice against those values.[30] Oxide’s public "Requests for Discussion" repository is one of the few real examples of this in practice: every significant technical decision is written up, debated, and archived in a format the entire company can read.[31] The writing is the vision. The vision is not a separate artefact.
The CTO who has a clear technical vision — even a short one — can delegate technology evaluation to the team. The engineer who wants to introduce a new tool can evaluate it against the vision’s principles without requiring the CTO’s approval for every experiment. The CTO who has no vision must make every technology decision personally, which does not scale past the first five hires. Vision is not a luxury. It is the mechanism by which the CTO stops being a bottleneck.
|
AUTHOR: CorralData’s technical vision — even the informal version. Where do you want the system to be in two years? What are the principles that guide your technology choices (e.g., boring infrastructure, novel only where it creates competitive advantage like the NL-to-SQL copilot)? Have you written this down, and if so, where does the team find it? The reader at a five-person startup needs to see what a lightweight technical vision looks like in practice. |
Security and Compliance Are Architecture, Not Paperwork
There is a moment in the life of every B2B startup — usually during a sales call — when the CTO hears the question for the first time: "Do you have a SOC 2 report?" The conversation that follows determines whether the deal closes or stalls. And the CTO who has not thought about security and compliance as a first technical decision will discover, in that moment, that they have already made the decision — by default, badly, and in a way that will take months and tens of thousands of dollars to undo.
Antigoni Sinanis, the first operations hire at Kolide (an eight-person startup), describes the experience: "I soon received my first security questionnaire from a prospective customer. It had over 100 questions, was full of acronyms I couldn’t decipher, and asked for plans, policies, and procedures that I had never heard of. … And then I noticed something: most of these questionnaires started by asking if we had a compliance audit report, usually SOC 2."[32] Christina Gilbert, co-founder and CEO of OneSchema, a YC-backed data infrastructure company, saw the same pattern: "From the earliest days of our business, we heard loud and clear from customers that the lack of SOC 2 Type II certification would block them from doing business with us."[33]
The CTO who treats compliance as paperwork to be handled later will face a specific and expensive problem: retrofitting security controls into an architecture that was not designed for them. Gilbert’s testimony is the clearest cautionary tale: "If we hadn’t been aware of the infrastructure considerations around SOC 2, we would’ve had to re-architect our system to handle multi-tenancy with data isolation and retention. Instead, we had the right architecture in place from the get-go."[33] The architecture decision — how you isolate tenant data, how you handle encryption at rest, how you log access events — is made at the founding stage whether you intend it or not. The CTO who makes it consciously saves the company a quarter of re-engineering work. AWS’s official guidance puts it in institutional language: "Good security controls, data privacy, and data management should be foundational components of a SaaS application from the beginning."[34]
Building Secure by Default
Emily Choi-Greene, a YC founder with a security engineering background, captures the tension between the startup community and the security community: "The advice from my security community ('WAIT!') alongside the startup community ('RIGHT NOW!') broke my brain."[35] Both sides have a point. Pursuing SOC 2 certification before you have revenue or enterprise customers is premature spending. But building secure by default — encryption at rest, MFA enforced, audit logging from day one, least-privilege access controls, infrastructure as code — costs almost nothing in engineering time and saves everything later.
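Of the secure-by-default controls listed above, audit logging is the cheapest to build in from day one. A minimal sketch in Python — the names (`audited`, `export_report`) are hypothetical, and a production system would ship these events to an append-only store rather than a local logger:

```python
import functools
import json
import logging
import time

audit = logging.getLogger("audit")

def audited(action: str):
    """Decorator sketch: record who did what, when, and whether it succeeded.

    Emitting structured events like this from the founding stage is what
    makes later compliance evidence (SOC 2, HIPAA access logs) cheap.
    """
    def wrap(fn):
        @functools.wraps(fn)
        def inner(actor: str, *args, **kwargs):
            event = {"action": action, "actor": actor, "ts": time.time()}
            try:
                result = fn(actor, *args, **kwargs)
                event["outcome"] = "success"
                return result
            except Exception:
                event["outcome"] = "failure"
                raise
            finally:
                # Log in the same code path for success and failure.
                audit.info(json.dumps(event))
        return inner
    return wrap

@audited("report:export")
def export_report(actor: str, report_id: str) -> str:
    return f"exported {report_id}"
```

The point is not the decorator; it is that every access-relevant action flows through one choke point that records actor, action, and outcome, so the logging discipline is structural rather than voluntary.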
The practical minimum for a five-person startup, before any formal certification, is straightforward. The Minimum Viable Secure Product (MVSP) checklist — developed collaboratively by Google, Salesforce, Okta, and Slack, with CISA participation — provides 24 controls across four areas and is explicitly designed for startups not yet mature enough to afford a full compliance process.[36] The Center for Internet Security’s Implementation Group 1 (CIS IG1) defines 56 safeguards labelled "Essential Cyber Hygiene" for small organisations with limited resources.[37] Neither requires a budget. Both require discipline.
One architectural decision deserves specific attention because it affects every compliance framework you will ever pursue: permissioning. Enterprise customers will ask for role-based access control, attribute-based access control, team-level permissions, department-level permissions, customer-managed encryption keys, and permission structures you have not imagined yet. The permissioning system you build at the founding stage will be extended, patched, and cursed more than any other component of your application. Build it flexible from day one. Use a well-established authorisation model — RBAC at minimum, with the ability to layer ABAC on top — and abstract it behind a clean interface so that when the enterprise customer asks for a permission structure your current model does not support, the change is a configuration problem rather than an architecture problem. Permissioning is a one-way door disguised as a two-way door: the data model is easy to extend, but the assumptions baked into your access control logic spread throughout the codebase and become progressively harder to change.
|
AUTHOR: Your CorralData permissioning journey — how has the model evolved as you’ve moved into healthcare enterprise? What did you build initially, what did the first big customer ask for that you didn’t expect, and what would you do differently? |
The Compliance Ladder
Compliance certifications follow a predictable progression that maps to the startup’s growth stage. The CTO who understands this sequence can plan ahead rather than scramble.
Before revenue (Seed stage): Build secure by default. Follow MVSP and CIS IG1. Publish a vulnerability disclosure policy. Enforce HTTPS, MFA, and least-privilege access. Document your security controls informally. When the first enterprise prospect sends a security questionnaire, you will be able to answer most of the questions honestly — and the honesty matters more than the certification.
First enterprise deal (Seed to Series A): This is when SOC 2 becomes a business decision. SOC 2 Type I is a point-in-time assessment of whether your security controls are designed appropriately. Type II evaluates both design and operational effectiveness over a three-to-twelve-month observation window. Type II is what most enterprise buyers ultimately require. First-year cost for a startup ranges from $20,000 to $80,000 all-in, including audit fees ($8,000–$50,000), compliance automation platform ($7,500–$15,000 per year from vendors like Vanta or Drata), and 100–400 hours of internal team time.[38] Justin McCarthy, CTO of StrongDM, admits the learning curve: "I wildly underestimated the cost of our first SOC 2 audit — both in time and expense. I figured an auditor would come in for a few months, offer suggestions on how to improve, and then sign off. I could not have been more wrong."[39]
EU expansion (Series A to B): ISO 27001 is the default requirement for European enterprise procurement. There is approximately 80% overlap between ISO 27001 and SOC 2 controls, so pursuing the second framework after the first requires only 30–40% additional effort.[40] Venkat Rangan, CTO of Clari, led his company’s ISO 27001 certification as a personal priority: "Because we believe in the importance of selling, we know the value of sales data. As a co-founder and CTO, I took responsibility for leading the effort."[41] The decision framework is geographical: US customers want SOC 2; EU customers want ISO 27001; if you sell to both, you will eventually need both.
Healthcare and regulated industries (Series A onward): HIPAA does not have a formal certification. There is no HIPAA certificate to hang on the wall. Compliance is demonstrated through self-assessment, Business Associate Agreements (BAAs), and third-party audits.[42] This is both liberating and dangerous: liberating because you can achieve compliance incrementally without a massive upfront investment, dangerous because there is no external validation to tell you whether you have done enough. The CTO building a healthcare product must understand three things. First, if you handle Protected Health Information (PHI), you need a BAA with every subprocessor in your chain — your cloud provider, your database host, your analytics tool. AWS, Azure, and GCP all offer standard BAAs.[43] Second, the HIPAA Security Rule requires administrative, physical, and technical safeguards — but it is intentionally flexible and scalable, meaning the requirements for a five-person startup are different from the requirements for a hospital system.[44] Third, the consequences of non-compliance are not theoretical: HIPAA violation penalties range from $141 to over $2 million per violation category per year, and the Office for Civil Rights has collected $144.9 million in settlements and penalties to date.[45]
For the CTO at a healthcare startup who wants external validation beyond self-assessment, the HITRUST Common Security Framework (CSF) is the closest thing to a gold standard. HITRUST harmonises over 60 regulatory frameworks into a single certifiable structure. It now offers three tiers: e1 (44 foundational controls, approximately $35,000), i1 (roughly 182 controls, approximately $70,000), and r2 (comprehensive, risk-based, $100,000–$150,000 or more).[46] The pragmatic path for a healthcare SaaS startup: start with HIPAA self-assessment and BAAs, add SOC 2 when enterprise customers require it, consider HITRUST e1 when large health systems ask for it, and pursue full HITRUST r2 only when the deal size justifies the investment.
|
AUTHOR: Where CorralData sits on this ladder right now. Which certifications do you have, which are you pursuing, and what triggered each decision? The reader in healthcare SaaS needs to see the progression applied to a real company at their stage. |
Penetration Testing: The Question Every Customer Asks
Enterprise procurement teams will ask when you had your last penetration test. The question is so routine that it appears on virtually every security questionnaire, and the CTO who cannot answer it will watch deals slow to a crawl.
A penetration test is a structured attempt to break into your systems — a professional attacker simulating what a real attacker would do. It is not the same as automated vulnerability scanning, though the two are often confused. Automated scanning is cheap (many tools are free), continuous, and catches known vulnerabilities — outdated libraries, misconfigured headers, exposed ports. It should run in your CI/CD pipeline. Manual penetration testing is expensive ($5,000–$30,000 for a typical SaaS startup), periodic (annually is the standard cadence), and catches what scanners cannot: business logic flaws, access control bypasses, and context-dependent vulnerabilities that require a human to identify.[47] Caleb Mattingly, founder of Secure Cloud Innovations and a CISSP, makes the practical point: "SOC 2 does not explicitly mandate penetration testing. But enterprise clients often request pen test results separately as part of their vendor security assessment process. Penetration testing is technically optional for SOC 2 but practically necessary for enterprise sales."[48]
The timing question has a clear answer. Do not pay for a pen test while the product is still changing shape weekly — you will be testing a system that no longer exists by the time you receive the report. Get your first pen test when you have a stable production environment, at least one enterprise customer or prospect requiring it, and the engineering capacity to remediate the findings. Run automated vulnerability scanning from day one. Budget for annual manual pen testing from the first enterprise deal onward.
Bug bounty programmes — where external security researchers are paid for finding vulnerabilities — are not appropriate for early-stage startups. They require a mature security posture and dedicated triage resources. What a startup should publish instead is a vulnerability disclosure policy: a simple page at /.well-known/security.txt that tells security researchers how to report issues responsibly. It costs nothing and signals professionalism.
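The disclosure policy can be as small as a single file. A minimal example of the format defined by RFC 9116 for /.well-known/security.txt — the contact address, URL, and expiry date below are placeholders:

```
Contact: mailto:security@example.com
Expires: 2026-12-31T23:59:59Z
Policy: https://example.com/security-policy
Preferred-Languages: en
```

Only Contact and Expires are required by the specification; the rest signal that someone is actually reading the inbox.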
When to Hire a CISO
The CTO at an early-stage startup is the de facto Chief Information Security Officer whether they hold the title or not. The security architecture, the compliance decisions, the incident response plan (if one exists) — these are the CTO’s responsibility by default, just as the product roadmap is the CTO’s responsibility until the first PM arrives.
The trigger for hiring a dedicated security leader follows the same pattern as hiring a PM: you need one when the compliance burden exceeds what the CTO can carry alongside their other responsibilities, and when the regulatory environment demands expertise the CTO does not have. For most B2B SaaS startups, this means a fractional CISO or virtual CISO (vCISO) starting at Series A — typically $5,000–$15,000 per month — rather than a full-time hire. The vCISO manages the SOC 2 audit process, handles security questionnaires, leads the pen test remediation, and ensures the compliance programme stays current without consuming the CTO’s calendar.
A full-time CISO typically becomes necessary at Series B or later, when the company has multiple compliance frameworks to maintain, a growing customer base with enterprise security requirements, and enough engineering surface area that security review of new features requires dedicated attention. In healthcare specifically, the regulatory complexity of HIPAA, HITRUST, and state-level privacy laws can accelerate this timeline. The CTO who waits until a breach or a failed audit to hire a security leader has waited too long. The CTO who hires one before the company has any compliance requirements has hired too early.
|
AUTHOR: How you handle security leadership at CorralData today. Are you the de facto CISO? Have you used a vCISO? What’s the plan as the company scales? |
Krieger’s Instagram reached one billion users on the stack two people chose in 2010. The stack was not optimal. It was not elegant. It was not what a senior architect would have designed if given six months and a whiteboard. It was fast to build on, easy to understand, and operationally quiet — and it let a tiny team focus on the product while the infrastructure did what it was supposed to do, which was nothing interesting at all.
Every technical decision a CTO makes in the first year of a company is a business decision wearing engineering clothes. The stack is a bet on iteration speed. The MVP is a bet on learning. The build-versus-buy choice is a bet on where engineering capacity creates the most value. The reversibility test is a tool for sizing those bets. The technology evaluation framework — innovation tokens, Elliott-McCrea’s eight questions, the timebox — is a tool for deciding which new bets to take. The technical vision is the thesis that connects individual bets into a coherent direction. And the security architecture — the permissioning model, the audit logging, the encryption decisions, the compliance posture — is a bet on the company’s ability to sell to enterprise customers without a six-month retrofit.
The company that ships the wrong product on the right infrastructure will learn and adapt. The company that ships no product on perfect infrastructure will not learn anything at all. Chapter 5 examines what happens to the debt those early decisions create — and why treating that debt as a strategic instrument, rather than an engineering failure, is the CTO’s next essential skill.