Why most startups should start with a single codebase, and when splitting into services actually makes sense.
There's a specific kind of startup failure: the team spends six months building distributed infrastructure (container orchestration, service mesh, distributed tracing) for an app that has 50 users. They optimize for scale they'll never reach, while competitors ship features.
The opposite failure exists too: a monolith that's become unmaintainable, where every deploy is a risk and teams step on each other constantly. But this problem comes later, and it's a good problem to have; it means you've succeeded enough to need scaling.
Architecture is a set of bets. The skill is knowing when to place which bet.
For most new products, start here (a minimal sketch follows this list):
One deployable: a single application. Web framework of your choice. Containerized for consistent environments.
One database: a relational database covers 95% of use cases. Add caching when you have actual latency problems, not theoretical ones.
Strong typing: catches bugs at compile time, enables confident refactoring, improves tooling assistance.
Tests from day one: not 100% coverage, but critical paths covered. The safety net that lets you move fast.
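To make the starting point concrete, here is a rough sketch of that stack in Go: one binary, one typed request contract, one test on the critical path. The route, type names, and port are illustrative assumptions, not a prescription; any typed language and web framework would do, and the one relational database would hang off this same binary.

```go
// main.go: the entire deployable is one binary behind one port.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// SignupRequest is a typed contract; the compiler catches field typos,
// and refactors propagate everywhere the type is used.
type SignupRequest struct {
	Email string `json:"email"`
}

type SignupResponse struct {
	OK bool `json:"ok"`
}

func handleSignup(w http.ResponseWriter, r *http.Request) {
	var req SignupRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.Email == "" {
		http.Error(w, "invalid request", http.StatusBadRequest)
		return
	}
	// Writing to the one relational database would happen here.
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(SignupResponse{OK: true})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/signup", handleSignup)
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```

```go
// main_test.go: not 100% coverage, just the critical path.
package main

import (
	"net/http/httptest"
	"strings"
	"testing"
)

func TestSignupCriticalPath(t *testing.T) {
	req := httptest.NewRequest("POST", "/signup", strings.NewReader(`{"email":"a@b.co"}`))
	rec := httptest.NewRecorder()
	handleSignup(rec, req)
	if rec.Code != 200 {
		t.Fatalf("expected 200, got %d", rec.Code)
	}
}
```

Containerization wraps this same binary; it doesn't multiply the number of things you deploy.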
This setup handles more scale than most people expect. Companies with millions of users have run on monoliths. The constraint is usually team coordination, not technology.
So why do teams skip this and go distributed from day one? Several reasons, none good:
Resume-driven development: distributed systems look impressive on LinkedIn. A working product is more impressive.
Premature scaling: "What if we get 10 million users?" You won't have this problem. And if you do, you'll have the money and time to solve it.
Copying Big Tech: large companies need microservices. Your 5-person startup doesn't. They solved different problems at different scales.
Underestimating coordination cost: distributed systems are hard. Network failures, data consistency, deployment coordination. Every service boundary adds complexity.
Add complexity when you have evidence it's needed, not before:
Caching: when the profiler shows database queries are the bottleneck, and those queries can't be optimized further.
Background jobs: when requests are timing out because of slow operations that don't need to be synchronous (a sketch follows below).
Service extraction: when one team needs to deploy independently, or one component has radically different scaling/reliability requirements.
The signal is always specific and measurable. "It feels slow" isn't a signal. "P95 latency is 3 seconds and users are churning" is.
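When slow work does start timing out requests, the first fix usually isn't a message-broker cluster; an in-process queue is often enough. A hedged sketch, assuming the slow step is something like sending a welcome email (the route and function names are invented for illustration):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

// jobs is a buffered in-process queue; a worker goroutine drains it so
// the request handler can return immediately instead of timing out.
var jobs = make(chan string, 1024)

func sendWelcomeEmail(email string) {
	time.Sleep(2 * time.Second) // stand-in for the slow, non-critical work
	fmt.Println("sent welcome email to", email)
}

func worker() {
	for email := range jobs {
		sendWelcomeEmail(email)
	}
}

func handleSignup(w http.ResponseWriter, r *http.Request) {
	email := r.URL.Query().Get("email")
	select {
	case jobs <- email: // enqueue and respond right away
		w.WriteHeader(http.StatusAccepted)
	default: // queue full: shed load instead of blocking the request
		http.Error(w, "try again later", http.StatusServiceUnavailable)
	}
}

func main() {
	go worker()
	http.HandleFunc("/signup", handleSignup)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The obvious limitation is that queued jobs die with the process. Move to a persistent queue when losing them actually costs you something, which is the same evidence-first rule again.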
Whether it's a monolith or microservices, certain practices aren't optional:
Explicit contracts: typed APIs, versioning strategy, backward compatibility rules. This matters even within a monolith; it's how you split later without breaking things.
Observability: logs with correlation IDs, metrics for SLOs (response time, error rate, throughput), tracing for debugging. You can't improve what you can't measure. (A minimal sketch of the logging piece follows this list.)
Safe deployments: feature flags for gradual rollout, database migrations that work forwards and backwards, rollback procedures that are tested.
Operational basics: runbooks for common incidents, alerts tied to user impact (not CPU usage), capacity planning before traffic spikes.
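Observability is the practice most often postponed, and the cheapest piece of it is a correlation ID that ties every log line to a request. A minimal sketch of the idea as Go HTTP middleware, assuming an X-Correlation-ID header and a random hex ID (both illustrative choices, not a standard you must follow):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"net/http"
	"time"
)

// withCorrelationID tags each request with an ID, echoes it back to the
// client, and logs method, path, and latency under that ID so a single
// user report can be traced through every log line it produced.
func withCorrelationID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Correlation-ID")
		if id == "" {
			buf := make([]byte, 8)
			rand.Read(buf)
			id = hex.EncodeToString(buf)
		}
		w.Header().Set("X-Correlation-ID", id)
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("correlation_id=%s method=%s path=%s duration_ms=%d",
			id, r.Method, r.URL.Path, time.Since(start).Milliseconds())
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", withCorrelationID(mux)))
}
```

Metrics and tracing from the same list item come later; structured log lines keyed by an ID are enough to debug most early incidents.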
When it's time to split the monolith, do it surgically. Extracting even one critical-path component is a months-long project; don't start until the pain is real.
For each part of the system, ask the same question: does this genuinely need to be its own service, or can it live in the monolith? Most early-stage products should answer "monolith" across the board. The teams that ship successfully are the ones who resist premature optimization and focus on what matters: building something users want.