Why most startups should start with a single codebase, and when splitting into services actually makes sense.
There's a specific kind of startup failure: the team spends six months building distributed infrastructure (container orchestration, service mesh, distributed tracing) for an app that has 50 users. They optimize for scale they'll never reach, while competitors ship features.
The opposite failure exists too: a monolith that's become unmaintainable, where every deploy is a risk and teams step on each other constantly. But this problem comes later, and it's a good problem to have; it means you've succeeded enough to need scaling.
Architecture is a set of bets. The skill is knowing when to place which bet.
For most new products, start here (a minimal sketch follows this list):
One deployable: a single application. Web framework of your choice. Containerized for consistent environments.
One database: a relational database covers 95% of use cases. Add caching when you have actual latency problems, not theoretical ones.
Strong typing: catches bugs at compile time, enables confident refactoring, improves tooling assistance.
Tests from day one: not 100% coverage, but critical paths covered. The safety net that lets you move fast.
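To make the starting point concrete, here is a rough sketch of that stack in Go: one binary, one typed request contract, one test on the critical path. The route, type names, and port are illustrative assumptions, not a prescription; any typed language and web framework would do, and the one relational database would hang off this same binary.

```go
// main.go: the entire deployable is one binary behind one port.
package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// SignupRequest is a typed contract; the compiler catches field typos,
// and refactors propagate everywhere the type is used.
type SignupRequest struct {
	Email string `json:"email"`
}

type SignupResponse struct {
	OK bool `json:"ok"`
}

func handleSignup(w http.ResponseWriter, r *http.Request) {
	var req SignupRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil || req.Email == "" {
		http.Error(w, "invalid request", http.StatusBadRequest)
		return
	}
	// Writing to the one relational database would happen here.
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(SignupResponse{OK: true})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/signup", handleSignup)
	log.Fatal(http.ListenAndServe(":8080", mux))
}
```

```go
// main_test.go: not 100% coverage, just the critical path.
package main

import (
	"net/http/httptest"
	"strings"
	"testing"
)

func TestSignupCriticalPath(t *testing.T) {
	req := httptest.NewRequest("POST", "/signup", strings.NewReader(`{"email":"a@b.co"}`))
	rec := httptest.NewRecorder()
	handleSignup(rec, req)
	if rec.Code != 200 {
		t.Fatalf("expected 200, got %d", rec.Code)
	}
}
```

Containerization wraps this same binary; it doesn't multiply the number of things you deploy.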
This setup handles more scale than most people expect. Companies with millions of users have run on monoliths. The constraint is usually team coordination, not technology.
So why do teams skip this and go distributed from day one? Several reasons, none good:
Resume-driven development: distributed systems look impressive on LinkedIn. A working product is more impressive.
Premature scaling: "What if we get 10 million users?" You won't have this problem. And if you do, you'll have the money and time to solve it.
Copying Big Tech: large companies need microservices. Your 5-person startup doesn't. They solved different problems at different scales.
Underestimating coordination cost: distributed systems are hard. Network failures, data consistency, deployment coordination. Every service boundary adds complexity.
Add complexity when you have evidence it's needed, not before:
Caching: when the profiler shows database queries are the bottleneck, and those queries can't be optimized further.
Background jobs: when requests are timing out because of slow operations that don't need to be synchronous (a sketch follows below).
Service extraction: when one team needs to deploy independently, or one component has radically different scaling/reliability requirements.
The signal is always specific and measurable. "It feels slow" isn't a signal. "P95 latency is 3 seconds and users are churning" is.
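When slow work does start timing out requests, the first fix usually isn't a message-broker cluster; an in-process queue is often enough. A hedged sketch, assuming the slow step is something like sending a welcome email (the route and function names are invented for illustration):

```go
package main

import (
	"fmt"
	"log"
	"net/http"
	"time"
)

// jobs is a buffered in-process queue; a worker goroutine drains it so
// the request handler can return immediately instead of timing out.
var jobs = make(chan string, 1024)

func sendWelcomeEmail(email string) {
	time.Sleep(2 * time.Second) // stand-in for the slow, non-critical work
	fmt.Println("sent welcome email to", email)
}

func worker() {
	for email := range jobs {
		sendWelcomeEmail(email)
	}
}

func handleSignup(w http.ResponseWriter, r *http.Request) {
	email := r.URL.Query().Get("email")
	select {
	case jobs <- email: // enqueue and respond right away
		w.WriteHeader(http.StatusAccepted)
	default: // queue full: shed load instead of blocking the request
		http.Error(w, "try again later", http.StatusServiceUnavailable)
	}
}

func main() {
	go worker()
	http.HandleFunc("/signup", handleSignup)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

The obvious limitation is that queued jobs die with the process. Move to a persistent queue when losing them actually costs you something, which is the same evidence-first rule again.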
Whether it's a monolith or microservices, certain practices aren't optional:
Explicit contracts: typed APIs, versioning strategy, backward compatibility rules. This matters even within a monolith; it's how you split later without breaking things.
Observability: logs with correlation IDs, metrics for SLOs (response time, error rate, throughput), tracing for debugging. You can't improve what you can't measure. (A minimal sketch of the logging piece follows this list.)
Safe deployments: feature flags for gradual rollout, database migrations that work forwards and backwards, rollback procedures that are tested.
Operational basics: runbooks for common incidents, alerts tied to user impact (not CPU usage), capacity planning before traffic spikes.
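Observability is the practice most often postponed, and the cheapest piece of it is a correlation ID that ties every log line to a request. A minimal sketch of the idea as Go HTTP middleware, assuming an X-Correlation-ID header and a random hex ID (both illustrative choices, not a standard you must follow):

```go
package main

import (
	"crypto/rand"
	"encoding/hex"
	"log"
	"net/http"
	"time"
)

// withCorrelationID tags each request with an ID, echoes it back to the
// client, and logs method, path, and latency under that ID so a single
// user report can be traced through every log line it produced.
func withCorrelationID(next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		id := r.Header.Get("X-Correlation-ID")
		if id == "" {
			buf := make([]byte, 8)
			rand.Read(buf)
			id = hex.EncodeToString(buf)
		}
		w.Header().Set("X-Correlation-ID", id)
		start := time.Now()
		next.ServeHTTP(w, r)
		log.Printf("correlation_id=%s method=%s path=%s duration_ms=%d",
			id, r.Method, r.URL.Path, time.Since(start).Milliseconds())
	})
}

func main() {
	mux := http.NewServeMux()
	mux.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	log.Fatal(http.ListenAndServe(":8080", withCorrelationID(mux)))
}
```

Metrics and tracing from the same list item come later; structured log lines keyed by an ID are enough to debug most early incidents.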
When it's time to split the monolith, do it surgically. Extracting even one critical-path component is a months-long project; don't start until the pain is real.
For each part of the system, ask the same question: does this genuinely need to be its own service, or can it live in the monolith? Most early-stage products should answer "monolith" across the board. The teams that ship successfully are the ones who resist premature optimization and focus on what matters: building something users want.