Short URL System Design

A Step-by-Step Walkthrough for Backend Engineers

When engineers first encounter the idea of a short URL service, the instinctive reaction is often:

"It sounds easy. It's just a redirect."

And to be fair, functionally, that statement is not wrong. At its core, a short URL system really does only two things:

  • Stores a mapping between a short string and a long URL
  • Redirects users when that short string is accessed

The challenge begins when we stop thinking in terms of functionality and start thinking in terms of constraints.


Lesson 1: Why a Short URL System Is Not as Simple as It Looks

Thinking Beyond the Happy Path

In small projects or internal tools, we often design for the happy path:

  • Few users
  • Limited data
  • No strong latency or security requirements

But public-facing infrastructure systems live in a very different world.

For a real short URL service, we are not optimizing for correctness alone.
We are optimizing for:

  • Scale
  • Latency
  • Safety

…all at the same time. Let's slow down and unpack that.

Scale Changes Everything

The moment a short URL service becomes useful, it becomes popular.

Links are shared:

  • On social media
  • In emails
  • Inside mobile apps
  • Across messaging platforms

Each short URL may live for years, and each one can be clicked hundreds or thousands of times.

This means:

  • Data volume grows continuously
  • Read traffic compounds over time
  • Old design decisions become very expensive to undo

This is why experienced engineers are often cautious with systems that look "simple" on the surface.

Latency Is Not Optional

Redirects are part of the user experience.

If clicking a link feels slow, users will notice immediately.
There is:

  • No progress bar
  • No loading indicator
  • Just a pause

In practice, this means:

  • Single-digit millisecond latency is not a luxury
  • Tail latency (P95 / P99) matters more than average latency
  • Designs that introduce retries, blocking operations, or unpredictable code paths tend to fail here

Security Is a Hidden Requirement

Another subtle requirement is non-guessability. If short URLs are predictable:

  • Attackers can crawl the entire space
  • Private links may be exposed
  • Abuse becomes trivial

This requirement quietly eliminates many otherwise elegant designs.

Lesson 1 Takeaway

A short URL system is not difficult because of its logic.
It is difficult because seemingly small design choices interact badly under real-world constraints.

Understanding this early will save you from many painful redesigns later.

✅ Quiz – Lesson 1

Question #1 of 1

Why does a short URL system become challenging at scale?


Lesson 2: The Two Flows That Define the Entire System

Before we talk about storage engines, caches, or distributed systems, we need to answer a more fundamental question:

What does this system actually spend most of its time doing?

This is a question many beginners skip — and that's a mistake.

Create Flow vs Redirect Flow

There are only two meaningful flows in a short URL system:

Create Flow

  • Input: long URL
  • Output: short URL
  • Frequency: low

Redirect Flow

  • Input: short URL
  • Output: HTTP redirect
  • Frequency: very high

This difference in frequency is not a detail — it is the core characteristic of the system.

Why This Asymmetry Matters

Let's imagine two naive designs:

  • Design A: Treat create and redirect equally
  • Design B: Optimize redirect aggressively, keep create simple

Design A often looks "cleaner" on paper, but it wastes resources:

  • Expensive consistency on a low-frequency path
  • Insufficient optimization on the hot path

Design B accepts an important reality:

Writes are important, but reads pay the bills.

In most real systems, redirect traffic completely dominates resource usage.

A Practical Engineering Instinct

Experienced engineers develop a habit here:

Always identify the hot path first.

Once you know which path is hot:

  • You can tolerate more cost on cold paths
  • You design caches, data models, and APIs accordingly

Failing to do this often results in systems that are correct, but slow.

Lesson 2 Takeaway

The short URL system is not symmetric.

If you design it as if all requests are equal, the system will not scale gracefully.

✅ Quiz – Lesson 2

Question #1 of 1

Why is the redirect flow more important to optimize than the create flow?


Lesson 3: Estimating Scale Before Designing Solutions

At this point, many people want to jump straight into architecture diagrams.

Resist that urge.

Good system design starts with approximate math, not because math is impressive, but because it prevents bad assumptions.

Estimating Data Size

Assume:

  • 500 million new short URLs per month
  • URLs are kept for 2 years

That gives us:

500M × 12 × 2 = 12 billion URLs

This number doesn't need to be exact. Its purpose is to tell us:

  • We are firmly in "distributed storage" territory
  • Any solution that assumes everything fits comfortably in a single database will struggle
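The arithmetic above is easy to sanity-check in a few lines. The per-record size of ~500 bytes is an assumed figure for illustration, not part of the design:

```python
# Back-of-envelope sizing. The record size (~500 bytes for the code,
# the long URL, and metadata) is an assumption, not a given.
NEW_URLS_PER_MONTH = 500_000_000
RETENTION_YEARS = 2
BYTES_PER_RECORD = 500  # assumed average record size

total_urls = NEW_URLS_PER_MONTH * 12 * RETENTION_YEARS
total_bytes = total_urls * BYTES_PER_RECORD

print(f"{total_urls:,} URLs")                # 12,000,000,000 URLs
print(f"~{total_bytes / 1e12:.0f} TB raw")   # ~6 TB raw
```

Even with a generous record size, the dataset lands in the terabytes: manageable, but only with storage designed to grow.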

Estimating Traffic

Now consider usage. If each short URL is clicked about 100 times on average:

  • Monthly reads ≈ 50 billion
  • Average QPS ≈ 20,000
  • Peak QPS ≈ 40,000

At first glance, these numbers look scary.

But this is where experience helps us separate throughput from concurrency.

Throughput vs Concurrency (Very Important)

  • Throughput answers: "How many requests per second?"
  • Concurrency answers: "How many requests are happening at the same time?"

They are related, but not the same.

If average latency is 10ms:

Concurrency ≈ 20,000 × 0.01 = 200

This means:

  • The system is busy, but not overwhelmed
  • Optimizing latency reduces concurrency pressure directly
  • Many engineers only learn this lesson after over-scaling systems unnecessarily
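This relationship is just Little's law (concurrency ≈ throughput × latency), and the section's numbers can be reproduced directly. The 2× peak factor is an assumption for illustration:

```python
# Reproducing the traffic math, plus Little's law for concurrency.
MONTHLY_READS = 50_000_000_000
SECONDS_PER_MONTH = 30 * 24 * 3600             # ~2.6 million

avg_qps = MONTHLY_READS / SECONDS_PER_MONTH    # ~19,300, i.e. "about 20,000"
peak_qps = 2 * avg_qps                         # assumed 2x peak factor

# Little's law: in-flight requests ~= arrival rate * time in system.
AVG_LATENCY_S = 0.010                          # 10 ms
concurrency = 20_000 * AVG_LATENCY_S           # 200 in-flight requests

# Halving latency halves concurrency at the same QPS:
concurrency_fast = 20_000 * 0.005              # 100 in-flight requests
```

This is why latency work pays twice: users feel it, and the servers carry fewer simultaneous requests.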

Lesson 3 Takeaway

Capacity estimation is not about precision.

It is about identifying:

  • What will grow without bound
  • What must stay fast
  • What can be simplified safely

✅ Quiz – Lesson 3

Question #1 of 1

Why does reducing latency directly reduce required concurrency?


Lesson 4: A Deliberately Simple High-Level Architecture

At this stage, many engineers feel an urge to design something "impressive".

  • Multiple databases
  • Message queues
  • Complex pipelines

All of these may sound professional. But experience teaches a different lesson:

Complexity should be earned, not assumed.

Before optimizing anything, we want a system that is:

  • Easy to reason about
  • Easy to scale horizontally
  • Easy to recover when something goes wrong

That naturally pushes us toward a stateless application layer backed by shared infrastructure.

Why Stateless Services Matter So Much

Stateless services give us three immediate benefits:

  • Horizontal scalability — If traffic doubles, we add more instances. No coordination needed.
  • Simpler failure handling — If a server crashes, requests are simply retried on another instance.
  • Cleaner mental model — State lives in dedicated systems (cache, storage), not hidden inside application memory.

This separation of concerns is one of the most reliable patterns in large-scale backend systems.

The First-Cut Architecture

The first cut is deliberately plain: clients reach a load balancer, which spreads requests across identical stateless app servers; those servers talk to a shared cache and a shared key-value store.

At this point, notice what we are not doing:

  • No premature sharding logic
  • No asynchronous pipelines
  • No distributed transactions

We are deliberately choosing a design that we can grow into, rather than one we have to simplify later.

Lesson 4 Takeaway

A boring architecture is often a sign of good engineering judgment.

✅ Quiz – Lesson 4

Question #1 of 1

Why is a stateless application layer preferred here?


Lesson 5: The Core Design Question — How Do We Generate Short URLs?

This lesson is the heart of the entire system.

If you get this part wrong, everything else becomes harder:

  • Latency spikes
  • Security issues appear
  • Operational complexity grows quietly over time

Let's approach this the way experienced engineers usually do:
by evaluating ideas in the order they naturally arise.

First Instinct: Hash the Long URL

Hashing feels mathematically clean: the same long URL always produces the same output.

But a full hash (say, 256 bits from SHA-256) is far too long for a short URL, so it must be truncated to a handful of characters, and truncation fundamentally changes the problem.

The issue is not that collisions might happen —
the issue is that when they do happen, the system has no clear upper bound on work:

  • Check database
  • Retry with another hash
  • Possibly retry again

This introduces tail latency, which is far more dangerous than slightly slower averages.
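A minimal sketch of why truncated hashing misbehaves, with SHA-256 and base62 as assumed choices. Note the retry loop with no fixed bound on attempts:

```python
import hashlib
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def base62(n):
    """Encode a non-negative integer in base62."""
    if n == 0:
        return ALPHABET[0]
    out = ""
    while n:
        n, r = divmod(n, 62)
        out = ALPHABET[r] + out
    return out

def truncated_hash(url, salt=0):
    # Hash, then keep at most 7 base62 characters. This truncation is
    # what makes collisions possible in a space this small.
    digest = hashlib.sha256(f"{salt}:{url}".encode()).digest()
    return base62(int.from_bytes(digest[:8], "big"))[:7]

def shorten(url, taken):
    # The problem: on collision the server must re-hash and re-check
    # the store, with no fixed upper bound on attempts (tail latency).
    salt = 0
    while True:
        code = truncated_hash(url, salt)
        if code not in taken:       # stands in for a DB uniqueness check
            taken.add(code)
            return code
        salt += 1
```

Each extra loop iteration is an extra round trip to the database on the request path, which is exactly where P99 latency is born.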

Second Instinct: Auto-Increment IDs

Auto-increment IDs solve collisions completely.

But they introduce predictability.

Once short URLs become enumerable, the system becomes vulnerable to:

  • Crawling
  • Privacy leaks
  • Abuse

This is a great example of a solution that is technically elegant but rejected by product and security requirements.
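A quick illustration of the enumerability problem, assuming the common base62 encoding of sequential IDs:

```python
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def base62(n):
    out = ""
    while n:
        n, r = divmod(n, 62)
        out = ALPHABET[r] + out
    return out or ALPHABET[0]

# Consecutive database IDs yield adjacent, fully predictable codes,
# so an attacker can simply count upward and crawl every link:
codes = [base62(i) for i in range(1_000_000, 1_000_003)]
print(codes)   # ['4C92', '4C93', '4C94']
```

No amount of encoding cleverness hides the underlying sequence; the randomness has to come from somewhere else.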

The Experienced Choice: Pre-Generated Random Codes

Instead of fighting uncertainty online, we move it offline.

We generate short codes:

  • In bulk
  • Ahead of time
  • With deduplication handled outside the request path

This transforms the online problem from "generate a short URL" into:

"Fetch the next available short code."

That difference is subtle, but extremely powerful.
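A sketch of the offline generator, using `secrets` for non-guessable codes; the batch size and code length are assumed values:

```python
import secrets
import string

ALPHABET = string.digits + string.ascii_uppercase + string.ascii_lowercase

def generate_batch(size, length=7, existing=None):
    """Offline job: produce `size` unique, random short codes.
    Deduplication happens here, off the request path, where an
    occasional retry costs nothing a user can feel."""
    seen = set(existing or ())
    batch = []
    while len(batch) < size:
        code = "".join(secrets.choice(ALPHABET) for _ in range(length))
        if code not in seen:        # collision handling, but offline
            seen.add(code)
            batch.append(code)
    return batch

# Online, "generate a short URL" degenerates into "pop the next code":
pool = generate_batch(1000)
next_code = pool.pop()
```

The retry loop still exists, but it now runs in a batch job where its cost is invisible to users.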

Lesson 5 Takeaway

When latency matters, move unpredictability out of the request path.

✅ Quiz – Lesson 5

Question #1 of 1

Why does offline pre-generation improve system stability?


Lesson 6: Understanding the Short Code Pool

At first glance, a "short code pool" may sound complicated.

In reality, it's a very common pattern:

  • Database ID allocators
  • Ticketing systems
  • Resource leasing services

The core idea is always the same:

Allocate scarce resources safely and efficiently across many servers.

Conceptual Behavior

  • Short codes are stored sequentially
  • Multiple pool servers exist for availability
  • Each server consumes codes without overlap
  • Each server keeps a local in-memory buffer
  • Refilling happens in the background

The important thing is coordination, not the storage medium.

Why Sequential Allocation?

Sequential allocation makes coordination simpler:

  • Easy to reason about progress
  • Easy to recover after failures
  • Easy to monitor consumption rate

Random allocation online would reintroduce collision checks — exactly what we are trying to avoid.
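One way to sketch a pool server's local buffer with background refill; the block size, low-water mark, and the in-memory backing store are all assumptions for the demo:

```python
import threading
from collections import deque

class PoolServer:
    """Sketch of one pool server's buffer. Codes are taken from a
    sequential backing store in blocks; a background refill keeps the
    hot path from ever blocking on the store."""

    def __init__(self, fetch_block, block_size=1000, low_water=200):
        self._fetch_block = fetch_block   # pulls the next sequential block
        self._block_size = block_size
        self._low_water = low_water
        self._buffer = deque(fetch_block(block_size))
        self._lock = threading.Lock()

    def next_code(self):
        with self._lock:
            code = self._buffer.popleft()
            if len(self._buffer) < self._low_water:
                # Refill in the background, off the request path.
                threading.Thread(target=self._refill, daemon=True).start()
        return code

    def _refill(self):
        block = self._fetch_block(self._block_size)
        with self._lock:
            self._buffer.extend(block)

# Demo with a fake sequential store (each call hands out the next range):
_counter = 0
def fetch_block(n):
    global _counter
    start, _counter = _counter, _counter + n
    return [f"code{i}" for i in range(start, start + n)]

server = PoolServer(fetch_block)
first = server.next_code()   # strictly sequential, no overlap
```

Because each block is handed out exactly once, two pool servers can never allocate the same code, which is the whole coordination problem solved in one place.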

Lesson 6 Takeaway

Resource pools are not about clever data structures.
They are about clear ownership and coordination.

✅ Quiz – Lesson 6

Question #1 of 1

What problem does the short code pool primarily solve?


Lesson 7: Walking Through the Create Flow

Now let's slow down and trace what happens when a user creates a short URL.

Create Short URL

Endpoint
POST /api/shorten (Create long URL → short URL)
Request
  • Client sends the long URL (optionally with an idempotency key to deduplicate repeated long URLs).
Server logic
  • Request a short code from the pool (pre-generated, no collision check on hot path).
  • Persist mapping (short_code → long_url) in KV store.
  • Return short URL to the client.
Response
  • Short URL (e.g. https://short.example/abc12).
Notes
  • Create flow is low frequency; keep it minimal. Pool supplies codes so the path stays predictable.

This flow is intentionally minimal. Why? Because the create path is:

  • Low frequency
  • User-facing
  • Sensitive to latency spikes

Every extra step here increases operational risk.

A Design Principle in Action

Notice that:

  • No heavy computation happens
  • No retries are expected
  • No coordination across app servers is required

This is a conscious decision.

We accept that the hard work already happened offline, so that this flow stays predictable.
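The create flow above can be sketched end to end with in-memory stand-ins for the pool and KV store (every name here is illustrative):

```python
BASE_URL = "https://short.example"

code_pool = ["abc12", "abc13", "abc14"]   # supplied by the pool service
kv_store = {}                             # short_code -> long_url

def create_short_url(long_url):
    code = code_pool.pop(0)      # step 1: next pre-generated code,
                                 # no collision check on this path
    kv_store[code] = long_url    # step 2: persist the mapping
    return f"{BASE_URL}/{code}"  # step 3: return the short URL

short = create_short_url("https://example.com/some/very/long/path")
print(short)   # https://short.example/abc12
```

Three steps, no loops, no retries: the flow is as predictable as the pool makes it.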

Lesson 7 Takeaway

A good create flow feels almost boring — and that's a compliment.

✅ Quiz – Lesson 7

Question #1 of 1

Why should the create flow be kept simple?


Lesson 8: Redirect Flow — Where Performance Really Matters

  • The create flow is about correctness
  • The redirect flow is about speed

Redirect Flow

Endpoint
GET /{short}
Server logic
  • Lookup short code in cache first.
  • On cache hit: return 302 Redirect to long URL.
  • On cache miss: query store; if found, populate cache and return 302; if not found, return 404.
Response
  • 302 Redirect with Location: long_url.
  • Or 404 if short code not found.
Notes
  • Redirect is the hot path; cache-first keeps latency low. 302 preferred for control and observability.

This path must be:

  • fast
  • simple
  • highly cacheable

Any unnecessary logic here will show up immediately in user experience metrics.

Why Cache First?

Cache lookups are:

  • In-memory
  • Low latency
  • Cheap

If most redirects hit cache, the database becomes a fallback, not a bottleneck.
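The cache-first lookup can be sketched with plain dicts standing in for the cache and the KV store (a real deployment would use something like Redis in front of the store):

```python
cache = {}
store = {"abc12": "https://example.com/some/very/long/path"}

def redirect(short_code):
    long_url = cache.get(short_code)       # 1. cache first
    if long_url is None:
        long_url = store.get(short_code)   # 2. fall back to the store
        if long_url is None:
            return 404, None               # unknown code
        cache[short_code] = long_url       # 3. populate cache on a miss
    return 302, long_url                   # Location: long_url

status, location = redirect("abc12")   # first hit fills the cache
status_again, _ = redirect("abc12")    # now served from memory
```

After the first miss, every later click on the same link is a pure in-memory lookup.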

Lesson 8 Takeaway

In read-heavy systems, caches are not optimizations — they are core infrastructure.

✅ Quiz – Lesson 8

Question #1 of 1

Why is the cache placed directly on the redirect path?


Lesson 9: Cache Strategy — Knowing What Not to Cache

A common beginner mistake is trying to cache everything.

In practice, this is unnecessary and wasteful.

Traffic patterns are skewed:

  • A small set of URLs receives most clicks
  • These URLs are usually recent

So we cache:

  • Hot data
  • Recently created mappings

Cold data can safely fall through to storage.

A Practical Trade-Off

Caching everything increases:

  • Memory usage
  • Eviction churn
  • Operational complexity

Caching selectively achieves most of the benefit at a fraction of the cost.
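Selective caching is usually implemented as a bounded LRU: recently used (hot) mappings stay resident and cold entries are evicted. A toy sketch, with a deliberately tiny capacity:

```python
from collections import OrderedDict

class HotCache:
    """Bounded LRU cache: hot mappings stay resident, cold ones fall
    through to storage. Capacity of 2 is just for the demo."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = OrderedDict()

    def get(self, key):
        if key not in self._data:
            return None
        self._data.move_to_end(key)          # mark as recently used
        return self._data[key]

    def put(self, key, value):
        self._data[key] = value
        self._data.move_to_end(key)
        if len(self._data) > self.capacity:
            self._data.popitem(last=False)   # evict the coldest entry

cache = HotCache(capacity=2)
cache.put("a", "url-a")
cache.put("b", "url-b")
cache.get("a")             # touch "a" so it stays hot
cache.put("c", "url-c")    # evicts "b", the least recently used
```

With skewed traffic, a cache holding a small fraction of the keys can still absorb the large majority of reads.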

Lesson 9 Takeaway

Effective caching is about selectivity, not completeness.


Lesson 10: Redirect Codes — Why 302 Is Often the Right Choice

Choosing between HTTP 301 and 302 looks trivial, but it has long-term consequences:

  • 301 (Moved Permanently) — browsers and intermediaries may cache the redirect, so later clicks can bypass your servers entirely
  • 302 (Found) — the redirect is treated as temporary, so every click passes through your system

With a fast, cache-backed system, server load is no longer the main concern. What becomes more valuable is:

  • Control (e.g. changing redirect target, rate limiting)
  • Observability (every redirect hits your system, so you can log and analyze)

Lesson 10 Takeaway

When infrastructure is cheap, control becomes the valuable asset — and it is worth keeping.

✅ Quiz – Lesson 10

Question #1 of 1

Why might a system prefer HTTP 302 redirects?


Lesson 11: Supporting Custom Short URLs Safely

Custom aliases are attractive to users, but dangerous without limits.

Rules are essential:

  • Length limits
  • Character restrictions
  • Strict uniqueness checks

Failing fast is better than silently overwriting mappings.
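A sketch of fail-fast alias validation; the exact length and character rules here are assumed examples, not requirements from the design:

```python
import re

# Assumed example rules: 3-30 chars, letters/digits/hyphen/underscore.
ALIAS_RE = re.compile(r"[A-Za-z0-9_-]{3,30}")

def reserve_alias(alias, long_url, store):
    """Fail fast: reject invalid aliases, never overwrite a mapping."""
    if not ALIAS_RE.fullmatch(alias):
        raise ValueError("alias violates length/character rules")
    if alias in store:                     # strict uniqueness check
        raise ValueError("alias already taken")
    store[alias] = long_url

aliases = {}
reserve_alias("promo", "https://example.com/sale", aliases)
```

Rejecting the request with a clear error is strictly better than either overwriting an existing mapping or accepting an alias that later breaks routing.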

Lesson 11 Takeaway

Flexibility without boundaries is a reliability risk.


Lesson 12: Duplicate Long URLs and Idempotency

If users submit the same long URL repeatedly, returning a new short URL each time causes:

  • Data bloat
  • Fragmented analytics
  • Confusing user experience

Idempotent creation solves this by mapping the same long URL to the same short URL.

This is a product decision, not just a technical one — and you should always be ready to explain that trade-off.
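Idempotent creation can be sketched with a dedup index keyed by a hash of the long URL (exact-URL dedup is an assumed policy choice):

```python
import hashlib

url_to_code = {}                               # dedup index: url hash -> code
code_pool = iter(["abc12", "abc13", "abc14"])  # from the pool service

def create_idempotent(long_url):
    # Same long URL -> same dedup key -> same short code, so repeat
    # submissions never mint a new code.
    key = hashlib.sha256(long_url.encode()).hexdigest()
    if key in url_to_code:
        return url_to_code[key]
    code = next(code_pool)
    url_to_code[key] = code
    return code

first = create_idempotent("https://example.com/a")
again = create_idempotent("https://example.com/a")   # no new code minted
```

Note that "the same long URL" is itself a product question: whether trailing slashes, query parameters, or casing count as the same URL has to be decided before the dedup key is chosen.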

✅ Final Quiz

Question #1 of 3

Why is idempotent short URL creation often preferred?

Question #2 of 3

You need to support custom short URLs (e.g. /go/promo). What is the main risk if you do not enforce uniqueness and length limits?

Question #3 of 3

Traffic is 20k QPS and average latency is 10ms. Roughly how many in-flight requests do you have?


Final Thoughts

This system works because it respects reality:

  • Reads dominate writes
  • Latency matters more than throughput
  • Predictability beats cleverness
  • Simplicity scales

If you can internalize these lessons, you'll recognize the same patterns in many other large-scale systems.