URL Shortener

 

URL Shortener

In the modern digital ecosystem, the ability to share information efficiently is fundamental. Hyperlinks (URLs) are the backbone of the web, allowing users to navigate between resources. However, URLs can often be lengthy, complex, and difficult to share or remember. As digital communication expanded across social media, messaging platforms, and marketing channels, the need for a more concise way to represent web addresses became increasingly evident.

The URL Shortener Problem arises from this challenge — how to transform long, cumbersome URLs into short, manageable links that are easy to distribute, while still maintaining an accurate redirection to the original destination.

Overview

With the proliferation of online content and user-generated media, sharing URLs has become an integral part of communication. However, several practical challenges make this process cumbersome:

  1. Length and Readability Many URLs contain query strings, tracking parameters, and identifiers that make them excessively long and visually unappealing. For example:
  2. https://www.example.com/products/item?id=12345&utm_source=twitter&utm_campaign=spring_sale

Such URLs are difficult to remember, type, or communicate verbally.

  1. Character Limitations Platforms like Twitter impose strict character limits on posts. Including a lengthy URL can consume a significant portion of the message, leaving less room for meaningful content.
  2. Aesthetic and Trust Factors Short, branded URLs (e.g., https://nyt.ms/abc123) appear more professional and are easier to trust and recognize compared to long, generic URLs.
  3. Tracking and Analytics Organizations require insight into user engagement — how many times a link was clicked, where the traffic originated, and what devices were used. URL shorteners facilitate click tracking and analytical reporting without altering the target URL.
  4. Ease of Sharing Shortened URLs are more convenient to share in offline formats (such as QR codes, printed materials, or presentations) where space is limited.

These needs collectively motivated the emergence of URL shortening services such as TinyURL (2002) and Bitly (2008), which have since become critical tools for digital marketing, analytics, and content distribution.

Requirements

The requirements for the URL Shortener system are divided into three categories: Functional Requirements, Non-Functional Requirements.

Functional Requirements

  • URL Shortening: Users should be able to input a long URL and receive a unique, shortened alias. The shortened URL should use a compact format with English letters and digits to save space and ensure uniqueness. Also use can give custom alias too. The input url should be unieq per user, means a same url can get different short url per user.
  • URL Redirection: When users access a shortened URL, the service should redirect them seamlessly to the original URL with minimal delay.
  • URL Managment: The url must have a expire and user can delete them if they want.

Out of Scope Requirements

  • Authentication and Authorization for users (e.g., who can create URLs or access certain analytics).
  • Link Analytics: The system should be able to track the number of times each shortened URL is accessed to provide insights into link usage.
  • Monetization features (e.g., ad-based redirect pages).

Non-Functional Requirements

Performance

  • URL redirect latency must be P95 < 5 ms
  • System throughput must handle ≥ 100K reads/sec and ≥ 10K writes/sec during peak
  • Support high request concurrency without performance degradation

Availability

  • Maintain ≥ 99.99% availability for URL redirection services
  • Automatic failover mechanisms must ensure minimal downtime

Scalability

  • System must scale horizontally to support future growth
  • Handle traffic spikes up to 2× peak load without impacting SLAs

Capacity

  • Support 100M daily active users
  • Write volume: ≥ 1M new URLs/day
  • Data retention: ≥ 5 years
  • Storage: ≥ 912GB raw data, average entity size is 512 bytes.

Reliability & Durability

  • Prevent data loss with multi-region replication
  • Ensure ≥ 11 nines durability for stored URLs and metadata

Security

  • Detect and block malicious/spam URLs
  • Ensure secure handling of user data with strong access controls and audit trails

Maintainability

  • Support zero-downtime deployments
  • Provide full observability: logs, metrics, traces, and proactive alerting

Compliance

  • Data handling must adhere to regulatory and organizational retention policies (≥ 5 years)

 

Scale Requirements

  • Daily Active Users: 100 million
  • Read-to-Write Ratio: 100: 1.
  • Write Requests: ~1 million per day.
  • Data Retention Period: 5 years.
  • Average Entry Size: ~500 bytes per record.
  • Estimated Storage Requirement: 1 million writes/day × 500 bytes × 365 days × 5 years ≈ 912 GB total raw data (excluding replication and indexes)
  • Average Writes Per Second (WPS): (1,000,000 requests / 86,400 seconds) ≈ 12
  • Since Read:Write ratio is 100:1, Average Redirects per second (RPS): 12 * 100 = 1,200

 

 

Design Considerations

The Design Considerations section outlines the architectural reasoning and key technical decisions made to ensure that the URL Shortener system meets its functional and non-functional requirements efficiently and at scale.

These decisions cover unique identifier generation, data storage, caching strategy, performance goals, and system scalability.

A primary design aspect is how the system generates and manages shortened identifiers — compact, unique tokens that map to original long URLs.

Encoding Scheme

The system uses a Base62 encoding approach, which includes:

[a–z] → 26 lowercase letters 

[A–Z] → 26 uppercase letters 

[0–9] → 10 digits

This yields 62 unique symbols per position, providing a good balance between readability and entropy.

For a short URL of length L, the total number of unique combinations is calculated as:

[ 62^L ]

Capacity Estimation

Length (L)

Unique Combinations

Approx. Count

Scalability Level

4

14,776,336

14.8M

Small-scale

5

916,132,832

0.9B

Moderate-scale

6

56,800,235,584

56.8B

Recommended

7

3,521,614,606,208

3.5T

Global-scale

Given projected usage of 1 million new URLs per day and a 5-year data retention policy, the total number of URLs will be approximately 1.8 billion. A 6-character Base62 identifier provides ~56.8 billion unique values — sufficient capacity for long-term scalability with significant headroom.

Data Storage and Retention

All persistent data will be stored in a Relational Database Management System (RDBMS) to ensure ACID compliancedata integrity, and efficient querying.

  • Data will be partitioned by creation date or hash key range to support horizontal scalability and balanced load distribution.
  • Appropriate indexes will be maintained on short URL keysuser identifiers, and expiration dates to optimize read and lookup operations.
  • Regular archival of expired URLs.
  • Average Entry Size: ~500 bytes per record.
  • Estimated Storage Requirement: 1 million writes/day × 500 bytes × 365 days × 5 years ≈ 912 GB total raw data (excluding replication and indexes)

 

Caching Strategy

Efficient caching is essential to minimize database load and achieve sub-millisecond redirection latency.

Given the 100:1 read-to-write ratio, Average Redirects per second (RPS): 12 * 100 = 1,200

If we want to cache some of the hot URLs, we can follow the 80-20 rule where 20% of the URLs generate 80% of the read traffic.

Since we have 1 million writes per day, if we only cache 20% of the hot urls in a day, Total cache memory required = 1M * 0.2 * 127 Bytes = 25.4 MB.

Assuming a cache hit ratio of 90%, we only need to handle 10% of the redirect requests directly from the database.

Requests hitting the DB: 1,200 × 0.10 ≈ 120 RPS

 

Eviction Policy

The cache employs the Least Recently Used (LRU) strategy with optional key expiration:

  • Frequently accessed URLs remain in memory
  • Least recently used or expired keys are automatically evicted as memory fills
  • This ensures efficient memory utilization and consistent cache performance

Cache Operations

  • On URL redirection, the short code is first queried from Redis
  • If found → redirect immediately
  • If not found → fetch from the RDBMS, then asynchronously update Redis

This caching approach guarantees high throughputlow latency, and load isolation from the primary database, while maintaining full consistency with persistent storage.

Security and Abuse Prevention

To ensure system trustworthiness and prevent misuse:

  • Destination Validation: Prevent shortening of malicious or blacklisted URLs.
  • Rate Limiting: Restrict URL creation per user/IP to prevent abuse.
  • Input Sanitization: Guard against injection attacks and malformed inputs.
  • HTTPS Enforcement: All requests must be served over TLS to protect data integrity.

Solution Approaches

The Solution Approaches section presents multiple strategies to design and implement the URL Shortener system. Each approach offers distinct trade-offs in terms of scalability, performance, complexity, and data consistency. These approaches primarily differ in how short keys are generatedhow data is stored and accessed, and how collisions and uniqueness are managed.

Approach 1: Auto-Increment ID + Base62 Encoding

Overview

In this approach, each new URL insertion is assigned a unique, auto-incrementing numeric ID from the database. This ID is then converted into a short code using Base62 encoding (using characters a–z, A–Z, and 0–9).

Key Generation Process

  1. Insert long URL into the RDBMS → receive an incremental integer ID.
  2. Encode the ID into a Base62 string.
  3. Use the encoded string as the short URL key.

Advantages

  • Simple and deterministic — no collisions.
  • Fast key generation (delegated to the database sequence).
  • Compact URLs due to Base62 encoding.
  • Easily scalable by partitioning ID ranges per database instance.

Disadvantages

  • Potential predictability — users could guess URLs by incrementing IDs.
  • Requires centralized ID generation or sequence management to avoid collisions across shards.

Best Fit

Small-to-medium scale systems or systems with strong RDBMS dependency and moderate write throughput.

Approach 2: Hash-Based Key Generation

Overview

Each long URL is passed through a hashing algorithm (e.g., MD5, SHA-1, or MurmurHash) to generate a unique identifier. A fixed number of characters (e.g., first 6–8) from the hash output are used as the short key.

Key Generation Process

  1. Compute hash of the original URL (optionally combined with user ID or timestamp).
  2. Take the first N characters of the hash as the short code.
  3. On collision, append a salt or increment a counter.

Advantages

  • Stateless key generation — no dependency on database sequences.
  • Can be distributed easily (each node can independently generate keys).
  • Good randomness — harder for users to guess short URLs.

Disadvantages

  • Collision handling adds complexity.
  • Non-deterministic length control if hash truncation leads to duplicates.
  • Slightly slower than auto-increment approach for very high throughput systems.

Best Fit

Systems requiring distributed key generation without centralized coordination.

Approach 3: Random String Generation with Collision Check

Overview

The system generates a random Base62 string of a fixed length (e.g., 6 or 7 characters) for each new URL. Before finalizing, it checks whether the generated key already exists in the database.

Key Generation Process

  1. Generate random 6-character Base62 string.
  2. Check existence in RDBMS.
  3. If collision occurs, regenerate until unique key is found.

Advantages

  • Simple and stateless.
  • Good randomness and unpredictability.
  • Easy to implement without sequence management.

Disadvantages

  • Possible collisions, especially as the dataset grows (though rare for large Base62 space).
  • Slightly higher latency during key creation due to existence checks.
  • Database contention on uniqueness constraint for high write volume.

Best Fit

Smaller systems or systems prioritizing randomness and simplicity over absolute performance.

Approach 4: Distributed ID Generation (Snowflake-like IDs)

Overview

This approach uses a distributed unique ID generation service inspired by Twitter’s Snowflake algorithm. Each generated 64-bit ID encodes:

  • Timestamp bits
  • Machine ID bits
  • Sequence bits

The resulting ID is then Base62-encoded to form the short URL.

Key Generation Process

  1. Each node generates unique 64-bit IDs using timestamp + node ID + counter.
  2. Convert ID to Base62 string.
  3. Store mapping in RDBMS.

Advantages

  • Globally unique and scalable across data centers.
  • No collisions without central coordination.
  • High throughput (millions of IDs per second).

Disadvantages

  • Slightly more complex implementation (requires clock synchronization).
  • Not deterministic — hard to reproduce keys for the same URL.

Best Fit

Large-scale, distributed systems requiring high availabilitygeo-redundancy, and independent node operations.

Approach 5: Pre-Generated Key Pool (Dedicated Key Generation Service)

Overview

separate service is responsible for pre-generating and managing a pool of unique short keys, which are stored in advance in a database table. The main URL Shortener service simply fetches an available key from the pool during URL creation, eliminating key generation latency at runtime.

Key Generation Process

  1. Key Generation Service pre-populates a table (e.g., AvailableKeys) with millions of unique Base62 strings.
  2. When a new URL is shortened:
    • The service fetches one unused key (e.g., SELECT TOP 1 … FOR UPDATE).
    • Marks it as used and stores the mapping.
  3. When the pool runs low, the background job replenishes it asynchronously.

Advantages

  • Instant key allocation — no hash computation or collision checks at runtime.
  • Supports horizontal scaling — multiple nodes can draw from the same pool.
  • Predictable performance and low latency even under heavy load.
  • Decouples key lifecycle management from the core URL service.

Disadvantages

  • Requires key pool monitoring and replenishment logic.
  • Concurrency control is needed to avoid multiple nodes claiming the same key.
  • Slightly higher operational complexity due to an extra service.

Best Fit

High-throughput production systems requiring consistent millisecond-level latency and separation of concerns.

Selected Approach — Hybrid: Snowflake-style ID Generator + Pre-generated Key Pool

We will adopt the hybrid approach: use a distributed, Snowflake-like ID generator to create globally unique numeric IDs, pre-generate batches of Base62-encoded keys, store them in a Key Pool table, and let application servers consume keys from the pool at URL-creation time. This gives the best balance of low-latency writesglobal uniquenesshorizontal scale, and operational predictability.

Below is a professional, production-ready design package for the next phases: high-level architecturedatabase design (RDBMS)API specificationcache & key-pool behavioroperational notes, and monitoring.

High-level Architecture (components & flow)

                        

Database Design

We assume a horizontally scalable RDBMS (e.g., PostgreSQL with partitioning, or managed cluster like Aurora) and usage of standard SQL features (transactions, row-level locking).

URL Mappings Table — (url_mappings)

Column

Type

Nullable

Description

id

BIGINT

No

Primary key (unique identifier for each mapping)

short_key

VARCHAR(16)

No

Base62 short URL identifier (~6 chars), must be unique

long_url

TEXT

No

Original long URL

owner_id

BIGINT

Yes

User account ID (null if anonymous)

created_at

TIMESTAMP (UTC)

No

Time record was created (default = now)

expires_at

TIMESTAMP (UTC)

Yes

TTL expiration timestamp (optional)

deleted_at

TIMESTAMP (UTC)

Yes

Timestamp used for soft deletion

 

Indexing & Partitioning Decisions

  • Unique index: (short_key) for fast redirect lookups
  • Unique index: (owner_id, long_url) to avoid duplicates per user
  • Index: (owner_id, created_at DESC) for user reporting
  • Index: (expires_at, deleted_at) for cleanup jobs
  • Partition by created_at (monthly/quarterly) for query scalability

 

Key Pool Table — (key_pool)

Column

Type

Nullable

Description

key_id

BIGINT

No

Primary key for key pool record

short_key

VARCHAR(16)

No

Pre-generated Base62 key (unique)

status

SMALLINT

No

Key state → 0=Available, 1=Reserved, 2=Used, 3=Retired

created_at

TIMESTAMP (UTC)

No

Generation timestamp

reserved_by

VARCHAR(64)

Yes

Worker/transaction ID reserving the key

reserved_at

TIMESTAMP (UTC)

Yes

Time key was reserved

retired_at

TIMESTAMP (UTC)

Yes

Time key expired/retired

updated_at

TIMESTAMP (UTC)

No

Last modification timestamp

 

 

API Design (RESTful) — Endpoint specification

All endpoints use JSON, TLS, and rate-limiting.

Create short URL

POST /v1/shorten

Request

{

  "longUrl": "https://example.com/very/long/url?x=1",

  "customAlias": "promo2026",        // optional

}

Responses

  • 201 Created

{

  "shortUrl": "https://short.ly/Ab3dE4",

  "expiresAt": "2026-01-01T00:00:00Z"

}

  • 409 Conflict if customAlias already exists.
  • 400 Bad Request invalid URL or domain blacklisted.
  • 429 Too Many Requests rate limit exceeded.

Behavior

  • Reserve key from key_pool.
  • Create mapping in url_mappings within transaction.
  • Populate Redis cache with short_key→long_url.
  • Return created resource.

2. Redirect (short URL)

GET /{shortKey}

Behavior

  • Lookup Redis by shortKey. If found → 302 Found (or 301 if permanent) redirect to long_url.
  • If miss → query url_mappings → if found, cache it and redirect.
  • If not found or expired → 404 Not Found or 410 Gone.

Notes

  • Increment click counter asynchronously (e.g., push event to queue).

Delete a short URL

DELETE /v1/urls/{shortKey}

Auth: optional owner verification

Responses

  • 204 No Content success
  • 404 Not Found if not present
  • 403 Forbidden if not owner

Behavior

  • Mark mapping as deleted or remove row (depending on retention policy).
  • Optionally mark key as recyclable after quarantine.

Get analytics (basic)

GET /v1/urls/{shortKey}/stats

Response

{

  "shortKey": "Ab3dE4",

  "clicks": 12345,

  "createdAt": "2025-11-02T12:00:00Z",

  "lastAccessedAt": "2025-11-30T09:00:00Z"

}

Notes

  • For heavy analytics, query pre-aggregated tables or OLAP store.

Admin: Key Pool status

GET /v1/admin/keypool/status (internal)

Response

{

  "availableKeys": 12500,

  "reservedKeys": 10,

  "usedKeys": 1250000,

  "lowWaterMark": 1000

}

Operational & Monitoring checklist

Metrics to emit

  • QPS: create requests / redirect requests (global and per-region).
  • Key pool size (available/reserved/used).
  • Cache hit ratio, average latency (p50/p95/p99).
  • DB connections, slow queries (> 50 ms).
  • Key generation throughput (keys/sec).
  • Error rates (4xx/5xx), rate limiting rejections.

Alarming

  • Key pool below low_water_mark → alert.
  • Redis memory usage > 80% → alert.
  • Cache hit ratio drops below threshold (e.g., 85%) → investigate.
  • DB CPU or connection saturation → scale or throttle.

Backups & DR

  • Regular DB snapshots with cross-region storage.
  • Key pool persistence must be durable (in DB) to avoid loss.
  • Plan for rolling upgrades of KeyGen service and safe rebalancing of reserved keys.

 

Security & Abuse mitigation

  • Domain & URL safety checks (blocklisted domains and payload scans).
  • Rate-limit create requests per IP or API key.
  • CAPTCHA on anonymous bulk creation to mitigate bots.
  • Monitor for anomalous patterns and add auto-throttle.

 

Low Level System Design

Base62 Encoder

public static class Base62Encoder
{
    private const string Characters = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

    public static string Encode(long number)
    {
        if (number == 0)
            return Characters[0].ToString();

        var result = new StringBuilder();
        while (number > 0)
        {
            var remainder = (int)(number % 62);
            result.Insert(0, Characters[remainder]);
            number /= 62;
        }

        return result.ToString();
    }
}

Snowflake ID Generator

public class SnowflakeIdGenerator
{
    private const long Twepoch = 1577836800000L; // Custom epoch: 2020-01-01 UTC
    private const int WorkerIdBits = 10;
    private const int SequenceBits = 12;

    private const long MaxWorkerId = -1L ^ (-1L << WorkerIdBits);
    private const long MaxSequence = -1L ^ (-1L << SequenceBits);

    private const int WorkerIdShift = SequenceBits;
    private const int TimestampLeftShift = SequenceBits + WorkerIdBits;

    private long _lastTimestamp = -1L;
    private long _sequence = 0L;
    private readonly object _lock = new();
    private readonly long _workerId;

    public SnowflakeIdGenerator(long workerId)
    {
        if (workerId < 0 || workerId > MaxWorkerId)
            throw new ArgumentException($"Worker ID must be between 0 and {MaxWorkerId}");

        _workerId = workerId;
    }

    public long NextId()
    {
        lock (_lock)
        {
            var timestamp = GetCurrentTimestamp();

            if (timestamp < _lastTimestamp)
                throw new InvalidOperationException("Clock moved backwards. Refusing to generate ID.");

            if (timestamp == _lastTimestamp)
            {
                _sequence = (_sequence + 1) & MaxSequence;
                if (_sequence == 0)
                    timestamp = WaitForNextMillis(_lastTimestamp);
            }
            else
            {
                _sequence = 0;
            }

            _lastTimestamp = timestamp;

            return ((timestamp - Twepoch) << TimestampLeftShift)
                   | (_workerId << WorkerIdShift)
                   | _sequence;
        }
    }

    private static long GetCurrentTimestamp()
        => DateTimeOffset.UtcNow.ToUnixTimeMilliseconds();

    private static long WaitForNextMillis(long lastTimestamp)
    {
        var timestamp = GetCurrentTimestamp();
        while (timestamp <= lastTimestamp)
            timestamp = GetCurrentTimestamp();
        return timestamp;
    }
}

Hybrid Short Key Generator

public class ShortKeyGenerator
{
    private readonly SnowflakeIdGenerator _snowflake;
    private readonly int _minLength;
    private readonly Random _random = new();

    public ShortKeyGenerator(long workerId, int minLength = 6)
    {
        _snowflake = new SnowflakeIdGenerator(workerId);
        _minLength = minLength;
    }

    public string GenerateShortKey()
    {
        var id = _snowflake.NextId();
        var base62 = Base62Encoder.Encode(id);

        // Ensure minimum length by padding (if needed)
        if (base62.Length < _minLength)
        {
            var pad = new string(Enumerable.Range(0, _minLength - base62.Length)
                                           .Select(_ => Base62Encoder.Encode(_random.Next(62))[0])
                                           .ToArray());
            base62 = base62 + pad;
        }

        return base62;
    }
}

Vikash Chauhan

C# & .NET experienced Software Engineer with a demonstrated history of working in the computer software industry.

Post a Comment

Previous Post Next Post

Contact Form