This document describes the design of a Pastebin (or Gist-like) system: a service that allows users to create, store, and share text or code snippets online.
Overview
A Pastebin or Gist system enables users to submit a piece of text or code and obtain a shareable link that allows others to view it. The primary goals of such a system are:
- Fast read and write operations
- Efficient storage of large textual or binary content
- Metadata-driven retrieval and filtering
- Optional privacy and access control
While conceptually simple, the system must scale efficiently to handle millions of pastes, ensure persistence, and manage user access patterns.
Requirements
Functional Requirements
- Create Paste: Accept content (text or small binary), optional title, language, visibility (public/unlisted/private), TTL/expiration, and optionally a custom slug (if user requests human-friendly alias — optional feature).
- Read Paste: Given GUID (or optional custom slug), return paste metadata and content (raw or rendered). Support Content-Type negotiation (text/plain, text/html for rendered view).
- Delete/Expire: Owner (or admin) can delete or mark a paste expired.
- Fork/Versioning (optional): Allow users to fork or create versions of pastes.
- List Pastes: Paginated listing of a user's pastes.
Non-Functional Requirements
- High Availability: Paste content should be available with high uptime; use CDN for static raw content distribution.
- Low Latency: Metadata lookup and generation of signed URLs should be fast (sub-10ms lookup target for metadata; CDN handles content delivery latency).
- High Durability: Use multi-region replication of object storage for content durability.
- Security: Content scanning, embed protection, rate limits, and optional authentication.
Scale & Assumptions (example numbers — adapt to your needs)
- Daily Active Users: 10–50M (depends on product goals)
- Writes per day: 200k — 2M
- Average Paste Size: 2–8 KB (text); some may be larger (attachments up to configured max, e.g., 1–5 MB)
- Retention: configurable per paste; default 1 year for non-expiring public content
- Storage: 2M writes/day × 4 KB × 365 days ≈ 2.9 TB/year raw (before replication and versions)
Design Considerations
This section outlines the architectural reasoning and key technical decisions that let the system meet its functional and non-functional requirements efficiently and at scale.
High-level decisions:
- Use UUID/GUID for paste identifiers. This is simple, globally unique, and removes the need for centralized ID allocation.
- Store content in object/blob storage (S3, Azure Blob, GCS) and store metadata in RDBMS for queries and small lookups.
- Use CDN with signed URLs for efficient content delivery and to offload traffic from origin.
- Cache metadata and small hot content in Redis.
- Provide server-side limits and scanning on upload to detect abuse/PII/malware.
Identifier Design
Use UUIDv4 (random) or UUIDv7 (time-ordered) depending on indexing preferences:
- UUIDv4: random, simple, no coordination.
- UUIDv7: time-ordered (if you prefer monotonicity for partitioning and locality in database indices).
Contract: Paste ID is a UUID string; clients will receive it as the primary reference (e.g., https://paste.example/3f2b7a1c-0b45-4e3f-bc1c-fdbe6c8f9b21).
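For illustration, both options map onto built-in .NET generators (a minimal sketch; Guid.CreateVersion7 requires .NET 9 or later):
using System;

public static class PasteIds
{
    // UUIDv4: fully random; no coordination between nodes.
    public static Guid NewRandomId() => Guid.NewGuid();

    // UUIDv7: time-ordered; consecutive IDs cluster in B-tree indexes,
    // which improves locality for created_at-ordered workloads.
    public static Guid NewTimeOrderedId() => Guid.CreateVersion7();
}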
Edge cases:
- Very large upload attempt → reject with 413 (Payload Too Large).
- Duplicate content detection: optional dedup by storing content-hash and mapping multiple GUIDs to same blob (if desired).
Slug Generation
In addition to the UUID, the system can expose a human-friendly slug that enhances readability and usability. The slug is optional and can be:
User-provided:
- Users can specify a custom slug at paste creation (e.g., my-first-snippet).
- The slug must be unique in the database.
- On conflict, the API should return a 409 Conflict error with a clear message.
System-generated:
If no slug is provided, the system generates one automatically using:
- A sanitized, lowercased version of the title (e.g., "Hello World!" → "hello-world"), and
- A short entropy suffix (e.g., timestamp fragment, short hash, or nanoid) to ensure uniqueness.
Example generated slugs: hello-world-20251105 or hello-world-a7f3c1
A robust approach for uniqueness:
- Attempt slug generation based on title + date or hash.
- Perform a quick uniqueness check in the metadata store.
- If conflict occurs, retry with an additional entropy component (a collision-safe retry sketch follows the algorithm below).
Slug generation algorithm (example):
1. Normalize title:
- Convert to lowercase
- Remove punctuation
- Replace spaces with hyphens
- Truncate to max 50 characters
2. Append entropy:
- Add short suffix: last 6 chars of UUIDv4 or a timestamp fragment
3. Example:
Title: "System Design Notes"
UUID: 3f2b7a1c-0b45-4e3f-bc1c-fdbe6c8f9b21
Slug: "system-design-notes-fdbe6c"
Data Storage
1) Blob / Object Storage (S3/Azure/GCS)
- Stores the paste content blob at path pastes/{yyyy}/{mm}/{uuid} or at a content-addressed path blobs/{sha256}.
- Use server-side encryption at rest and TLS in transit.
- Configure lifecycle policies for expired content retention and deletion.
2) Relational Database (Metadata)
- Keeps small metadata records for fast lookups, queries, and ACLs.
Example metadata schema (Postgres DDL):
CREATE TABLE paste_metadata (
id UUID PRIMARY KEY,
slug VARCHAR(120) NULL, -- optional human-friendly alias; unique when present
blob_path TEXT NOT NULL, -- authoritative pointer to the content blob
content_hash CHAR(64) NULL, -- optional, for dedup lookups
owner_id UUID NULL, -- nullable for anonymous pastes
title TEXT NULL,
language VARCHAR(64) NULL, -- syntax highlighting tag
description TEXT NULL,
visibility SMALLINT NOT NULL DEFAULT 0, -- 0=public,1=unlisted,2=private
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
expires_at TIMESTAMPTZ NULL,
deleted_at TIMESTAMPTZ NULL
);
CREATE INDEX idx_paste_owner_created_at ON paste_metadata (owner_id, created_at DESC);
CREATE INDEX idx_paste_expires ON paste_metadata (expires_at);
CREATE INDEX idx_paste_visibility ON paste_metadata (visibility);
CREATE UNIQUE INDEX idx_paste_slug_unique ON paste_metadata (slug);
CREATE TABLE paste_versions (
id BIGSERIAL PRIMARY KEY,
paste_id UUID NOT NULL REFERENCES paste_metadata(id) ON DELETE CASCADE,
version_number INT NOT NULL,
blob_path TEXT NOT NULL,
content_hash CHAR(64) NOT NULL,
size_bytes BIGINT,
created_at TIMESTAMPTZ NOT NULL DEFAULT now(),
is_latest BOOLEAN DEFAULT TRUE,
CONSTRAINT unique_version UNIQUE (paste_id, version_number)
);
Notes:
- blob_path is the authoritative pointer to the content in the object store.
- Use partitioning if metadata volume grows extremely large (e.g., monthly partitions on created_at).
- Optionally keep a content_hash index for dedup lookup.
Content Deduplication (Optional)
If desired, compute sha256 of the content on upload and store binary in blob store keyed by hash. Multiple GUIDs map to the same blob path; metadata entries reference the blob hash. This saves storage for duplicate content but requires reference counting or garbage collection to remove unused blobs.
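A minimal sketch of the content-addressed upload path, assuming the Azure Blob SDK; the class name is illustrative, and the reference-counting/GC side is omitted:
using System;
using System.IO;
using System.Security.Cryptography;
using System.Threading.Tasks;
using Azure.Storage.Blobs;

public class DedupUploader
{
    private readonly BlobContainerClient _container;
    public DedupUploader(BlobContainerClient container) => _container = container;

    // Key the blob by its SHA-256 so identical pastes share a single object.
    public async Task<string> UploadDedupedAsync(byte[] content)
    {
        var hash = Convert.ToHexString(SHA256.HashData(content)).ToLowerInvariant();
        var blobPath = $"blobs/{hash}";
        var blob = _container.GetBlobClient(blobPath);

        // Upload only if unseen; a concurrent uploader may race, so production
        // code would also catch the 409 conflict raised by overwrite: false.
        if (!(await blob.ExistsAsync()).Value)
        {
            using var stream = new MemoryStream(content);
            await blob.UploadAsync(stream, overwrite: false);
        }
        return blobPath; // multiple paste IDs may reference this same path
    }
}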
Caching Strategy
- Redis for metadata lookups: key paste:meta:{id} → small JSON with title, blob_path, size, visibility, and possibly a signed-URL TTL.
- CDN + signed URL: for serving raw content, generate a short-lived signed URL from the object storage or use a CDN origin that enforces signed headers.
- Optional content cache: for very small pastes (< 16 KB), cache the content in Redis to serve directly from cache.
Cache Operations:
- Read Path: check Redis for metadata; if present and visibility allows, return metadata and either cached content or generate signed URL.
- Write Path: on create, upload blob to object store (or dedup and reference existing blob), insert metadata row, then update Redis.
High-Level Architecture
┌────────────────────────┐
│ Clients │
│ (Web / Mobile / APIs) │
└─────────────┬──────────┘
│ HTTPS
┌─────────▼──────────┐
│ API Gateway │
│ (LB, TLS, Rate Lim) │
└─────────┬──────────┘
│
┌─────────────────▼─────────────────┐
│ Paste Service API │
│ (Submit, Read, Delete, List, Admin)│
└──────────┬────────────┬────────────┘
│ │
┌────────────▼──┐ ┌──▼──────────────┐
│ Azure Queue │ │ Redis Cache │
│ (Create Jobs) │ │ (Metadata + hot) │
└────────────┬──┘ └──┬──────────────┘
│ │
┌────────────▼──────────┐ │
│ Azure Function Worker │ │
│ (Scan, Dedup, Upload, │ │
│ Finalize Paste) │ │
└────────────┬───────────┘ │
│ │
┌──────────────▼────────────┐ │
│ Paste Service API (Int) │ │
│ (Finalize Paste via M2M) │ │
└──────────────┬────────────┘ │
│ │
┌────────────▼──────────┐ │
│ Relational Database │ │
│ (paste_metadata) │ │
└────────────┬──────────┘ │
│ │
┌────────────▼────────────┐ │
│ Object / Blob Storage │ │
│ (S3 / Azure Blob / GCS) │ │
│ + Lifecycle Policies │ │
└────────────┬────────────┘ │
│ │
┌───────▼────────────┐ │
│ CDN │ │
│ (Optional caching) │ │
└────────────────────┘ │
│
┌────────────────▼──────────────┐
│ Background Jobs / GC / Scanner│
│ (Expire pastes, clean orphans)│
└───────────────────────────────┘
- Background workers or serverless functions for: content scanning, rendering/thumbnail generation (if needed), lifecycle expiration and garbage collection of blobs, and reference-counted blob deletion (for the dedup scheme).
API Design (RESTful)
All endpoints use TLS and JSON for metadata operations. Raw content retrieval may return text/plain or a rendered format depending on client headers.
1. Create Paste (Asynchronous Submission)
Endpoint
POST /v1/pastes
Purpose
Submit a paste creation request. The request is validated and queued for content scanning and asynchronous processing.
Request (JSON or multipart/form-data)
{
"content": "print('hello world')",
"title": "Hello World Snippet",
"description": "Sample Python snippet",
"language": "python",
"visibility": "public|unlisted|private",
"expiresIn": "7d",
"slug": "hello-world" // optional custom slug
}
Immediate Response
| Status | Description |
|---|---|
| 202 Accepted | Request accepted and queued for scanning/processing |
| 400 Bad Request | Invalid payload or missing required fields |
| 413 Payload Too Large | Exceeds configured size limit |
| 429 Too Many Requests | Rate limit exceeded |
Example Response:
{
"requestId": "b62c28b4-1531-4893-b2b9-1f05c8f9dcd9",
"status": "accepted",
"message": "Paste submitted for scanning and processing.",
"estimatedProcessingTime": "a few seconds"
}
Processing Flow (Asynchronous)
API Gateway / Paste API performs:
- Basic size and MIME validation.
- Writes an event to Azure Storage Queue / Service Bus Queue with metadata and user info (an enqueue sketch follows this flow).
- Returns HTTP 202 Accepted.
Azure Function (Worker):
- Triggered from the queue.
- Fetches request payload.
- Performs deep content scanning (PII, malware, abuse patterns).
- Computes sha256 hash and performs deduplication check.
- Uploads validated content to the object store (S3/Azure Blob/GCS).
- Posts the final sanitized payload to the Paste Service API using a Machine-to-Machine (M2M) token.
Paste Service API (Internal Endpoint):
- Inserts a new metadata record into the relational database.
- Returns the final paste metadata (ID, URLs, etc.) to the worker.
- Worker updates the processing result (optional callback or polling).
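For illustration, the queue hand-off in the first step might look like the sketch below, assuming Azure.Storage.Queues; the PasteCreateJob shape and names are hypothetical:
using System;
using System.Text.Json;
using System.Threading.Tasks;
using Azure.Storage.Queues;

// Hypothetical shape of the queued create-job; field names are illustrative.
public record PasteCreateJob(Guid RequestId, string Content, string? Title, string? OwnerId);

public class PasteJobEnqueuer
{
    private readonly QueueClient _queue;

    public PasteJobEnqueuer(string connectionString, string queueName)
    {
        // Base64 encoding keeps messages compatible with the default
        // Azure Functions queue-trigger expectations.
        _queue = new QueueClient(connectionString, queueName,
            new QueueClientOptions { MessageEncoding = QueueMessageEncoding.Base64 });
    }

    public async Task EnqueueAsync(PasteCreateJob job)
    {
        await _queue.CreateIfNotExistsAsync();
        await _queue.SendMessageAsync(JsonSerializer.Serialize(job));
    }
}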
Internal Endpoint (Used by Worker)
POST /internal/v1/pastes/finalize
This endpoint is protected via M2M credentials and not exposed to external users.
Request:
{
"id": "3f2b7a1c-0b45-4e3f-bc1c-fdbe6c8f9b21",
"slug": "hello-world-fdbe6c",
"blobPath": "pastes/2025/11/3f2b7a1c...",
"contentHash": "sha256:abc123...",
"title": "Hello World Snippet",
"language": "python",
"visibility": "public",
"ownerId": "d6e2f1a9-...",
"sizeBytes": 128,
"expiresAt": "2025-11-12T00:00:00Z"
}
Response:
{
"id": "3f2b7a1c-0b45-4e3f-bc1c-fdbe6c8f9b21",
"url": "https://paste.example/hello-world-fdbe6c",
"rawUrl": "https://cdn.example/pastes/2025/11/3f2b7a1c..."
}
2. Get Paste (Rendered View)
GET /{slugOrId}
Behavior
- Look up metadata in Redis or DB.
- If visibility allows, return either:
- Rendered HTML view (for web clients), or
- Raw content (for API clients via redirect or signed URL).
3. Get Raw Paste
GET /v1/pastes/{slugOrId}/raw
Behavior
- Validate visibility and auth.
- Stream raw content from object store or redirect to signed CDN URL.
4. Delete Paste
DELETE /v1/pastes/{slugOrId}
Behavior
- Validate ownership or admin privileges.
- Mark metadata as deleted and enqueue deletion job.
- Background worker removes blob after quarantine delay (for audit/recovery).
5. List User Pastes
GET /v1/users/{userId}/pastes?limit=20&cursor=...
Behavior
- Paginated listing of metadata (title, slug, createdAt, expiresAt, etc.).
- Does not fetch or return raw content.
- Supports filters (visibility, language, expiration); a keyset-pagination sketch follows.
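A sketch of the keyset (cursor) query behind this listing, assuming Postgres and the metadata schema above; the cursor carries the (created_at, id) of the last row on the previous page:
public static class PasteQueries
{
    // Keyset pagination: seek past the cursor instead of scanning OFFSET rows,
    // so page N costs the same as page 1 even for large histories.
    public const string ListUserPastesSql = @"
        SELECT id, slug, title, created_at, expires_at
        FROM paste_metadata
        WHERE owner_id = @owner
          AND deleted_at IS NULL
          AND (created_at, id) < (@cursorCreatedAt, @cursorId)
        ORDER BY created_at DESC, id DESC
        LIMIT @limit;";
}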
6. Admin: Blob Health & GC
GET /v1/admin/blobs/status
Returns summary of:
- Blob reference counts
- Orphaned blobs
- Pending deletions
- Last GC run time
Protected with admin authentication.
7. Get Specific Version
GET /v1/pastes/{slug}/versions/{version}
Returns content and metadata for a specific version.
8. Update Paste (create new version)
PATCH /v1/pastes/{slug}
Behavior
- Validates ownership and size limits.
- Queues for content scan (as in POST).
- When approved, Azure Function uploads new blob, creates new version entry, updates metadata.
- Returns HTTP 202 Accepted immediately.
Responses
202 Accepted
{
"message": "Update accepted and queued for processing",
"slug": "my-snippet",
"queuedAt": "2025-11-05T07:25:00Z"
}
Azure Function posts final payload via M2M token:
POST /internal/v1/pastes/{slug}/versions
Authorization: Bearer <m2m-token>
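Internally, that endpoint can demote the previous latest row and insert the next version number in a single transaction. A minimal sketch, assuming Postgres via Npgsql and the paste_versions table above:
using System;
using System.Threading.Tasks;
using Npgsql;

public static class PasteVersionRepository
{
    // Demote the current latest row, then insert the next version number in one
    // transaction; UNIQUE (paste_id, version_number) guards concurrent writers.
    public static async Task<int> AppendVersionAsync(
        NpgsqlConnection conn, Guid pasteId, string blobPath, string contentHash, long sizeBytes)
    {
        await using var tx = await conn.BeginTransactionAsync();

        await using (var demote = new NpgsqlCommand(
            "UPDATE paste_versions SET is_latest = FALSE WHERE paste_id = @p AND is_latest", conn, tx))
        {
            demote.Parameters.AddWithValue("p", pasteId);
            await demote.ExecuteNonQueryAsync();
        }

        await using var insert = new NpgsqlCommand(@"
            INSERT INTO paste_versions (paste_id, version_number, blob_path, content_hash, size_bytes, is_latest)
            SELECT @p, COALESCE(MAX(version_number), 0) + 1, @b, @h, @s, TRUE
            FROM paste_versions WHERE paste_id = @p
            RETURNING version_number", conn, tx);
        insert.Parameters.AddWithValue("p", pasteId);
        insert.Parameters.AddWithValue("b", blobPath);
        insert.Parameters.AddWithValue("h", contentHash);
        insert.Parameters.AddWithValue("s", sizeBytes);
        var version = (int)(await insert.ExecuteScalarAsync())!;

        await tx.CommitAsync();
        return version;
    }
}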
9. List Versions
GET /v1/pastes/{slug}/versions
Returns a list of all versions:
[
{ "version": 1, "createdAt": "2025-11-01T10:00:00Z", "isLatest": false },
{ "version": 2, "createdAt": "2025-11-05T08:00:00Z", "isLatest": true }
]
Security & Abuse Prevention
- Max paste size (e.g., 5 MB) enforced per account tier.
- Per-IP and per-account rate limits (e.g., 30 creates/hour for anonymous).
- Deep scanning pipeline for malware/PII.
- Quarantine or reject malicious pastes.
- Auth required for private pastes.
- CSP headers and CORS restrictions on rendered HTML views (see the middleware sketch below).
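A minimal ASP.NET Core sketch of that header hardening; the origin and policy values are placeholders to adapt:
using Microsoft.AspNetCore.Builder;
using Microsoft.AspNetCore.Http;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);
// Restrict cross-origin access to the known web frontend (placeholder origin).
builder.Services.AddCors(o => o.AddPolicy("paste-ui",
    p => p.WithOrigins("https://paste.example").WithMethods("GET")));

var app = builder.Build();
app.UseCors("paste-ui");
app.Use(async (ctx, next) =>
{
    // Forbid scripts and embeds on rendered paste pages to blunt stored-XSS payloads.
    ctx.Response.Headers.Append("Content-Security-Policy",
        "default-src 'none'; style-src 'self'; img-src 'self'");
    ctx.Response.Headers.Append("X-Content-Type-Options", "nosniff");
    await next();
});
app.Run();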
Monitoring, Metrics & Alerts
Metrics:
- Queue length and processing latency
- Paste creation throughput (requests/sec)
- Average paste size
- Blob store egress volume
- CDN and Redis hit ratios
- GC and orphan blob cleanup stats
Alerts:
- Queue backlog > threshold
- Content scan failures
- Blob storage egress anomalies
- High quarantined content ratio
Retention, GC, and Cost Controls
- Object store lifecycle rules automatically delete expired blobs.
- Periodic GC job removes metadata marked deleted_at and dereferences blobs.
- Optional reference-count model for deduped blobs.
- Quarantine window before permanent deletion for audit/recovery.
Low-Level Design and Implementation Details
This section provides representative implementation snippets in C# demonstrating how the Pastebin/Gist service can generate unique identifiers, compute content hashes, derive unique slugs, and persist metadata with blob uploads. The examples assume an Azure-based environment, though the logic applies equally to AWS or GCP.
Generating Paste Identifier, Hash, and Slug
Core responsibilities:
- Generate a globally unique paste ID (UUID / GUID).
- Compute a cryptographic hash (SHA-256) of the content.
- Generate a slug — either provided by the user or auto-generated from the title (and time-based suffix to ensure uniqueness).
using System;
using System.Security.Cryptography;
using System.Text;
using System.Text.RegularExpressions;
public static class PasteUtils
{
/// <summary>
/// Generates a globally unique identifier for a paste.
/// </summary>
public static string GeneratePasteId() => Guid.NewGuid().ToString();
/// <summary>
/// Computes the SHA-256 hash of the given content.
/// Used for deduplication and integrity verification.
/// </summary>
public static string ComputeContentHash(byte[] content)
{
using var sha = SHA256.Create();
var hashBytes = sha.ComputeHash(content);
return BitConverter.ToString(hashBytes).Replace("-", "").ToLowerInvariant();
}
/// <summary>
/// Generates a URL-friendly slug.
/// If user-provided slug is null, derive from title or generate a timestamped fallback.
/// </summary>
public static string GenerateSlug(string? userSlug, string? title = null)
{
if (!string.IsNullOrWhiteSpace(userSlug))
return NormalizeSlug(userSlug);
if (!string.IsNullOrWhiteSpace(title))
{
var baseSlug = NormalizeSlug(title);
// Append short timestamp to ensure uniqueness
return $"{baseSlug}-{DateTime.UtcNow:yyyyMMddHHmmss}";
}
// Fallback: random slug derived from GUID short form
return Guid.NewGuid().ToString("N")[..8];
}
private static string NormalizeSlug(string input)
{
// Convert to lowercase, replace spaces with hyphens, remove invalid chars
var slug = input.ToLowerInvariant();
slug = Regex.Replace(slug, @"[^a-z0-9\s-]", ""); // remove invalid
slug = Regex.Replace(slug, @"\s+", "-"); // spaces to dashes
slug = Regex.Replace(slug, "-+", "-"); // collapse multiple dashes
return slug.Trim('-');
}
}
Algorithm Notes:
- The slug generation ensures human readability while maintaining uniqueness through timestamp or short-GUID suffixes.
- Using SHA-256 for content hashing provides cryptographic strength and low collision probability.
- The slug field should be unique in the database (UNIQUE INDEX on slug).
Uploading Content to Blob Storage (Azure Example)
using Azure.Storage.Blobs;
using Azure.Storage.Blobs.Models;
using System.IO;
using System.Threading.Tasks;
public class PasteStorageService
{
private readonly BlobContainerClient _container;
public PasteStorageService(string connectionString, string containerName)
{
_container = new BlobContainerClient(connectionString, containerName);
_container.CreateIfNotExists(PublicAccessType.None);
}
public async Task<string> UploadAsync(string pasteId, byte[] content)
{
var blobClient = _container.GetBlobClient($"pastes/{pasteId}");
using var stream = new MemoryStream(content);
await blobClient.UploadAsync(stream, overwrite: true);
return blobClient.Uri.ToString();
}
public async Task DeleteAsync(string pasteId)
{
var blobClient = _container.GetBlobClient($"pastes/{pasteId}");
await blobClient.DeleteIfExistsAsync();
}
}
Persisting Metadata to Database
Below is an example of how metadata can be stored in a relational database such as PostgreSQL or SQL Server:
CREATE TABLE paste_metadata (
id UUID PRIMARY KEY,
slug VARCHAR(120) UNIQUE NOT NULL,
blob_path TEXT NOT NULL,
content_hash CHAR(64) NOT NULL,
title TEXT,
language VARCHAR(50),
size_bytes BIGINT,
visibility VARCHAR(20) DEFAULT 'public',
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
deleted_at TIMESTAMP NULL
);
Creating and Saving a Paste (C# Example)
public async Task<string> CreatePasteAsync(byte[] content, string? title, string? language, string? customSlug)
{
var pasteId = PasteUtils.GeneratePasteId();
var slug = PasteUtils.GenerateSlug(customSlug, title);
var contentHash = PasteUtils.ComputeContentHash(content);
var sizeBytes = content.Length;
// Upload to blob
var blobPath = await _storageService.UploadAsync(pasteId, content);
// Save metadata
await _repository.InsertAsync(new PasteMetadata
{
Id = Guid.Parse(pasteId),
Slug = slug,
BlobPath = blobPath,
ContentHash = contentHash,
Title = title,
Language = language,
SizeBytes = sizeBytes
});
return slug;
}
Garbage Collection Workflow
A background Azure Function or Worker Service can periodically run cleanup jobs:
-- Mark paste deleted
UPDATE paste_metadata
SET deleted_at = NOW()
WHERE id = @Id AND owner_id = @OwnerId;
-- Select expired/deleted pastes for cleanup
SELECT id, blob_path
FROM paste_metadata
WHERE deleted_at IS NOT NULL
AND deleted_at < NOW() - INTERVAL '7 DAYS'
LIMIT 500;
-- After successful deletion from blob store
DELETE FROM paste_metadata WHERE id = @Id;
Caching and CDN Service
// Required packages:
// - StackExchange.Redis
// - Azure.Storage.Blobs (provides the Azure.Storage.Sas namespace)
// - System.Text.Json (built into modern .NET)
using System;
using System.Text;
using System.Text.Json;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;
using StackExchange.Redis;
public record PasteMetadata(
Guid Id,
string Slug,
string BlobPath,
long SizeBytes,
string Visibility,
int LatestVersion,
string? CachedSignedUrl,
DateTimeOffset? SignedUrlExpiryUtc
);
public class PasteCacheService
{
private readonly IDatabase _redis;
private readonly BlobServiceClient _blobService;
private readonly string _containerName;
private readonly TimeSpan _metaTtl = TimeSpan.FromMinutes(5);
private readonly TimeSpan _signedUrlTtl = TimeSpan.FromMinutes(5);
private readonly int _contentCacheThreshold = 16 * 1024; // 16 KB
private readonly TimeSpan _lockTtl = TimeSpan.FromSeconds(5);
public PasteCacheService(IConnectionMultiplexer mux, BlobServiceClient blobService, string containerName)
{
_redis = mux.GetDatabase();
_blobService = blobService;
_containerName = containerName;
}
private string MetaKey(Guid id) => $"paste:meta:{id}";
private string ContentKey(Guid id, int version) => $"paste:content:{id}:{version}";
private string LockKey(Guid id) => $"paste:lock:{id}";
// Deserialize helper
private static readonly JsonSerializerOptions _jsonOptions = new() { PropertyNamingPolicy = JsonNamingPolicy.CamelCase };
public async Task<PasteMetadata?> GetMetadataFromCacheAsync(Guid pasteId)
{
var raw = await _redis.StringGetAsync(MetaKey(pasteId));
if (raw.IsNullOrEmpty) return null;
try
{
return JsonSerializer.Deserialize<PasteMetadata>(raw!, _jsonOptions);
}
catch
{
// Corrupted cache? remove it
await _redis.KeyDeleteAsync(MetaKey(pasteId));
return null;
}
}
public async Task SetMetadataAsync(PasteMetadata meta)
{
var json = JsonSerializer.Serialize(meta, _jsonOptions);
await _redis.StringSetAsync(MetaKey(meta.Id), json, _metaTtl);
}
public async Task InvalidateMetadataAsync(Guid pasteId)
{
await _redis.KeyDeleteAsync(MetaKey(pasteId));
// Optionally delete latest content cache as well
// We could scan versions or keep latest version from DB - simplified: delete content:* prefix if Redis supports server-side scan in your environment.
}
// Try get small cached content; if not cached, attempt to build it and cache (with stampede protection)
public async Task<(bool IsContentCached, byte[]? Content, string? SignedUrl)> GetRawContentOrSignedUrlAsync(
PasteMetadata meta,
CancellationToken ct = default
)
{
// If content small and cached, return directly
if (meta.SizeBytes <= _contentCacheThreshold)
{
var contentKey = ContentKey(meta.Id, meta.LatestVersion);
var raw = await _redis.StringGetAsync(contentKey);
if (!raw.IsNullOrEmpty)
return (true, Convert.FromBase64String(raw!), null);
// Not cached — try to acquire lock and populate
var lockKey = LockKey(meta.Id);
var gotLock = await _redis.StringSetAsync(lockKey, "1", _lockTtl, When.NotExists);
if (gotLock)
{
try
{
// fetch from blob and cache
var content = await FetchBlobContentAsync(meta.BlobPath, ct);
// store as base64 string to preserve binary in Redis String
await _redis.StringSetAsync(contentKey, Convert.ToBase64String(content), _metaTtl);
return (true, content, null);
}
finally
{
await _redis.KeyDeleteAsync(lockKey);
}
}
else
{
// Another worker is populating. As fallback, provide signed URL.
var url = await EnsureSignedUrlAsync(meta, ct);
return (false, null, url);
}
}
else
{
// Large content: use signed URL
var url = await EnsureSignedUrlAsync(meta, ct);
return (false, null, url);
}
}
private async Task<string> EnsureSignedUrlAsync(PasteMetadata meta, CancellationToken ct)
{
// If cached SAS present and not expired, return it
if (meta.CachedSignedUrl != null && meta.SignedUrlExpiryUtc.HasValue && meta.SignedUrlExpiryUtc.Value > DateTimeOffset.UtcNow.AddSeconds(10))
{
return meta.CachedSignedUrl;
}
// Acquire lock to avoid multiple SAS creations
var lockKey = LockKey(meta.Id);
var gotLock = await _redis.StringSetAsync(lockKey, "1", _lockTtl, When.NotExists);
if (!gotLock)
{
// Someone else is creating the SAS. For simplicity, wait up to a short timeout for it to appear in cache.
var sw = System.Diagnostics.Stopwatch.StartNew();
while (sw.Elapsed < TimeSpan.FromSeconds(2))
{
var freshMeta = await GetMetadataFromCacheAsync(meta.Id);
if (freshMeta?.CachedSignedUrl != null && freshMeta.SignedUrlExpiryUtc > DateTimeOffset.UtcNow.AddSeconds(10))
return freshMeta.CachedSignedUrl;
await Task.Delay(100, ct);
}
// fallback: generate SAS without updating cache
return GenerateBlobSasUrl(meta.BlobPath, _signedUrlTtl);
}
try
{
var sasUrl = GenerateBlobSasUrl(meta.BlobPath, _signedUrlTtl);
var updated = meta with
{
CachedSignedUrl = sasUrl,
SignedUrlExpiryUtc = DateTimeOffset.UtcNow.Add(_signedUrlTtl)
};
await SetMetadataAsync(updated);
return sasUrl;
}
finally
{
await _redis.KeyDeleteAsync(lockKey);
}
}
private string GenerateBlobSasUrl(string blobPath, TimeSpan ttl)
{
// blobPath expected to be "container/blobname" or just blob name depending on your storage scheme.
// For simplicity, let's assume blobPath is "pastes/{pasteId}/{version}" and container is known.
var container = _blobService.GetBlobContainerClient(_containerName);
var blobClient = container.GetBlobClient(blobPath);
// Generate SAS - requires storage account key or user delegation (not covered here)
var sasBuilder = new BlobSasBuilder
{
BlobContainerName = _containerName,
BlobName = blobClient.Name,
Resource = "b",
ExpiresOn = DateTimeOffset.UtcNow.Add(ttl)
};
sasBuilder.SetPermissions(BlobSasPermissions.Read);
// This requires BlobServiceClient created from storage account credentials that can sign SAS
// If using managed identity you must obtain user delegation key - omitted for brevity.
var credential = new StorageSharedKeyCredential(/* accountName */ "<acct>", /* accountKey */ "<key>");
var sasToken = sasBuilder.ToSasQueryParameters(credential).ToString();
return $"{blobClient.Uri}?{sasToken}";
}
private async Task<byte[]> FetchBlobContentAsync(string blobPath, CancellationToken ct)
{
var container = _blobService.GetBlobContainerClient(_containerName);
var blobClient = container.GetBlobClient(blobPath);
var ms = new System.IO.MemoryStream();
await blobClient.DownloadToAsync(ms, ct);
return ms.ToArray();
}
}
Why CDN Is Needed Even with Redis and Blob Storage
Redis is excellent for fast metadata lookups, but:
- It’s not meant for large binary content.
- It’s centralized (or region-limited) — not global.
- It doesn’t reduce egress cost or latency for geographically distributed users.
That’s where the CDN comes in.
CDN Role in the System
1. Edge Caching
When a client requests a paste’s raw content (say https://cdn.paste.example/abc123):
- The CDN checks if it already has that object cached.
- If cached → instantly serves from the nearest edge location.
- If not cached → fetches from blob storage origin, caches it, and serves to the user.
This reduces latency from 300–600 ms (blob access) to < 30 ms.
2. Offloading Blob Storage
Without CDN:
- Every read goes directly to blob storage.
- You pay egress for every request.
- Hot content causes unnecessary traffic.
With CDN:
- Most reads hit edge cache.
- Blob egress costs drop dramatically.
- Blob storage handles only cold or expired cache requests.
3. Security with Signed URLs
You can integrate signed URLs or signed cookies/headers:
- The API generates a short-lived signed URL (e.g., 5 min TTL).
- The CDN validates the signature before serving.
- Prevents abuse, direct-linking, or hotlinking.
This allows secure yet cacheable delivery.
4. Content Freshness and Invalidation
When a paste is updated:
- API invalidates or purges the old CDN object via API (e.g., cdnClient.Purge("pastes/{id}")).
- CDN fetches the new version on next access.
- You can use versioned object keys (pastes/{id}/v{version}) to avoid invalidation complexity; a tiny helper sketch follows this list.
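The versioned-key option is simple enough to sketch inline (the hostname is a placeholder matching the earlier examples):
using System;

public static class CdnKeys
{
    // Bake the version into the object key: an update publishes a new immutable key,
    // so no purge is needed and old URLs keep serving the old bytes until GC.
    public static string BlobKeyFor(Guid pasteId, int version) => $"pastes/{pasteId}/v{version}";

    public static string CdnUrlFor(Guid pasteId, int version) =>
        $"https://cdn.paste.example/{BlobKeyFor(pasteId, version)}";
}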
Typical Flow Example
Read Path
- Client requests → GET /v1/pastes/{id}
- API checks Redis:
  - If metadata + small content cached → returns directly.
  - Else → fetches metadata from DB.
- API generates signed CDN URL (for raw content).
- Client fetches content from CDN.
  - CDN serves from edge cache or pulls from blob store.
Write Path
- User submits new paste → goes through validation/scan queue.
- Worker uploads content to blob store.
- API updates DB and Redis.
- CDN cache remains empty until the first read (lazy population).
Optional Optimization
For frequently accessed content:
- Pre-warm the CDN by making a HEAD request from the API to the CDN origin right after publishing (see the sketch below).
- Store small content (<16 KB) directly in Redis to skip CDN/Blob entirely.
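A sketch of the pre-warm call; whether a HEAD actually populates the edge cache is CDN-dependent, so verify with your provider:
using System;
using System.Net.Http;
using System.Threading.Tasks;

public static class CdnPrewarmer
{
    private static readonly HttpClient Http = new();

    // Issue a HEAD against the CDN URL right after publishing so the edge can
    // fetch and cache the object before the first real reader arrives.
    public static async Task<bool> PrewarmAsync(string cdnUrl)
    {
        using var request = new HttpRequestMessage(HttpMethod.Head, cdnUrl);
        using var response = await Http.SendAsync(request);
        // Non-fatal on failure: the CDN still populates lazily on the first GET.
        return response.IsSuccessStatusCode;
    }
}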
C# Example: Generating a Signed CDN URL (Azure Blob + Azure CDN)
using System;
using Azure.Storage.Blobs;
using Azure.Storage.Sas;
public static class CdnHelper
{
public static string GenerateSignedCdnUrl(string blobUrl, BlobSasPermissions permissions, TimeSpan validFor)
{
var blobUri = new Uri(blobUrl);
var blobClient = new BlobClient(blobUri);
var sasBuilder = new BlobSasBuilder
{
BlobContainerName = blobClient.BlobContainerName,
BlobName = blobClient.Name,
ExpiresOn = DateTimeOffset.UtcNow.Add(validFor),
Resource = "b"
};
sasBuilder.SetPermissions(permissions);
var sasToken = sasBuilder.ToSasQueryParameters(
new Azure.Storage.StorageSharedKeyCredential(
"your-storage-account",
"your-account-key"
)
).ToString();
return $"{blobClient.Uri}?{sasToken}";
}
}
Then your metadata in Redis could include:
{
"id": "abc123",
"title": "Hello World",
"blob_path": "pastes/abc123/v1",
"cdn_url": "https://cdn.paste.example/pastes/abc123?v=1",
"signed_url": "https://cdn.paste.example/pastes/abc123?v=1&sig=..."
}
...
Scheduled Cleanup Worker (Azure Function Example)
using System;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs;
using Microsoft.Extensions.Logging;
public class PasteCleanupFunction
{
private readonly PasteStorageService _storageService;
private readonly PasteRepository _repository;
public PasteCleanupFunction(PasteStorageService storageService, PasteRepository repository)
{
_storageService = storageService;
_repository = repository;
}
[FunctionName("PasteCleanupWorker")]
public async Task RunAsync([TimerTrigger("0 */15 * * * *")] TimerInfo timer, ILogger log)
{
var expired = await _repository.GetExpiredPastesAsync(TimeSpan.FromDays(7));
foreach (var paste in expired)
{
await _storageService.DeleteAsync(paste.Id.ToString());
await _repository.DeleteAsync(paste.Id);
log.LogInformation($"Cleaned up paste {paste.Id}");
}
}
}