Every engineer has been asked to “design an LRU cache” at some point. Most reach for a hash map and a doubly-linked list — the right instinct. But the interesting part isn’t what data structures to use. It’s why this combination works, what trade-offs it encodes, and where it falls apart.
This post builds an LRU cache from first principles. We’ll start with the interface, derive the data structure from the access patterns, implement it in C, then explore what changes under concurrency, bounded memory, and production constraints.
Before any internal design, define the contract. An LRU cache supports two operations:
typedef struct LRUCache LRUCache;
// Returns NULL on cache miss.
void* lru_get(LRUCache *cache, const char *key);
// Evicts least-recently-used entry if at capacity.
void lru_put(LRUCache *cache, const char *key, void *value);
Both operations must be O(1). That single constraint — constant time for both read and write — eliminates most candidate data structures and forces our hand toward a specific design.
Let’s think about what O(1) requires:
O(1) lookup by key → hash map. Nothing else gives us constant-time key-based access.
O(1) eviction of the oldest entry → we need an ordering structure where removing the tail is constant time. A doubly-linked list gives us this.
O(1) “move to front” on access → when we read a key, it becomes the most recently used. We need to remove it from its current position and move it to the head. In a doubly-linked list, removal is O(1) if we have a pointer to the node — and the hash map gives us exactly that pointer.
The hash map stores key → node pointer. The linked list maintains recency order. They reference each other. Neither works alone.
The node carries a key, a value, and pointers in both directions:
typedef struct Node {
    char *key;
    void *value;
    struct Node *prev;
    struct Node *next;
} Node;

typedef struct LRUCache {
    int capacity;
    int size;
    Node *head;      // sentinel — most recent side
    Node *tail;      // sentinel — least recent side
    HashTable *map;  // key → Node*
} LRUCache;
The sentinel nodes simplify edge cases. We never insert or remove the sentinels themselves — they just mark the boundaries. Every real node lives between head and tail.
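Concretely, the sentinels mean the two list helpers used by `lru_get` below need no NULL checks at all. A minimal sketch — `list_init` is a hypothetical setup helper not in the original API, and the `Node` definition is repeated so the snippet stands alone:

```c
#include <assert.h>
#include <stddef.h>

// (Node as defined above, repeated so the snippet compiles on its own.)
typedef struct Node {
    char *key;
    void *value;
    struct Node *prev;
    struct Node *next;
} Node;

// Wire up an empty list: head <-> tail, nothing in between.
// (list_init is a hypothetical helper, not part of the original API.)
static void list_init(Node *head, Node *tail) {
    head->prev = NULL;
    head->next = tail;
    tail->prev = head;
    tail->next = NULL;
}

// Splice a node out of the list. No NULL checks needed: the sentinels
// guarantee every real node has live neighbours on both sides.
static void detach(Node *node) {
    node->prev->next = node->next;
    node->next->prev = node->prev;
}

// Insert node immediately after pos (e.g. after the head sentinel).
static void attach_after(Node *pos, Node *node) {
    node->prev = pos;
    node->next = pos->next;
    pos->next->prev = node;
    pos->next = node;
}
```

Without sentinels, both helpers would need branches for "node is the current head", "node is the current tail", and "list is empty" — four pointer writes and zero conditionals is the whole payoff.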
void* lru_get(LRUCache *cache, const char *key) {
    Node *node = hashtable_get(cache->map, key);
    if (!node) return NULL;
    // Move to front: detach, then reattach after head.
    detach(node);
    attach_after(cache->head, node);
    return node->value;
}
Two pointer writes for detach, four for attach. Constant time either way. The hash map lookup is O(1) amortised. Total: O(1).
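The put path reuses the same moves, plus the eviction step. A sketch under some assumptions: `hashtable_put` and `hashtable_remove` are the obvious companions to the `hashtable_get` used above (they are not in the original interface), and the tiny fixed-size linear-probing table here is a stand-in for a real hash map so the snippet is self-contained:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

// Stand-in hash table (fixed-size, linear probing) purely so this
// sketch compiles on its own — a real cache wants a resizing table.
#define TABLE_SLOTS 64
typedef struct Node Node;
typedef struct { char *keys[TABLE_SLOTS]; Node *vals[TABLE_SLOTS]; } HashTable;

static size_t slot_for(HashTable *t, const char *key) {
    size_t h = 5381;                                   // djb2 string hash
    for (const char *p = key; *p; p++) h = h * 33 + (unsigned char)*p;
    for (size_t i = h % TABLE_SLOTS; ; i = (i + 1) % TABLE_SLOTS)
        if (!t->keys[i] || strcmp(t->keys[i], key) == 0) return i;
}
static Node* hashtable_get(HashTable *t, const char *key) {
    size_t i = slot_for(t, key);
    return t->keys[i] ? t->vals[i] : NULL;
}
static void hashtable_put(HashTable *t, char *key, Node *n) {
    size_t i = slot_for(t, key);
    t->keys[i] = key; t->vals[i] = n;
}
static void hashtable_remove(HashTable *t, const char *key) {
    size_t i = slot_for(t, key);
    t->keys[i] = NULL;  // note: real probing tables need tombstones here
}

struct Node { char *key; void *value; Node *prev, *next; };
typedef struct { int capacity, size; Node *head, *tail; HashTable *map; } LRUCache;

static void detach(Node *n) { n->prev->next = n->next; n->next->prev = n->prev; }
static void attach_after(Node *pos, Node *n) {
    n->prev = pos; n->next = pos->next;
    pos->next->prev = n; pos->next = n;
}
static char* dup_key(const char *s) {                  // portable strdup
    size_t n = strlen(s) + 1;
    char *p = malloc(n);
    memcpy(p, s, n);
    return p;
}

void lru_put(LRUCache *cache, const char *key, void *value) {
    Node *node = hashtable_get(cache->map, key);
    if (node) {                         // key exists: update, refresh recency
        node->value = value;
        detach(node);
        attach_after(cache->head, node);
        return;
    }
    if (cache->size == cache->capacity) {
        Node *lru = cache->tail->prev;  // the node just before the tail sentinel
        detach(lru);
        hashtable_remove(cache->map, lru->key);
        free(lru->key);
        free(lru);
        cache->size--;
    }
    node = malloc(sizeof(Node));
    node->key = dup_key(key);
    node->value = value;
    attach_after(cache->head, node);
    hashtable_put(cache->map, node->key, node);
    cache->size++;
}
```

Note the order: evict first, then insert, so `size` never exceeds `capacity`. The key is copied into the node because the cache must outlive the caller's string; the map stores the node's own copy.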
Concurrency — a single lock serialises all access. Note that even lru_get mutates the list (the move-to-front), so a reader-writer lock buys nothing: every operation is a writer. Sharding by key hash gives per-shard locks. At extreme scale, a lock-free approach with CAS on the linked list pointers is possible but rarely worth the complexity.
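The sharded approach might be sketched like this. All names here (`ShardedLRU`, `shard_index`, `N_SHARDS`) are illustrative, and the per-shard cache pointer is elided to keep the focus on the locking pattern:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

// Hypothetical sharded wrapper: N_SHARDS independent caches, each with
// its own lock. A key always maps to the same shard, so threads touching
// different shards never contend.
#define N_SHARDS 16

typedef struct { pthread_mutex_t lock; /* LRUCache *cache; */ } Shard;
typedef struct { Shard shards[N_SHARDS]; } ShardedLRU;

static size_t shard_index(const char *key) {
    size_t h = 5381;                                   // djb2 string hash
    for (const char *p = key; *p; p++) h = h * 33 + (unsigned char)*p;
    return h % N_SHARDS;
}

// Every public operation has the same shape:
// pick shard -> lock it -> delegate to the single-threaded cache -> unlock.
void* sharded_get(ShardedLRU *s, const char *key) {
    Shard *sh = &s->shards[shard_index(key)];
    pthread_mutex_lock(&sh->lock);
    void *v = NULL;  // v = lru_get(sh->cache, key);
    pthread_mutex_unlock(&sh->lock);
    return v;
}
```

One consequence worth noting: sharding means eviction is per-shard, so "least recently used" is only approximate across the whole cache — a hot shard may evict entries that are globally more recent than cold entries elsewhere.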
Size-bounded vs count-bounded — most real caches bound by memory, not entry count. This means tracking the size of each value and evicting until you’re under budget, not just evicting one entry.
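The evict-until-under-budget loop might look like the following sketch; `value_size`, `used_bytes`, and `budget_bytes` are illustrative field names that the count-bounded struct above does not have:

```c
#include <assert.h>
#include <stddef.h>

// Byte-budget variant: each node records the size of its value, and the
// cache tracks total bytes rather than an entry count. Names illustrative.
typedef struct BNode {
    size_t value_size;          // bytes held by this entry's value
    struct BNode *prev, *next;
} BNode;

typedef struct {
    size_t used_bytes;          // sum of value_size over live entries
    size_t budget_bytes;        // eviction threshold
    BNode *head, *tail;         // sentinels, most-recent at the head side
} ByteLRU;

static void bdetach(BNode *n) { n->prev->next = n->next; n->next->prev = n->prev; }

// Evict from the tail until the cache fits its budget. One put of a large
// value may evict many small entries — a reason byte-bounded caches often
// also impose a per-entry size cap.
static void evict_to_budget(ByteLRU *c) {
    while (c->used_bytes > c->budget_bytes && c->tail->prev != c->head) {
        BNode *lru = c->tail->prev;
        bdetach(lru);
        c->used_bytes -= lru->value_size;
        // real code: also remove from the map, free key/value/node
    }
}
```

The loop replaces the single `if (size == capacity)` eviction in the count-bounded put path; everything else about the design is unchanged.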
LFU, ARC, CLOCK — LRU is just one eviction policy. LFU (least frequently used) handles scan resistance better. ARC adapts between LRU and LFU automatically. CLOCK approximates LRU with a circular buffer and a single bit per entry.
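To make CLOCK concrete, here is a sketch of its sweep, assuming a fixed array of slots with one reference bit each (all names illustrative):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

// CLOCK ("second chance") eviction: a circular buffer of entries, each
// with one reference bit. Access sets the bit; the eviction hand sweeps,
// clearing set bits and skipping those entries. The first entry found
// with a clear bit is the victim.
#define CLOCK_SLOTS 4

typedef struct {
    bool used[CLOCK_SLOTS];     // is the slot occupied?
    bool ref[CLOCK_SLOTS];      // reference bit, set on access
    size_t hand;                // current sweep position
} Clock;

// The whole cost of "promotion" on access is setting one bit — no list
// surgery, which is why CLOCK is so much cheaper under concurrency.
static void clock_touch(Clock *c, size_t slot) { c->ref[slot] = true; }

// Find a victim slot, giving referenced entries a second chance.
static size_t clock_evict(Clock *c) {
    for (;;) {
        size_t i = c->hand;
        c->hand = (c->hand + 1) % CLOCK_SLOTS;
        if (!c->used[i]) return i;                    // free slot: use it directly
        if (c->ref[i]) { c->ref[i] = false; continue; } // second chance
        c->used[i] = false;                           // clear bit: evict this one
        return i;
    }
}
```

The trade-off is precision: CLOCK only knows "referenced since the last sweep or not", so it approximates recency order rather than tracking it exactly — usually a fine bargain for the cheap access path.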