Building Redis from scratch

A codecrafter challenge to build redis tool from scratch

· 12 min read

Overview

Building Redis from scratch sounds intimidating, but here’s the secret: Redis is actually a surprisingly simple piece of software at its core. It’s a simple TCP server that reads commands, manipulates data structures in memory, and sends responses back. That’s it.

The beauty of Redis isn’t in complexity - it’s in focus. It does a few things exceptionally well:

  1. Fast parsing - the RESP protocol is dead simple
  2. Fast data structures - everything lives in RAM with extremely fast read/write operation
  3. Fast I/O - single-threaded. so no lock contention

In this guide, we’re going to build our own Redis. Not a toy version - something that can actually handle real redis-cli connections and execute commands.

So what is Redis, Really?

Let’s start with understanding what redis actually is. Redis is an in-memory data store used as a database, cache, and message broker.

Core idea: data lives in RAM, so reads/writes are extremely fast (microsecond latency).

Data structures it supports: Strings, Lists, Sets / Sorted sets, Hashes, Streams etc

Common use cases:

  • Caching - store expensive DB query results with a TTL
  • Sessions - fast user session storage
  • Rate limiting - atomic counters per user/IP
  • Pub/Sub - lightweight message passing between services
  • Queues - using Lists or Streams
  • Leaderboards - sorted sets make ranking trivial

Key properties:

  • Single-threaded command execution (no lock contention)
  • Optional persistence (RDB snapshots or AOF logs)
  • Replication and clustering for scale/HA

Our Building Strategy

We’re going to build Redis from the ground up in layers:

Layer 1: The Foundation (Setting Up)

  • Create a TCP server that accepts connections
  • Understand buffered I/O and why it matters
  • Set up the basic project structure

Layer 2: The Protocol (RESP)

  • Learn why Redis uses RESP instead of JSON/HTTP
  • Understand how RESP encodes data
  • Build a parser that converts bytes into commands

Layer 3: The Commands

  • Implement a key-value store in memory
  • Add support for basic commands (PING, GET, SET, etc.)
  • Handle concurrent clients

By the end, you’ll have a working Redis server that real Redis clients can connect to. More importantly, you’ll understand the design decisions that make Redis so fast.

Setting Up

Before we dive into the RESP protocol, let’s set up our project and understand what we’re actually building.

The Basic Architecture

At its core, our Redis implementation needs three main components:

  1. TCP Server - listens for connections and handles multiple clients
  2. Protocol Parser - converts incoming bytes into commands we understand
  3. Command Handler - executes commands and returns responses

Think of it like a restaurant: customers are the clients, the TCP server is the host greeting customers, the parser is the waiter taking orders, and the command handler is the kitchen preparing food.

Creating the Project

Let’s start with a new Rust project:

cargo new redis-server
cd redis-server

I am planning to build it in rust. You can choose to use language that you are confortable with. Why Rust? It gives us memory safety without garbage collection, which is perfect for a high-performance server. Plus, the type system helps us avoid common networking bugs.

Building the TCP Server

First of all, we need to spin off a server and listen on a particular port. It will act like a host of our restaurant, waiting for the customers. Here is a simple TCP server

use std::net::{TcpListener, TcpStream};
use std::io::{BufReader, Write};

fn main() -> std::io::Result<()> {
    let listener = TcpListener::bind("127.0.0.1:6379")?;
    println!("Redis server listening on port 6379");

    for stream in listener.incoming() {
        let stream = stream?;
        println!("New client connected!");
        handle_client(stream)?;
    }

    Ok(())
}

fn handle_client(mut stream: TcpStream) -> std::io::Result<()> {
    // We'll implement this next
    Ok(())
}

Let’s break down what’s happening:

TcpListener::bind("127.0.0.1:6379"): This tells the OS “reserve port 6379 for me”. The OS will now route any TCP connections to this port to our program.

listener.incoming(): This is a blocking iterator. Each iteration waits for a new client to connect. When one does, it gives us a TcpStream.

TcpStream: This is a bidirectional pipe to the client. We can read their requests and write our responses.

Here’s where it gets interesting. When a client connects and sends data, it might come in chunks:

// Client sends: SET key value
// We might receive:
Chunk 1: "*3\r\n$3\r\n"
Chunk 2: "SET\r\n$3\r\nkey\r\n$5\r\n"
Chunk 3: "value\r\n"

Why does this happen? TCP is a stream protocol. It guarantees bytes arrive in order, but NOT that they arrive in the same groupings you sent them.

This is like receiving a letter one word at a time through the mail slot. You need to collect all the words before you can read the full message.

Buffering the Solution

This is why we use BufReader:

fn handle_client(stream: TcpStream) -> std::io::Result<()> {
    let mut reader = BufReader::new(stream);

    loop {
        // We'll parse RESP commands here
        // For now, just print what we receive
    }
}

What does BufReader do? It maintains an internal buffer. When you ask for data, it:

  1. Checks if it already has enough buffered
  2. If not, reads a big chunk from the TCP stream
  3. Returns what you asked for and keeps the rest

This means fewer system calls and easier parsing. Instead of reading byte-by-byte from the network (slow!), we read in chunks.

Testing Our Server

Let’s add a simple echo to test it:

use std::io::BufRead;

fn handle_client(stream: TcpStream) -> std::io::Result<()> {
    let reader = BufReader::new(stream.try_clone()?);
    let mut writer = stream;

    for line in reader.lines() {
        let line = line?;
        println!("Received: {}", line);
        writeln!(writer, "Echo: {}", line)?;
    }

    Ok(())
}

Run this and connect with telnet:

cargo run
# In another terminal:
telnet localhost 6379

Type anything and you’ll see it echoed back. This proves our TCP server works!

Why this won’t work with Redis: Real Redis clients send binary RESP data, not text lines. Try connecting with redis-cli and you’ll see gibberish - those are RESP commands our server doesn’t understand yet.

What We’ve Built So Far

We now have:

  • A TCP server that accepts connections
  • Buffered reading for efficient I/O
  • A place to add our RESP parser

The key insight: The TCP server and RESP parser are separate concerns. The server handles connections and bytes. The parser handles protocol logic. This separation makes both easier to reason about.

Next Steps

Now we’re ready to understand why Redis uses RESP instead of simple text commands, and how to parse it. With our TCP infrastructure in place, we can focus purely on the protocol.

Why RESP?

Before we start coding, let’s understand why Redis needs its own protocol. You might be thinking: “Why not just use JSON over HTTP like every other API?”

Here’s the thing - Redis is designed for speed. Every microsecond matters when you’re serving millions of requests per second. HTTP adds overhead with headers, connection setup, and JSON parsing is relatively slow.

RESP (Redis Serialization Protocol) solves this by being:

  • Simple: You can literally type it by hand and debug it with telnet
  • Fast to parse: No complex state machines or lookahead needed
  • Binary-safe: Can handle any bytes, not just UTF-8 strings
  • Explicit about types: You know exactly what you’re getting

Think of it like this: HTTP is a formal letter with an envelope, stamps, and return address. RESP is a quick note passed across the table. Both work, but one is way faster when you need it.

RESP Basics

RESP uses a clever trick: the first byte tells you what type of data follows. Let’s explore each type with real examples.

Simple Strings - The “OK” Response

When Redis wants to say “yep, got it”, it sends:

+OK\r\n

The + means “simple string”, then comes the text, then \r\n (carriage return + newline) to mark the end.

Why have this? Simple strings are fast - no length counting needed. Redis uses them for simple confirmations.

Errors - When Things Go Wrong

Errors look similar but start with -:

-ERR unknown command 'TYPO'\r\n

Key insight: By making errors a different type, clients can handle them differently without parsing the message content first.

Integers - Counting Things

When Redis needs to return a number (like “how many items in this list?”):

:42\r\n

The : prefix says “this is an integer”, then the number as ASCII digits.

Why not just send the binary number? Because RESP is designed to be human-readable when you’re debugging. You can see :42 in a network trace and know exactly what it means.

Bulk Strings - The Real Workhorses

This is where it gets interesting. When Redis stores actual data (which could be anything - text, images, binary data), it uses bulk strings:

$5\r\nhello\r\n

Let’s break this down:

  • $ = “bulk string incoming”
  • 5 = “it’s exactly 5 bytes long”
  • \r\n = “now the data starts”
  • hello = the actual 5 bytes
  • \r\n = “data ended”

Why declare length upfront? This is brilliant. The parser knows exactly how many bytes to read. No scanning for delimiters, no escaping issues. If you’re sending binary data with \r\n in it, no problem - the parser just reads 5 bytes and moves on.

Special case - Null: What if a key doesn’t exist? Redis sends:

$-1\r\n

The length of -1 means “null” (no data follows).

Arrays - Commands and Responses

Here’s where it all comes together. Redis commands are sent as arrays of bulk strings:

*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n

Let’s decode this step by step:

  1. *3\r\n - Array with 3 elements
  2. $3\r\nSET\r\n - First element: “SET” (3 bytes)
  3. $3\r\nkey\r\n - Second element: “key” (3 bytes)
  4. $5\r\nvalue\r\n - Third element: “value” (5 bytes)

This is the command: SET key value

Why arrays of bulk strings? Because it’s unambiguous. Each argument has an explicit length. No worrying about spaces in filenames or escaping quotes.

Try this mental exercise: How would you send a filename with spaces using a simple space-delimited protocol? You’d need escaping. What if the filename contains your escape character? More escaping. RESP sidesteps all of this.

Building the Parser

Now that we understand why RESP is designed this way, let’s think about how to parse it.

The Core Insight

Here’s the key realization: parsing RESP is just following the instructions. The data tells you exactly what to do:

  1. Read one byte - this tells you the type
  2. Based on the type, read the appropriate format
  3. For arrays, recursively parse each element

Let’s think through parsing that SET key value command:

*3\r\n$3\r\nSET\r\n$3\r\nkey\r\n$5\r\nvalue\r\n

Step 1: Read first byte = *. It’s an array!

Step 2: Read until \r\n to get count = 3. We need to parse 3 elements.

Step 3: Parse first element:

  • Read byte = $. It’s a bulk string!
  • Read until \r\n = 3. Length is 3.
  • Read exactly 3 bytes = SET
  • Skip the trailing \r\n

Step 4: Parse second element (same process):

  • $, then 3, then read 3 bytes = key

Step 5: Parse third element:

  • $, then 5, then read 5 bytes = value

Result: Array of [“SET”, “key”, “value”]

Implementation Strategy

Here’s an approach to implementing this:

// First, define what we can parse
enum RespValue {
    SimpleString(String),
    Error(String),
    Integer(i64),
    BulkString(Option<Vec<u8>>),  // Option handles null case
    Array(Option<Vec<RespValue>>), // Can nest arrays!
}

The parsing function follows the mental model:

fn parse(reader: &mut impl BufRead) -> Result<RespValue, Error> {
    // Read one byte to know the type
    let type_byte = read_one_byte(reader)?;

    match type_byte {
        b'+' => parse_simple_string(reader),
        b'-' => parse_error(reader),
        b':' => parse_integer(reader),
        b'$' => parse_bulk_string(reader),
        b'*' => parse_array(reader),
        _    => Err("Unknown type!"),
    }
}

Key teaching points:

  1. The match structure directly reflects the protocol spec
  2. Each parse function is simple because the protocol tells us exactly what to expect
  3. Errors happen when reality doesn’t match the protocol

For bulk strings, the interesting part is:

fn parse_bulk_string(reader: &mut impl BufRead) -> Result<RespValue, Error> {
    let length = read_line_as_number(reader)?;

    if length == -1 {
        return Ok(RespValue::BulkString(None)); // Null!
    }

    // Read exactly `length` bytes
    let mut buffer = vec![0u8; length as usize];
    reader.read_exact(&mut buffer)?;

    // Don't forget to consume the trailing \r\n
    skip_crlf(reader)?;

    Ok(RespValue::BulkString(Some(buffer)))
}