GET, SET, or DEL to a single central Redis Server. The server reads the command from the connection, parses it,
evaluates
it, and writes the response back to the client.
But how does the server actually understand a command? To
a computer, SET userName Alejandro is nothing more than
a raw string of characters. Before the server can act on it,
it needs to transform it into a meaningful representation.
That process is called Parsing.
So, before we can evaluate a Command, we are going to transform it twice into more accessible representations:
The first transformation is turning our Raw Command into Tokens, which are the atomic units that compose a Command Representation.
The process of turning raw text into meaningful Tokens is called Lexical Analysis, and it's done by a Lexer.
For example, feeding this command to our Lexer:
The Lexer Produces the following:
setCommandTokens := []Token{
{Kind: SET, Literal: "SET"},
{Kind: IDENT, Literal: "userName"},
{Kind: IDENT, Literal: "Alejandro"},
}
Notice that not all Tokens are equal. SET is a command token — it represents the operation to perform. The remaining two are identifier tokens — they carry the
arguments. Each
Token knows both what kind it is and what
it contains.
This is the theory — let's define some tokens!
Defining Tokens
As shown above, SET userName Alejandro is not treated as
one long string. The Lexer identifies three distinct Tokens: a command token
and two identifier tokens — each with its own TokenKind.
TokenKind is the first entity we will define in our project.
Enough theory — it's time to write our first lines of code.
Defining TokenKinds
type TokenKind byte
// Check the source code to see all the Tokens!
const (
EOF TokenKind = iota // EOF = End Of File
ILLEGAL
// Special
CRLF
// All Commands have their respective TokenKind
keyword_beg
IDENT
SET
GET
INCR
INCRBY
...
As you can see, the first TokenKinds we define are EOF and ILLEGAL. These two play very specific roles in our Lexer:
-
EOF: Stands for End Of File and signals that there is no more input to read. -
ILLEGAL: represents any character our Lexer doesn't recognize.
byte for TokenKind?
You might have noticed that we defined TokenKind as a byte. This is a deliberate engineering decision — here's why
we chose it over a string or a standard int:
- Memory Efficiency: Since our Lexer will generate a new instance for every single word or symbol in a command, using a 1-byte type keeps our memory footprint as small as possible.
- Sufficient Capacity: A
byte( an alias foruint8) can represent up to 256 unique values. Since our Redis clone will only have a few dozen command types and symbols, this is the perfect, lightweight fit for ourTokenKind.
With our TokenKind defined, we now need a container to hold
two things: the kind of token it is, and the raw text it was derived from — which as seen in the SET Tokens example we'll call the Literal.
🔥 Fun Fact: Even thought i just said that the Token struct should be as lightweight as possible, this is not the most lightweight representation for a Token.
The Golang Parser uses an amazing technique: instead of storing a piece of text in the struct as we do, they just store the indices from where the piece of text beings and ends, a simple and ingenious trick if you want to implement it, Check the details here.
Now the only thing left is creating a constructor for our Token struct. This will make instantiating new Tokens clean and consistent throughout
the codebase: