How CouchDB Revision Number Generation Works
(Technical post, yadda yadda)
CouchDB uses data-derived revision numbers for documents stored in it; these revision numbers are a big aid in replicating databases between servers: it’s trivial to tell if two replicas were starting with the same revision history because the revision numbers tell the story exactly.
The short answer to how CouchDB generates these revision numbers is “it’s the MD5 of the document and its attachments”, but the truth is unfortunately not that simple. If you check out the code that computes revisions, you will see this:
couch_util:md5(term_to_binary([Deleted, OldStart, OldRev, Body, Atts2]))
Which looks simple, but unfortunately is too simple: the above depends on Erlang’s encoding of “terms” into binary strings (the term_to_binary call), and the hash is computed on that binary string.
An aside: shit like this is super lazy, and destroys interoperability with other systems. If you ever have to encode something into binary for purposes like hashing, use a common format. Don’t ever depend on language-specific encodings like this.
It is, happily, not too difficult to compute this encoding — the format is pretty well documented online. The hash is made over a list with five elements: a Deleted flag, an OldStart number (this seems to be the update sequence of the document), an OldRev value (this is the MD5 hash of the previous version of the document), a Body (the document, the object that was decoded from JSON), and an Atts2 list (I believe these are the attachments; I haven’t investigated how this variable is hashed, so I’ll just treat it as an empty list below).
First, to start off, you will need to emit six bytes for the serialization prologue, list begin marker, and list length (which is 5). Lengths are usually four bytes, in big-endian byte order:
83 6c 00 00 00 05 |.l....|
And the sequence always ends with a nil value (all lists end with a nil list); the stream will always end with the byte 0x6a.
Now, encode whether the document is a deleted version or not; CouchDB keeps “tombstone” entries for documents that have been deleted, for proper revision tracking. The deleted marker is a boolean, but actually gets encoded as an atom, so if your document is deleted, emit the atom true:
64 00 04 74 72 75 65 |d..true|
If not, emit the atom false:
64 00 05 66 61 6c 73 65 |d..false|
Next the sequence number, which is emitted as an integer in one of two ways; if it fits into one byte, emit a “small int” (the example below is for a sequence == 0):
61 00 |a.|
If it’s larger than a byte, emit it as a normal, four-byte integer (below the value is 256):
62 00 00 01 00 |b....|
The OldRev field is either null (for the first revision of a document), or a 16-byte string. For the null case, just emit a small integer 0, as above. For a non-null revision, emit a binary string, which consists of a marker, the length as four bytes, then that many bytes. For example, the 16-byte revision string consisting of all 0xff bytes:
6d 10 ff ff ff ff ff ff ff ff ff ff ff ff ff ff |m...............|
ff ff |..|
The Body is the trickiest one; it is the decoded JSON document, minus internal variables like _rev and _id. It is represented as:
- A tuple with arity 1, which contains:
- A list, containing as many key/value pairs as are in the document, each of which is:
- A tuple with arity 2, which contains the key (a binary string), followed by the value.
For example, the JSON document:
{
"foo": "bar",
"baz": "baz",
"quux": 1234,
"boolean": true,
"otherboolean": false,
"list": [ 1, 2, 3 ],
"obj": { "a": "b" }
}
Becomes this, in Erlang:
{[
{<<102,111,111>>,<<98,97,114>>},
{<<98,97,122>>,<<98,97,122>>},
{<<113,117,117,120>>,1234},
{<<98,111,111,108,101,97,110>>,true},
{<<111,116,104,101,114,98,111,111,108,101,97,110>>,false},
{<<108,105,115,116>>,[1,2,3]},
{<<111,98,106>>,{[{<<97>>,<<98>>}]}}
]}
The order of keys in the source document is preserved, so there is no dependence on hash order or other implementation details.
The mapping between JSON types and Erlang is:
- Booleans are mapped to the atoms
trueorfalse. - Nulls are mapped to the atom
null. - Integers are integers.
- Floating-point numbers are floating-point numbers.
- Strings are binary strings.
- Lists are lists.
- Objects are arity-1 tuples, containing a list of key/value pairs, as arity-2 tuples.
Booleans, nulls, integers, and strings should be obvious from the usage above. Now, for the rest of the possible items:
Float values begin with the marker 0x63 followed by 31 bytes that contain the string representation of the float, as if formatted with the format string "%.20e. If the string is shorter than 31 bytes, it is padded with zero bytes.
Lists should be obvious from above; they begin with the byte 0x6c, have the number of items in the list as a four-byte integer in big endian byte order, and end with a nil value, byte 0x6a. E.g.:
6c 00 00 00 05 |l....|
...
6a |j|
Objects begin with a small tuple header, and count of one:
68 01 |h.|
Then, there is the list of key/value pairs, encoded as a list like above. Each key/value pair begins with an arity-2 tuple header:
68 02 |h.|
Then the key and the value.
Serialization can continue from there; keys should be strings, and values can be encoded recursively.
The Attr2 element has some form, but I haven’t determined what yet (my assumption is a list of tuples, mapping the attachment name to its value). If you aren’t dealing with attachments, emit an empty list (a nil, byte 0x6a).