RDX is a syncable data format that is very similar to JSON in look and feel, but has a formal data model and a formal mutation model, so any implementations can interpret the data identically and keep replicas in perfect sync with no global consensus.
RDX is influenced by JSON, relational model, LSM key-value stores and CRDTs, and it can express these data models at least on the 80/20 basis. The RDX formal model can guarantee that RDX implementations map 1:1 to one another. The mutation model can guarantee that the same update merged to the same data produces the same result, irrespectively of the implementation or the format variant being used. Compare that to e.g. JSON where parsers may diverge in interpretation and note that no “binary JSON” format maps to JSON 1:1.
# 10.0 is 10.0 irrespectively of the notation
$ xx hash '1e1'
BLAKE256: 16c8bb58dffda8523a97834cfa8fcc2b7cd132edd4bccfde11729802784b7ae9
$ xx hash '10.0'
BLAKE256: 16c8bb58dffda8523a97834cfa8fcc2b7cd132edd4bccfde11729802784b7ae9
# here we play with a set
$ xx hash '{1 2 1 4 3 five}'
BLAKE256: 8b534c0e309768fa57ebcc4028ce3c457836b5221da2ced1907c47baeadbe828
$ xx hash '{five 1 2 3 4}'
BLAKE256: 8b534c0e309768fa57ebcc4028ce3c457836b5221da2ced1907c47baeadbe828
# here we play with subsets of that set
$ xx merge '{1 2 3}' '{4 five}' | xx hash
BLAKE256: 8b534c0e309768fa57ebcc4028ce3c457836b5221da2ced1907c47baeadbe828
$ xx merge '{1 2 3}' '{4}' '{1 2 five}' | xx hash
BLAKE256: 8b534c0e309768fa57ebcc4028ce3c457836b5221da2ced1907c47baeadbe828
The RDX data change model is defined through the merge operation: updates get merged into the preexisting state, producing the resulting state. The difference between “state”, “update”, “patch”, or “operation” is the one of quantity, not quality. Those are chunks of RDX, big and small. To enable deterministic merges, RDX allows for element versioning metadata.
# here we merge versions of a tuple
$ xx merge '(1 2 4)' '(1 2 3@2 5)' '(1 2 4 5@1)'
(1, 2, 3@2, 5@1)
# odd revision numbers are tombstones (deleted elements)
$ xx merge '(1 2 3@2 5)' '(1 2 4 5@1)' '(1 2 4 5)'
(1, 2, 3@2, 5@1)
# here we strip the metadata and the deleted element
$ xx strip '(1 2 3@2 5@1)'
(1, 2, 3)
An RDX documents consists of elements. Five primitive elements types are:
double in most languages,int64_t little-endian, two’s complement,true, null and suchlike.On top of that, there are four collection types which allow for arbitrary nesting:
RDX makes no attempt to innovate on the syntax; everything is JSON like.
For the FIRST elements, the notation is identical to JSON. Similarly,
Linear and Eulerian collections use the same brackets, [] and {}.
JSON has rudimentary tuples, like "key":"value"; RDX generalizes that
to an arbitrary number of elements, e.g.
TEL555230975:"John":"Smith":"johnsmith@gmail.com".
RDX tuples not only turn Eulerian sets into maps; they also express
relational records. Alternatively to the colon notation, RDX tuples
can go in brackets like the other collection types:
(TEL555230975, "John", "Smith", "johnsmith@gmail.com").
Finally, multiplexed collections can store version vectors, distributed
counters and similar types. They use angled brackets <>.
Overall, the syntax is trying to follow the rule of least surprise.
The last, but not the least important aspect of RDX is the invariants.
For all the format varieties, there is a guarantee that any valid document
in one version of the format can be converted into another version and back
with no corruption. For example, the JSON-like text format to the in-memory
representation in whatever language and back. That is the round-trip
guarantee.
With the merge operation, the invariants are pretty standard: idempotence,
commutativity, associativity. Namely, having RDX documents A, B, C and +
being the merge operation, it is guaranteed that:
A+A = AA+B = B+AA + (B + C) = (A + B) + CThese invariants are easy to formalize, analyze, prove, test and fuzz.
RDX provides a unified iterator interface for traversing documents across
all format variants (JDR text, TLV binary, SKIL indexed, etc). The core
abstraction is the rdx struct which serves as both parsed element and
iterator state.
Three operations define the traversal:
rdxNext(rdxp x) — move horizontally to the next sibling elementrdxInto(rdxp child, rdxp parent) — descend into a collection (PLEX)rdxOuto(rdxp child, rdxp parent) — ascend back to parentReturn codes: OK on success, END when exhausted, error codes otherwise.
// Iterate elements in a set
rdx root = {.format = RDX_FMT_JDR};
u8csFork(root.data, input);
call(rdxNext, &root); // position at root element
rdx child = {};
call(rdxInto, &child, &root); // enter collection
scan(rdxNext, &child) { // iterate children
// child.type has RDX_TYPE_*, child.i/f/s/r has value
}
call(rdxOuto, &child, &root); // exit collection
For nested structures, use rdxb — a stack of iterators:
a_pad(rdx, stack, RDX_MAX_NESTING);
rdxbFed1(stack);
(**stack).format = RDX_FMT_JDR;
u8csFork((**stack).data, input);
call(rdxbNext, stack); // uses top of stack
call(rdxbInto, stack); // pushes new iterator
call(rdxbOuto, stack); // pops iterator
The same API works for writing by setting RDX_FMT_WRITE:
rdx writer = {.format = RDX_FMT_TLV | RDX_FMT_WRITE};
u8sFork(writer.into, output_buffer);
// Write a primitive: set type, value, then call Next
writer.type = RDX_TYPE_INT;
writer.i = 42;
call(rdxNext, &writer);
// Write a string
writer.type = RDX_TYPE_STRING;
writer.cformat = RDX_UTF_ENC_UTF8;
u8csMv(writer.s, some_string);
call(rdxNext, &writer);
For collections, write the container header, enter it, write children, exit:
// Write a tuple (key, value)
writer.type = RDX_TYPE_TUPLE;
writer.id = (id128){source, seq}; // optional metadata
call(rdxNext, &writer); // emit container header
rdx child = {};
call(rdxInto, &child, &writer); // enter container
child.type = RDX_TYPE_STRING;
u8csMv(child.s, key);
call(rdxNext, &child); // emit key
child.type = RDX_TYPE_STRING;
u8csMv(child.s, value);
call(rdxNext, &child); // emit value
call(rdxOuto, &child, &writer); // close container
To seek within a collection, set the child’s type and value before calling
rdxInto. For keyed containers (EULER, MULTIX), it positions at the first
element >= the search key. For positional containers (TUPLE, LINEAR), an
INT value seeks to that index position.
rdx child = {};
child.type = RDX_TYPE_STRING;
u8csMv(child.s, search_key);
ok64 o = rdxInto(&child, &parent); // positions at >= search_key
// Verify exact match using the parent's comparator
if (o == NONE || Z(&search, &child)) {
// not found or passed it
}
For write mode, a similar insert convention (optional to implement, depends
on format): set the child’s type and value, call rdxInto. If a child with
that key exists, it positions on it; if not, it creates one and positions.
rdx child = {};
child.type = RDX_TYPE_STRING;
u8csMv(child.s, key);
call(rdxInto, &child, &writer); // find or create
// Now positioned on existing or new child - write into it
Format conversion happens through rdxMerge which reads from input
iterators and writes to an output iterator, normalizing as it goes.