librdx

Part I. SCM as a database for the code

Software development is changing rapidly, and the tool stack has yet to catch up. The value of IDEs is diminishing as developers edit code by hand less and less. More and more of the work is browsing and talking to LLMs; less and less is coding and debugging. About 8 years ago I gave a talk at the internal JetBrains conference, “Code is hypertext, IDE is a browser”. Those points look even more relevant now: effective browsing of code and history is a prerequisite to effective understanding, and understanding underlies everything now. No understanding means no control, and then a developer is like a rider who fell off an LLM horse with a foot caught in the stirrup (you may search YouTube to see what I mean).

git is increasingly becoming a point of friction. LLMs produce code edits at a high throughput. Sorting out the changes afterwards takes disproportionate time and often repeats work you have already done, if you actually read the diffs during the session (which I highly recommend). Even single-person development is now collaborative: at the very least, your collaborator is an LLM. In calm waters, running several agents is nothing special. Then I have an entire team, with all the merges and rebases (which we like to do beyond any measure).

That is why I think it is the right time to look for git replacements, and that is why I am working on one. I firmly reject the “git compatible” approach, despite the immense gravity of the existing mass of git repos. jj is to git what Subversion was to CVS. What we need is what git was to CVS: a level-up. The long-standing issues and the new ones are all rooted in the core architecture of git. Otherwise, they would have been fixed by now through gradual, incremental improvement.

The issues are:

Overall, we need a database for the code!

Again, I have made these points at various conferences over the past 10 years, and many other people in the CRDT community have talked about “overlay branches” and “CRDT revision control” for 10-15 years. In essence, it all boils down to two things:

  1. versioning data structures, not blobs, and
  2. having formal deterministic merge algorithms (associative, commutative, idempotent).
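
To make the three properties in point 2 concrete, here is a minimal sketch of a merge function that has them. This is not how librdx or RDX merges data; it is a plain key-wise last-writer-wins map, and the names (`Entry`, `State`, `merge`) and the sample keys are made up for illustration.

```python
from typing import Dict, Tuple

# Hypothetical versioned entry: (logical_time, replica_id, value).
# Tuples compare lexicographically, so the later write wins and
# ties are broken by replica id, then by value.
Entry = Tuple[int, str, str]
State = Dict[str, Entry]


def merge(a: State, b: State) -> State:
    """Key-wise last-writer-wins merge of two replica states."""
    out = dict(a)
    for key, entry in b.items():
        if key not in out or entry > out[key]:
            out[key] = entry
    return out


# Three replicas that edited overlapping parts of the same "file".
a = {"main.c/foo": (1, "alice", "int foo()")}
b = {"main.c/foo": (2, "bob", "long foo()"),
     "main.c/bar": (1, "bob", "void bar()")}
c = {"main.c/bar": (3, "carol", "void bar(int)")}

assert merge(a, b) == merge(b, a)                      # commutative
assert merge(merge(a, b), c) == merge(a, merge(b, c))  # associative
assert merge(a, a) == a                                # idempotent
```

Because the merge is just a per-key maximum over a total order, replicas can exchange states in any order, any number of times, and still converge to the same result. That is the property a merge or rebase in such a system has to keep.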

One approach was to represent text as a CRDT vector of letters, and it has been quite popular in the field. Zed’s DeltaDB aligns with that approach. I also built such systems in the past. It is safe to consider it the default. On the other hand, if we look into the internals of any JetBrains IDE or into LLVM internals, we will see ASTs, because code has structure. If you want to treat all source code the same, you use line-based text (like all UNIX tools do). If you want to do fancy stuff, you parse the source and work with ASTs. Git is a filesystem, so it treats everything as a blob (git diff receives input blobs and reconstructs the most plausible edits algorithmically).
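
The difference between the blob view and the tree view is easy to see on a toy example. The sketch below uses only Python's standard `difflib` and `ast` modules as stand-ins for a line-based diff and a parser; it has nothing to do with git's own diff implementation or with librdx, and the sample function is invented.

```python
import ast
import difflib

old = "def area(w, h):\n    return w * h\n"
# Same code, reformatted: arguments wrapped, redundant parentheses added.
reformatted = "def area(w,\n         h):\n    return (w * h)\n"
# A real change: the operator is different.
changed = "def area(w, h):\n    return w + h\n"


def line_diff(a: str, b: str) -> list:
    """Line-based view: compare the two texts as sequences of lines."""
    return list(difflib.unified_diff(a.splitlines(), b.splitlines(), lineterm=""))


def same_tree(a: str, b: str) -> bool:
    """Structure-aware view: compare parsed trees, ignoring layout."""
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


# The line-based diff reports edits for a pure reformatting...
assert line_diff(old, reformatted)
# ...while the trees are identical.
assert same_tree(old, reformatted)
# A semantic change is visible in both views.
assert line_diff(old, changed)
assert not same_tree(old, changed)
```

A line-based tool has to guess the edit from two blobs, so formatting noise and real changes look alike. A tool that works with trees can tell them apart.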

Here I see the opportunity: a revision control system working with AST-like trees, with formal, deterministic and reversible split/join/fork/merge semantics and a structure-aware query language. As a substrate, I use the Replicated Data eXchange format (RDX), a JSON superset with very nice CRDT merge semantics.

Part II. CLI and REST interfaces

Part III. Inner workings of CRDT revision control.

Part IV. Experiments.

Part V. The Vision.