Beagle: git, URIs and all the dirty words

Human authored

Git's basic model is a wonderfully simple system of blob trees and commit chains that one can explain in 5 minutes to anyone. Further up the stack, that wonderful simplicity devolves into a mess of commands and flags that even developers with 20 years of git experience have difficulty remembering.

That is doubly so when multi-tasking with LLMs. "I believe we implemented it on Tuesday, but it is not here. Where is it?" "Which branch corresponds to that remote?" And so on.

If only we had some universal language to address and access local and remote resources, files and locations in files! Oh wait, we have HTTP and URI, which are as standard as it gets. Those were specifically designed for this task. Supported in so many apps and libs. Can we apply that to git?

URIs

The URI layout we all remember by heart:

  1. scheme: -- the access protocol / addressing scheme,
  2. //authority -- most often the network host,
  3. /path -- path in the remote filesystem,
  4. ?query -- other stuff (like arguments),
  5. #fragment -- location within the document.

Can we retrofit that to a versioned store? Well, if all the versioning info goes into the query, the rest is obvious. http://somehost/dir/file?branch#L101 for example. In fact, Beagle is a git-compatible SCM doing exactly that.
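To make the layout concrete, here is a minimal sketch that splits the example URI above into its five parts. It uses Python's standard urllib.parse purely for illustration; this is not Beagle's own parser, which may treat edge cases differently.

```python
from urllib.parse import urlsplit

# The example URI from the text: the versioning info (a branch) sits in the query.
uri = "http://somehost/dir/file?branch#L101"
parts = urlsplit(uri)

print(parts.scheme)    # http
print(parts.netloc)    # somehost  (the authority)
print(parts.path)      # /dir/file
print(parts.query)     # branch
print(parts.fragment)  # L101
```

The path stays a plain file path, and everything about versions is confined to the query.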

HTTP verbs

The case of HTTP is more interesting. HTTP has a vocabulary of verbs: HEAD, GET, PUT, POST, PATCH, DELETE. Nowadays, people mostly use GET and POST. But there was some reason for the other verbs to exist, right?

While the vocabulary is a bit vague, fundamentally it grows out of the need to access a remote filesystem. That naturally fits the git model, which is, as described, a content-addressed filesystem. For that reason, Beagle uses the HTTP verbs exclusively.
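For readers who have not seen what "content-addressed" means in practice, here is a small sketch of how git names a blob: the id is the SHA-1 of a short header plus the file content (this is the classic SHA-1 object format; the function name git_blob_id is ours).

```python
import hashlib

def git_blob_id(data: bytes) -> str:
    """Return the id git gives a blob with this content (SHA-1 object format)."""
    # git hashes "blob <size>\0" followed by the raw bytes.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

# The empty blob has a well-known id.
print(git_blob_id(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The same content always gets the same name, which is what lets a path plus a revision identify exactly one object.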

Wait, but it only has patch? What about merge vs rebase?

Git's dirty words

There is always plenty of confusion around merge, rebase, squash, cherry-pick and all the related techniques of git-handling the twisted history of edits. Each command does several often unrelated things and each thing can be done by several commands, subtly differently.

Beagle decomposes those practices into a set of orthogonal operations, building on that wonderfully simple underlying model of git:

As you might see, there is no way to substitute one operation for another: they are strictly orthogonal. Let's see how that applies to the pandemonium of merge/rebase/squash/cherry-pick.

All the git merge variants do three things:

  1. they apply changes from a diverging commit or branch,
  2. they reuse (rebase) or add a new (merge, squash) message,
  3. they refer to the original (merge) or not (rebase, squash).

Consequently, we have 8 options, one for each combination of commit/branch, reuse/retitle, and refer/forget. In fact, only some of these 8 have git terms defined. For example, to squash we apply a diverging branch in its entirety, add a new commit message, and do not refer to the original branch. To rebase, we apply separate commits, reuse the messages, and do not refer back. To merge, we apply all of a branch, add a new message, and refer back (the parent header).
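The counting above can be checked with a few lines of Python. This is only a sketch: the labels are the words used in the text, and the table of names lists just the three combinations the text spells out.

```python
from itertools import product

# The three independent choices named in the text.
SCOPE = ("commit", "branch")
MESSAGE = ("reuse", "retitle")
ORIGIN = ("refer", "forget")

# git terms exactly as the text assigns them; the rest are left unnamed here.
GIT_NAMES = {
    ("branch", "retitle", "forget"): "squash",
    ("commit", "reuse", "forget"): "rebase",
    ("branch", "retitle", "refer"): "merge",
}

combos = list(product(SCOPE, MESSAGE, ORIGIN))
for combo in combos:
    print("/".join(combo), "->", GIT_NAMES.get(combo, "(no git term given)"))
```

Two choices on each of three axes give 2 x 2 x 2 = 8 rows, of which only three carry a name from the paragraph above.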

The way to express it in Beagle CLI:

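    # rebase one commit: apply, post
    # (arguments are quoted so the shell does not read '#' as a comment or '?' as a glob)
    be patch '?feature'
    be post '#!'

    # merge a branch: apply all, post with a new message
    be patch '?feature!'
    be post '#merge the feature'

    # squash a branch: apply all, post with a new message, no link to the original
    be patch '?feature!'
    be post '#add a new feature!'

    # rebase the entire branch, one commit at a time;
    # stop at the first commit that fails to build or test
    while be patch '?feature'; do
        make && make test && be post '#!' || break
    done

    # cherry pick one commit
    be patch '#391a0d33'
    be post '#!'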

Here we use the bang modifier to:

  1. '?branch!' apply the entire branch (default: one commit),
  2. '#message!' don't link the original commit (the parent ref).

Note: when we supply no message, the original one gets reused. We may keep message/author but drop the original commit: #!.

Branch rebase here may only happen as a loop, because we make as many posts as there are commits. This also ensures that all the committed revisions build and pass the tests.
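The two bang rules and the note about omitted messages can be read as a small decoding table. The sketch below is our reading of those rules, not Beagle's actual argument parser; the function name decode and the returned labels (the commit/branch, reuse/retitle, refer/forget words from earlier) are chosen for illustration.

```python
def decode(patch_arg: str, post_arg: str) -> tuple:
    """Map a (be patch, be post) argument pair onto the three choices."""
    # '?branch!' applies the entire branch; the default is one commit.
    scope = "branch" if patch_arg.endswith("!") else "commit"

    body = post_arg[1:] if post_arg.startswith("#") else post_arg
    # A trailing bang on the message drops the link to the original commit.
    if body.endswith("!"):
        origin, body = "forget", body[:-1]
    else:
        origin = "refer"
    # With no message text, the original message gets reused.
    message = "retitle" if body else "reuse"
    return scope, message, origin

print(decode("?feature", "#!"))                    # ('commit', 'reuse', 'forget')
print(decode("?feature!", "#merge the feature"))   # ('branch', 'retitle', 'refer')
print(decode("?feature!", "#add a new feature!"))  # ('branch', 'retitle', 'forget')
```

The three calls correspond to the rebase, merge and squash recipes from the CLI listing.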

FAQ

So, how is PUT different from POST?

POST does commit and/or fast-forward. PUT resets a branch or marks a file for commit/removal (reflog-only operations).

How does that compare to the URIs git uses?

git only uses URIs to access repos, e.g. git://github.com/gritzko/beagle.git. That is very limiting, so we want to extend that addressing scheme to access files, revisions, and locations in files.

How does that compare to GitHub URIs?

GitHub URIs have a typical web-app structure, which makes them inconvenient for our case.

https://github.com/gritzko/beagle/blob/main/keeper/README.md

In particular, Beagle URIs orthogonalize all the versioning information into the query part to avoid overusing the path for everything (project, user, branch, path). Beagle branches are tree-ordered, filesystem-like, and the top-level entries are project trunks, so the GitHub URI above becomes

be://replicated.live/keeper/README.md?/beagle

Note that a Beagle repo may host any number of projects, and the default way to convey a project is the query. If we want to peek into a branch, the URI becomes

be://replicated.live/keeper/README.md?/beagle/MEM-issues
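As a sketch of how such URIs could be taken apart, the snippet below splits the two examples with Python's standard urllib.parse and reads the query as a branch path whose first entry is the project trunk. The helper name locate and the returned field names are ours, and the handling of deeper branch nesting is an assumption, not documented Beagle behavior.

```python
from urllib.parse import urlsplit

def locate(uri: str) -> dict:
    """Split a be:// URI into host, file path, project and branch."""
    parts = urlsplit(uri)
    # The query is a filesystem-like branch path: /project[/branch...]
    segments = [s for s in parts.query.split("/") if s]
    return {
        "host": parts.netloc,
        "file": parts.path,
        "project": segments[0] if segments else None,
        "branch": "/".join(segments[1:]) or None,  # None means the trunk
    }

trunk = locate("be://replicated.live/keeper/README.md?/beagle")
branch = locate("be://replicated.live/keeper/README.md?/beagle/MEM-issues")
print(trunk)
print(branch)
```

Both URIs share the same host and file path; only the query changes when we move from the trunk to a branch.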