Human authored
Git's basic model is a wonderfully simple system of blob trees and commit chains that one can explain to anyone in 5 minutes. Further up the stack, that wonderful simplicity devolves into a mess of commands and flags that even developers with 20 years of git experience struggle to remember.
That is doubly so when multi-tasking with LLMs. "I believe we implemented it on Tuesday, but it is not here. Where is it?" "Which branch corresponds to that remote?" And so on.
If only we had some universal language to address and access local and remote resources, files, and locations in files! Oh wait, we have HTTP and URIs, which are as standard as it gets. They were designed specifically for this task and are supported by countless apps and libraries. Can we apply them to git?
The URI layout we all remember by heart:
- scheme: -- the access protocol / addressing scheme,
- //authority -- most often the network host,
- /path -- the path in the remote filesystem,
- ?query -- other stuff (like arguments),
- #fragment -- the location within the document.
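The five parts can be pulled apart mechanically. Below is a minimal sketch in bash that uses the regular expression from RFC 3986, Appendix B; the function name `split_uri` is made up for this illustration.

```bash
#!/usr/bin/env bash
# Split a URI into its five RFC 3986 components with the regular
# expression from RFC 3986, Appendix B. Pure bash, no external tools.
# split_uri is a hypothetical helper name used only for illustration.
split_uri() {
  local re='^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$'
  if [[ $1 =~ $re ]]; then
    printf 'scheme=%s\n'    "${BASH_REMATCH[2]}"
    printf 'authority=%s\n' "${BASH_REMATCH[4]}"
    printf 'path=%s\n'      "${BASH_REMATCH[5]}"
    printf 'query=%s\n'     "${BASH_REMATCH[7]}"
    printf 'fragment=%s\n'  "${BASH_REMATCH[9]}"
  fi
}

split_uri 'http://somehost/dir/file?branch#L101'
```

Every group in the expression is optional, so any string matches; a missing part simply comes out empty.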
Can we retrofit that to a versioned store? Well, if all the versioning info goes into the query, the rest is obvious. For example, http://somehost/dir/file?branch#L101 points to line 101 of /dir/file on branch.
In fact, Beagle is a git-compatible SCM doing exactly that.
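As a rough illustration, such a URI can be mapped onto plain git. The sketch below assumes the query is a branch name and that a fragment of the form `L<n>` means line n. `uri_to_git` is hypothetical: it only prints the equivalent git command and says nothing about how Beagle itself resolves URIs.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Beagle or git): print the plain-git
# command that would fetch what a versioned URI addresses, ASSUMING the
# query is a branch name and a fragment L<n> means line n.
uri_to_git() {
  local re='^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?$'
  local line_re='^L([0-9]+)$'
  [[ $1 =~ $re ]] || return 1
  local path=${BASH_REMATCH[5]#/}     # git paths are relative to the repo root
  local rev=${BASH_REMATCH[7]:-HEAD}  # no query: fall back to HEAD
  local frag=${BASH_REMATCH[9]}
  local cmd="git show $rev:$path"
  if [[ $frag =~ $line_re ]]; then
    cmd+=" | sed -n ${BASH_REMATCH[1]}p"  # keep only the addressed line
  fi
  printf '%s\n' "$cmd"
}

uri_to_git 'http://somehost/dir/file?branch#L101'
# prints: git show branch:dir/file | sed -n 101p
```

The point of the exercise: once versioning lives in the query, the path and fragment keep their usual meaning, and the lookup decomposes into "pick a revision, read a file, seek to a location".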
The case of HTTP is more interesting. HTTP has a vocabulary of verbs: HEAD, GET, PUT, POST, PATCH, DELETE. Nowadays, though, people mostly use only GET and POST. But there was some reason for the other verbs to exist, right?
While the vocabulary is a bit vague, fundamentally it grows out of the need to access a remote filesystem. That naturally fits the git model, which is, as described, a content-addressed filesystem. For that reason, Beagle uses the HTTP verbs exclusively.
Wait, but it only has patch? What about merge vs rebase?
There is always plenty of confusion around merge, rebase, squash, cherry-pick and all the related git techniques for handling a twisted history of edits. Each command does several, often unrelated, things, and each thing can be done by several commands in subtly different ways.
Beagle decomposes those practices into a set of orthogonal operations, building on that wonderfully simple underlying model of git:
As you can see, there is no way to substitute one operation with another: they are strictly orthogonal. Let's see how that applies to the pandemonium of merge/rebase/squash/cherry-pick.
Let's see what all git merge variants do:
Consequently, we have 8 options from three binary choices: commit/branch, reuse/retitle, and refer/forget. Only some of these 8 have names in git. For example, to squash, we apply a diverging branch in its entirety, add a new commit message, and do not refer to the original branch. To rebase, we apply separate commits, reuse their messages, and do not refer back. To merge, we apply all of a branch, add a new message, and refer back (via the parent header).
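The three choices can be spelled out as a table. A small bash sketch that enumerates all 8 combinations and labels the three that the text gives git names for; the value spellings and the helpers `git_name` and `print_options` are made up for this illustration.

```bash
#!/usr/bin/env bash
# The three independent choices from the text, as a 2 x 2 x 2 table:
#   what to apply:    single commits or a whole branch
#   commit message:   reuse the original or retitle
#   original history: refer back (parent header) or forget it
# Only the combinations the text names get a git term; the rest print "-".
git_name() {
  case "$1 $2 $3" in
    "commit reuse forget")   echo rebase ;;
    "branch retitle forget") echo squash ;;
    "branch retitle refer")  echo merge ;;
    *)                       echo "-" ;;
  esac
}

print_options() {
  local unit msg ref
  for unit in commit branch; do
    for msg in reuse retitle; do
      for ref in refer forget; do
        printf '%-8s %-8s %-7s %s\n' "$unit" "$msg" "$ref" "$(git_name "$unit" "$msg" "$ref")"
      done
    done
  done
}

print_options
```

Five of the eight rows come out unnamed, which is the point: git's vocabulary covers the space unevenly.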
The way to express it in Beagle CLI:
```sh
# quote URI arguments: an unquoted # starts a shell comment, ? is a glob
# rebase one commit: apply, post
be patch '?feature'
be post '#!'
# merge a branch: apply all, post with a new message
be patch '?feature!'
be post '#merge the feature'
# squash a branch
be patch '?feature!'
be post '#add a new feature!'
# rebase the entire branch
while be patch '?feature'; do
    make && make test && be post '#!'
done
# cherry pick one commit
be patch '#391a0d33'
be post '#!'
```
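Here we use the bang modifier to apply a whole branch (?feature!) instead of a single commit (?feature), and to drop the reference back to the original commit or branch (#! or #add a new feature!).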
Note: when we supply no message, the original one gets reused.
We may keep message/author but drop the original commit: #!.
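To make "refer back" concrete, here is a plain-git counterpart of the merge and squash recipes, run in a throwaway repository. This is a comparison sketch, not Beagle; it assumes git is installed, and the repository, branch names and identity are invented. Both results contain the same files; what differs is the number of parents of the new commit.

```bash
#!/usr/bin/env bash
# Comparison sketch in plain git (not Beagle). ASSUMES git is installed;
# the repo, branch names and identity below are made up.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main
git config user.name demo
git config user.email demo@example.com

echo base > base.txt && git add base.txt && git commit -qm base
git checkout -q -b feature
echo one > one.txt && git add one.txt && git commit -qm "add one"
echo two > two.txt && git add two.txt && git commit -qm "add two"

# rev-list --parents prints the commit followed by its parents
parents() { git rev-list --parents -n 1 "$1" | awk '{ print NF - 1 }'; }

# squash: apply the whole branch, new message, no reference back
git checkout -q -b squashed main
git merge -q --squash feature
git commit -qm "add a new feature"
echo "squash parents: $(parents squashed)"   # prints: squash parents: 1

# merge: apply the whole branch, new message, refer back via a second parent
git checkout -q -b merged main
git merge -q --no-ff --no-edit -m "merge the feature" feature
echo "merge parents: $(parents merged)"      # prints: merge parents: 2
```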
Here a branch rebase can only happen as a loop, because we make as many posts as there are commits. This also ensures that every committed revision builds and passes the tests.
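For comparison, a plain-git analogue of the branch rebase loop: replay the branch one commit at a time and commit each step only if the build and tests pass, reusing the original message and author via `git commit -C`. `replay_tested` and its arguments are hypothetical, not a git or Beagle API, and this sketch stops at the first failing commit and discards it.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not git or Beagle): replay <branch> commits that are
# not in <upstream> onto the current branch, one at a time, committing each
# only if <check> (a shell command, e.g. "make && make test") succeeds.
# ASSUMES git is installed and the working tree is clean.
replay_tested() {
  local upstream=$1 branch=$2 check=$3 c
  for c in $(git rev-list --reverse "$upstream..$branch"); do
    git cherry-pick --no-commit "$c" || return 1  # apply the change only
    if ! sh -c "$check"; then                      # build and test this step
      git reset -q --hard HEAD                     # drop the failing change
      echo "stopped at $c" >&2
      return 1
    fi
    git commit -q -C "$c"                          # reuse message and author
  done
}
```

For example, on a branch created from `main`, `replay_tested main feature 'make && make test'` plays the role of the `while be patch` loop. Stopping at the first failure is a design choice of this sketch; it keeps the replayed history a prefix of the original.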
POST does commit and/or fast-forward. PUT resets a branch or marks a file for commit/removal (reflog-only operations).
git only uses URIs to access repos, e.g.
git://github.com/gritzko/beagle.git
That is very limiting, so we want to extend that addressing
scheme to access files, revisions, locations in files.
GitHub URIs have a typical web-app structure, which makes them inconvenient for our case.
https://github.com/gritzko/beagle/blob/main/keeper/README.md
In particular, Beagle URIs orthogonalize all the versioning information into the query part to avoid overloading the path with everything (project, user, branch, path). Beagle branches form a filesystem-like tree whose top-level entries are project trunks, so the GitHub URI above becomes
be://replicated.live/keeper/README.md?/beagle
Note that a Beagle repo may host any number of projects, and the default way to name the project is in the query. To peek into a branch, the URI becomes
be://replicated.live/keeper/README.md?/beagle/MEM-issues
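The rewrite from a GitHub URL to this layout is mechanical enough to sketch. `github_to_be` below is a hypothetical helper, not part of Beagle. It assumes the be host is supplied by the caller, the GitHub owner is dropped, `main` is the project trunk, and branch names contain no slashes (GitHub allows them, which would make this regex ambiguous).

```bash
#!/usr/bin/env bash
# Hypothetical converter (not part of Beagle): rewrite a GitHub "blob" URL
# into the be:// layout. ASSUMES the be host is given by the caller, the
# GitHub owner is dropped, "main" is the trunk, and branches have no "/".
github_to_be() {
  local url=$1 host=$2
  local re='^https://github\.com/([^/]+)/([^/]+)/blob/([^/]+)/(.*)$'
  [[ $url =~ $re ]] || return 1
  local project=${BASH_REMATCH[2]} branch=${BASH_REMATCH[3]} path=${BASH_REMATCH[4]}
  local query="/$project"                     # the project trunk
  [[ $branch == main ]] || query+="/$branch"  # other branches nest under it
  printf 'be://%s/%s?%s\n' "$host" "$path" "$query"
}

github_to_be https://github.com/gritzko/beagle/blob/main/keeper/README.md replicated.live
# prints: be://replicated.live/keeper/README.md?/beagle
```

Note how the path survives untouched while the owner, project and branch collapse into the query.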