StrictMark: rational Markdown

Markdown is a wonderful lightweight markup: minimal, easy to read and write, and widely supported. It is also precedent-based: it codified diverse existing practices, so it is messy and inconsistent. StrictMark is Markdown refactored: a rational subset that keeps the core features while shrinking the grammar to its shortest formal form, for a uniform, unambiguous syntax. It reuses existing Markdown tooling without the legacy weight, and it is the markup every Beagle wiki page uses, from the Home overview to the Verbs vocabulary.

Markdown critique

Markdown implementations disagree — Vim, VS Code, and the output HTML all differ — and the textbook fix, a formal grammar, is blocked by the syntax itself: every element has its own shape and those shapes interact, so N elements spawn N×N corner cases.

StrictMark principles

StrictMark makes a Markdown subset a proper markup language by formalizing its grammar — a minimal hypertext markup, not an elephantine HTML engine. Feature sprawl is held back by transclusion, not new syntax. This document is itself StrictMark.

Inline markup

Markdown inline markup looks easy but is not: its ambiguity makes it hard to implement. StrictMark treats all inline markup as bracketing: opening and closing brackets are matched by regex, and a pair takes effect only when it satisfies the precedence rules. The cuts are deliberate; inline formatting is secondary.

Links, images, transclusion

StrictMark links come in exactly two cases, both defined out of line: an explicit [text text][l] pairing any display text with a one-symbol label l, and a shortcut [page] that keys on the bracket text, so a page name is its own key.

    see [Replicated Object Notation][1]
    [1]: http://doc.replicated.cc/ron.sm "What is RON"

    a bare [StrictMark] link, defined below
    [StrictMark]: StrictMark.mkd "the markup"

    ![here is the table][T]
    [T]: /table?@tab "this might be any object"
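
The two link cases can be sketched as a small resolver. This is an illustrative sketch only, not the Beagle implementation: the names `DEF_RE`, `REF_RE` and `resolve` are hypothetical, and the regexes assume that a definition occupies a whole line and that bracket text contains no nested brackets.

```python
import re

# Definition line: [label]: target "optional title"   (assumed shape)
DEF_RE = re.compile(r'^\[([^\]]+)\]:\s+(\S+)(?:\s+"([^"]*)")?\s*$')
# Reference: optional "!" for an image, [text], then an optional [label]
REF_RE = re.compile(r'(!?)\[([^\]]+)\](?:\[([^\]]+)\])?')

def resolve(lines):
    """Return (kind, shown text, target, title) for every resolvable reference."""
    defs = {}
    body = []
    # First pass: separate out-of-line definitions from ordinary text.
    for line in lines:
        m = DEF_RE.match(line.strip())
        if m:
            defs[m.group(1)] = (m.group(2), m.group(3))
        else:
            body.append(line)
    # Second pass: look up each reference by its label, or by its own
    # bracket text when it is a shortcut link.
    found = []
    for line in body:
        for m in REF_RE.finditer(line):
            bang, shown, label = m.groups()
            key = label if label is not None else shown
            if key in defs:
                target, title = defs[key]
                found.append(("image" if bang else "link", shown, target, title))
    return found

sample = [
    'see [Replicated Object Notation][1]',
    '[1]: http://doc.replicated.cc/ron.sm "What is RON"',
    'a bare [StrictMark] link, defined below',
    '[StrictMark]: StrictMark.mkd "the markup"',
    '![here is the table][T]',
    '[T]: /table?@tab "this might be any object"',
]
for entry in resolve(sample):
    print(entry)
```

Because both cases reduce to a dictionary lookup, the shortcut form needs no separate code path: the bracket text simply doubles as the key.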

Block markup

Blocks are containers (lists, blockquotes, divs) or leaves (paragraphs, headers, …); containers nest, leaves do not. Markup sits at line start in 4-char groups — the block stack — as (INDENT|QUOTE)* LIST? LEAF?; absent a leaf, a paragraph is implied.
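
A rough sketch of reading the block stack off a single line, in Python. It is not the canonical lexer: `classify` and `block_stack` are hypothetical names, the quote marker is assumed to be `>` followed by three spaces, and the header is the only explicit leaf handled here.

```python
import re

def classify(group):
    """Name a 4-char group, or return None when it is ordinary text."""
    if len(group) != 4:
        return None
    if group == "    ":
        return "INDENT"
    if group == ">   ":
        return "QUOTE"
    if group == "-   " or re.fullmatch(r"\d+\. *", group):
        return "LIST"
    if re.fullmatch(r"#+ *", group):
        return "HEADER"
    return None

def block_stack(line):
    """Split a line into its block stack and the remaining text.

    Follows (INDENT|QUOTE)* LIST? LEAF?; a paragraph is implied
    when no explicit leaf marker is present.
    """
    stack = []
    pos = 0
    stage = 0      # 0: containers allowed, 1: list seen, 2: leaf seen
    gap = False    # True when the last marker did not end in a space
    while stage < 2:
        group = line[pos:pos + 4]
        kind = classify(group)
        if kind in ("INDENT", "QUOTE") and stage == 0:
            pass
        elif kind == "LIST" and stage == 0:
            stage = 1
        elif kind == "HEADER":
            stage = 2
        else:
            break
        gap = not group.endswith(" ")
        stack.append(kind)
        pos += 4
    if stage < 2:
        stack.append("PARAGRAPH")
    text = line[pos:]
    if gap and text.startswith(" "):
        text = text[1:]    # drop the gap space after a full-width marker
    return stack, text

print(block_stack("    1.  nested numbered list"))
print(block_stack(">   #   Quoted header"))
```

Since every marker is exactly four chars wide, the lexer never needs to backtrack: it steps through the line in fixed strides and stops at the first group that is not markup.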

Headers

StrictMark allows four header levels, ATX only, and a header may span lines. Markings appear only at the line start and, like all block markup, are padded to four chars; when the last char is not a space, the text must begin with a gap space.

    #   Top header
    ##  Subheader
    ### Small header
    #### Smallest header
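
The padding rule can be illustrated from the writer's side. The helper below is a hypothetical sketch (the name `header_line` is not from the source); it pads the marker to four chars and adds the gap space only when the marker fills all four.

```python
def header_line(level, text):
    """Format a header line with a marker padded to four chars."""
    if not 1 <= level <= 4:
        raise ValueError("StrictMark allows four header levels")
    marker = ("#" * level).ljust(4)
    # A full-width marker ("####") has no trailing space, so add the gap.
    gap = "" if marker.endswith(" ") else " "
    return marker + gap + text

for level, title in enumerate(
        ["Top header", "Subheader", "Small header", "Smallest header"], start=1):
    print(header_line(level, title))
```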

Lists

The unordered marker is the dash -; * is dropped as ambiguous and + as unpopular, since there must be only one way. Ordered list markers are digits followed by a dot (for example 12.), again padded to four chars per level. The GitHub TODO syntax [ ] is supported as another leaf block.

    -   bulleted item
    -   still bulleted
        1.  nested numbered list
        2.  more numbered
        plain paragraph, nested, indented 4 chars
    -   resume the bulleted list

Blockquotes

Blockquote markup is one > and three spaces, in any order. The blockquote is the only block whose continuation line may carry the quotation marker instead of a bare indent — an exception kept for historical reasons; using indents is still highly advised.

    >   #   Quoted header
    >   Quoted paragraph text.

Code blocks

Code blocks use the fenced syntax with three or four backticks; the opening fence may name the language. The code is indented four chars and follows the same continuation rules as any container, so the closing fence is optional — a drop in indent ends it.

    console.log("JavaScript is the best worst lang ever");
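
How an optional closing fence might be handled can be sketched as follows. The details are assumptions rather than the Beagle implementation: `read_code_block` is a hypothetical helper, the fence is taken to sit at the start of the line with the code four chars deeper, and a blank or unindented line counts as a drop in indent.

```python
FENCE = "`" * 3  # three backticks; a four-backtick fence starts the same way

def read_code_block(lines, start):
    """Read a code block whose opening fence is lines[start].

    Returns (language, code lines, index of the first line after the block).
    """
    opening = lines[start]
    if not opening.startswith(FENCE):
        raise ValueError("not an opening fence")
    language = opening.lstrip("`").strip()
    code = []
    i = start + 1
    while i < len(lines):
        line = lines[i]
        if line.startswith("    "):
            code.append(line[4:])   # strip the four-char indent
            i += 1
        elif line.startswith(FENCE):
            i += 1                  # explicit closing fence: consume it
            break
        else:
            break                   # indent dropped: the block ends implicitly
    return language, code, i

doc = [FENCE + "js", '    console.log("hi");', "next paragraph"]
print(read_code_block(doc, 0))
```

Both endings go through the same loop: an explicit fence is consumed, while a line that merely drops the indent is left for the next block to read.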

Grammar

The structural layer is a regular language and the inline layer is regex-matched brackets, so the whole parser stays compact. The canonical grammar lives in the Beagle source, not in prose — the block lexer and the inline Ragel machine below.