BLAME-003: parallelize blame's inflate-bound fetch across cores

blame's fetch/descent is inflate-bound and embarrassingly parallel (each commit's work is independent), while the weave fold is small and sequential. Once the per-commit work is minimized (BLAME-001/BLAME-002), the residual descent and inflate can be sharded across the 16 cores in three passes: a parallel pass resolves each commit's leaf blob sha, a parallel pass inflates the distinct changed blobs, and a serial pass folds them in topo order. The blocker is that the keeper/graf read path uses shared singleton scratch buffers; those must first become per-thread (or caller-owned). This work layers on top of, and is gated by, the single-thread wins. See Planned.
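A minimal sketch of the three-pass shape, assuming POSIX threads and static range sharding. Every name here (`resolve_leaf_blob`, `inflate_blob`, `parallel_for`, `blame_sketch`) is hypothetical and not the graf API; the "inflate" is a stand-in that returns a fake line count, and the fold is a plain sum where the real weave fold is order-dependent. What it is meant to show is the data flow: passes 1 and 2 write only disjoint slots, so they need no locks, and the serial fold reads those slots only after the joins.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* HYPOTHETICAL stand-ins for the keeper/graf read path; not the real API. */
typedef unsigned long blob_id;

/* Pass-1 work: descend to the file's leaf blob (fake: 3 commits share a blob). */
static blob_id resolve_leaf_blob(size_t commit) { return (blob_id)(commit / 3 + 1); }

/* Pass-2 work: the expensive inflate (fake: returns a "line count"). */
static long inflate_blob(blob_id id) { return (long)(id * 10); }

typedef void (*range_fn)(size_t lo, size_t hi, void *ctx);
struct shard { size_t lo, hi; range_fn fn; void *ctx; };

static void *run_shard(void *arg) {
    struct shard *s = arg;
    s->fn(s->lo, s->hi, s->ctx);
    return NULL;
}

/* Statically split [0, n) into contiguous chunks, one thread per chunk. */
static void parallel_for(size_t n, size_t nthreads, range_fn fn, void *ctx) {
    enum { MAX_THREADS = 64 };
    pthread_t tids[MAX_THREADS];
    struct shard shards[MAX_THREADS];
    int started[MAX_THREADS] = {0};
    if (nthreads < 1) nthreads = 1;
    if (nthreads > MAX_THREADS) nthreads = MAX_THREADS;
    size_t chunk = (n + nthreads - 1) / nthreads;
    for (size_t t = 0; t < nthreads && t * chunk < n; t++) {
        size_t lo = t * chunk, hi = lo + chunk < n ? lo + chunk : n;
        shards[t] = (struct shard){lo, hi, fn, ctx};
        if (pthread_create(&tids[t], NULL, run_shard, &shards[t]) == 0)
            started[t] = 1;
        else
            run_shard(&shards[t]); /* fall back to running inline */
    }
    for (size_t t = 0; t < nthreads; t++)
        if (started[t]) pthread_join(tids[t], NULL);
}

struct resolve_ctx { blob_id *blobs; };
static void resolve_range(size_t lo, size_t hi, void *p) {
    struct resolve_ctx *c = p;
    for (size_t i = lo; i < hi; i++) c->blobs[i] = resolve_leaf_blob(i); /* disjoint slots */
}

struct inflate_ctx { const blob_id *distinct; long *out; };
static void inflate_range(size_t lo, size_t hi, void *p) {
    struct inflate_ctx *c = p;
    for (size_t i = lo; i < hi; i++) c->out[i] = inflate_blob(c->distinct[i]);
}

static int cmp_blob(const void *a, const void *b) {
    blob_id x = *(const blob_id *)a, y = *(const blob_id *)b;
    return (x > y) - (x < y);
}

/* Returns the folded result (-1 on OOM); *ndistinct (if non-NULL) gets the inflate count. */
static long blame_sketch(size_t ncommits, size_t nthreads, size_t *ndistinct) {
    size_t cap = ncommits ? ncommits : 1, nd = 0;
    blob_id *blobs = malloc(cap * sizeof *blobs);
    blob_id *distinct = malloc(cap * sizeof *distinct);
    long *inflated = malloc(cap * sizeof *inflated);
    long acc = -1;
    if (blobs && distinct && inflated) {
        /* Pass 1 (parallel): resolve each commit's leaf blob id. */
        struct resolve_ctx rc = {blobs};
        parallel_for(ncommits, nthreads, resolve_range, &rc);
        /* Dedup (serial, cheap): sort a copy, keep unique ids. */
        for (size_t i = 0; i < ncommits; i++) distinct[i] = blobs[i];
        qsort(distinct, ncommits, sizeof *distinct, cmp_blob);
        for (size_t i = 0; i < ncommits; i++)
            if (nd == 0 || distinct[nd - 1] != distinct[i]) distinct[nd++] = distinct[i];
        /* Pass 2 (parallel): inflate each distinct blob exactly once. */
        struct inflate_ctx ic = {distinct, inflated};
        parallel_for(nd, nthreads, inflate_range, &ic);
        /* Pass 3 (serial): fold in commit (topo) order; a sum stands in for the weave. */
        acc = 0;
        for (size_t i = 0; i < ncommits; i++) {
            const blob_id *hit = bsearch(&blobs[i], distinct, nd, sizeof *distinct, cmp_blob);
            acc += inflated[hit - distinct];
        }
    }
    free(blobs);
    free(distinct);
    free(inflated);
    if (ndistinct) *ndistinct = nd;
    return acc;
}
```

Dedup sits between passes 1 and 2, so each distinct blob is inflated once however many commits share it: in the sketch, 9 commits touching 3 distinct blobs cost 3 inflates, and the result is the same for any thread count.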

Issues

The per-commit loop (graf/BLAME.c:344) is serial even though its dominant cost, inflate, is independent across commits.

Blockers

Ordering dependency: land BLAME-001 first and re-measure. Sha-dedup alone may eliminate enough of the inflate cost that parallelism's ROI drops. Threads are also new to the codebase, so this needs threaded coverage in the build/ASAN/fuzz matrix plus a new TSan build.

Planned

Phase the work: first make the read path reentrant by passing scratch buffers as parameters (CLAUDE.md §5), then shard.
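A sketch of the reentrancy step, assuming a growable scratch struct owned by the caller. `struct scratch`, `read_object` and `read_all` are hypothetical and not the keeper/graf API, and `read_object` fakes an inflate with `memset`. Instead of a static buffer that every call overwrites, each call receives the scratch it may use, and each worker thread owns one scratch for its whole shard, so no locks are needed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL caller-owned scratch, replacing a shared singleton buffer. */
struct scratch { char *buf; size_t cap; };

/* Grow s->buf to at least n bytes; 0 on success, -1 on OOM. */
static int scratch_reserve(struct scratch *s, size_t n) {
    if (n <= s->cap) return 0;
    char *p = realloc(s->buf, n);
    if (!p) return -1;
    s->buf = p;
    s->cap = n;
    return 0;
}

static void scratch_release(struct scratch *s) {
    free(s->buf);
    s->buf = NULL;
    s->cap = 0;
}

/* Before: `static char g_scratch[N];` written by every call (not reentrant).
 * After: the caller supplies the scratch, so concurrent callers never share it.
 * Stand-in "inflate": len copies of a letter derived from id, NUL-terminated.
 * The result stays valid until the next call that uses the same scratch. */
static const char *read_object(unsigned id, size_t len, struct scratch *s) {
    if (scratch_reserve(s, len + 1) != 0) return NULL;
    memset(s->buf, 'a' + (int)(id % 26), len);
    s->buf[len] = '\0';
    return s->buf;
}

struct worker { unsigned first, count; size_t bytes; };

/* Each worker owns one scratch for its whole shard: no locks, no sharing. */
static void *worker_main(void *arg) {
    struct worker *w = arg;
    struct scratch s = {NULL, 0};
    w->bytes = 0;
    for (unsigned id = w->first; id < w->first + w->count; id++) {
        const char *obj = read_object(id, 100 + (size_t)id, &s);
        if (obj) w->bytes += strlen(obj);
    }
    scratch_release(&s);
    return NULL;
}

/* Shard ids [0, nworkers * per_worker) across threads; return total bytes read. */
static size_t read_all(unsigned nworkers, unsigned per_worker) {
    enum { MAX_WORKERS = 16 };
    pthread_t tids[MAX_WORKERS];
    struct worker ws[MAX_WORKERS];
    int started[MAX_WORKERS] = {0};
    size_t total = 0;
    if (nworkers > MAX_WORKERS) nworkers = MAX_WORKERS;
    for (unsigned t = 0; t < nworkers; t++) {
        ws[t] = (struct worker){t * per_worker, per_worker, 0};
        if (pthread_create(&tids[t], NULL, worker_main, &ws[t]) == 0)
            started[t] = 1;
        else
            worker_main(&ws[t]); /* fall back to running inline */
    }
    for (unsigned t = 0; t < nworkers; t++) {
        if (started[t]) pthread_join(tids[t], NULL);
        total += ws[t].bytes;
    }
    return total;
}
```

Reusing one scratch per thread keeps the allocation amortized the way a singleton buffer does, while making the ownership explicit at every call site, which is also what a TSan build needs to see to stay quiet.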

Landed

None yet.