blame's fetch/descent is inflate-bound and embarrassingly parallel (independent per commit), while the weave fold is small and sequential. After the per-commit work is minimized (BLAME-001/BLAME-002), the residual descent+inflate can be sharded across the 16 cores: a parallel pass resolves each commit's leaf blob sha, a parallel pass inflates the distinct changed blobs, then a serial pass folds them in topo order. The blocker is that the keeper/graf read path uses shared singleton scratch buffers; those must become per-thread (or caller-owned) first. This layers on top of — and is gated by — the single-thread wins. See Plan.
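The three passes described above can be sketched roughly as follows. This is a minimal illustration, not the graf implementation: `parallel_for`, `blame_sketch`, `resolve_one`, `inflate_one` and the integer "blob ids" are hypothetical stand-ins for the real descent, inflate and weave calls, and "changed" is simplified to "leaf differs from the previous commit in topo order". It assumes a C11 toolchain that ships `<threads.h>` and `<stdatomic.h>`.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <threads.h>

enum { NWORKERS = 4 }; /* illustrative; the text targets 16 cores */

/* One parallel pass: indices in [0, n) are claimed through an atomic
 * counter, so no two workers ever process the same index. */
typedef struct {
    atomic_size_t next;            /* next unclaimed index */
    size_t n;                      /* number of items */
    void (*fn)(size_t i, void *u); /* per-item work, independent per i */
    void *user;
} pass_t;

static int pass_worker(void *arg) {
    pass_t *p = arg;
    for (;;) {
        size_t i = atomic_fetch_add(&p->next, 1);
        if (i >= p->n) break;
        p->fn(i, p->user);
    }
    return 0;
}

/* Run fn(i) for every i in [0, n) and wait for all workers to finish. */
static void parallel_for(size_t n, void (*fn)(size_t, void *), void *user) {
    pass_t p = { .n = n, .fn = fn, .user = user };
    atomic_init(&p.next, 0);
    thrd_t t[NWORKERS];
    int started = 0;
    for (int w = 0; w < NWORKERS; w++)
        if (thrd_create(&t[started], pass_worker, &p) == thrd_success)
            started++;
    if (started == 0) pass_worker(&p); /* no threads available: run serially */
    for (int w = 0; w < started; w++) thrd_join(t[w], NULL);
}

typedef struct {
    size_t ncommits;
    const int *leaf_of; /* hypothetical input: commit -> blob id (stands in for a sha) */
    int *leaf;          /* pass 1 output: resolved blob id per commit */
    int *inflated;      /* pass 2 output: stand-in for the inflated payload */
} blame_t;

/* A commit counts as "changed" if its leaf differs from its predecessor's. */
static int changed(const blame_t *b, size_t i) {
    return i == 0 || b->leaf[i] != b->leaf[i - 1];
}

static void resolve_one(size_t i, void *u) {
    blame_t *b = u;
    b->leaf[i] = b->leaf_of[i]; /* stand-in for the tree descent */
}

static void inflate_one(size_t i, void *u) {
    blame_t *b = u;
    b->inflated[i] = changed(b, i) ? b->leaf[i] * 10 : 0; /* stand-in for inflate */
}

/* Serial, order-dependent fold over the changed commits only. */
static unsigned fold(const blame_t *b) {
    unsigned acc = 0;
    for (size_t i = 0; i < b->ncommits; i++)
        if (changed(b, i)) acc = acc * 31u + (unsigned)b->inflated[i];
    return acc;
}

unsigned blame_sketch(const int *leaf_of, size_t n, int *leaf, int *inflated) {
    blame_t b = { n, leaf_of, leaf, inflated };
    parallel_for(n, resolve_one, &b); /* pass 1: resolve leaves */
    parallel_for(n, inflate_one, &b); /* pass 2: reads leaf[], complete after the joins */
    return fold(&b);                  /* pass 3: serial fold in topo order */
}
```

Each pass ends with a join, so the next pass only reads data that is already complete, and the fold stays single-threaded. On older toolchains the link step may need `-pthread`.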
The per-commit loop (graf/BLAME.c:344) is serial though its dominant cost (inflate) is independent across commits.
WEAVEApply/WEAVEDiff) is sequential but only runs for the few changed commits.
GRAF.obj_buf/tree_buf (graf/GRAF.h:42) and keeper buf1..buf4 (keeper/KEEP.h:167, written by KEEPGetPacked).
find_package(Threads)); adding it is a new dependency decision.
Ordering dependency: land BLAME-001 first and re-measure; sha-dedup alone may remove enough of the cost that parallelism's ROI drops. Threads are new to the codebase (build/ASAN/fuzz matrix + a TSan build).
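The shared-scratch blocker can be illustrated with a small sketch. Everything here is hypothetical (`scratch_t`, `read_shared`, `read_into`, the 64-byte buffer); it does not reproduce the real GRAF or keeper signatures. It only contrasts a singleton buffer with caller-owned scratch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scratch context owned by the caller (one per worker). */
typedef struct {
    unsigned char *buf;
    size_t cap;
} scratch_t;

/* Singleton style: every call writes into the same static buffer, so a
 * second reader overwrites the first reader's result. */
static unsigned char g_scratch[64];

const unsigned char *read_shared(const char *src, size_t len) {
    if (len > sizeof g_scratch) return NULL;
    memcpy(g_scratch, src, len);
    return g_scratch;
}

/* Reentrant style: the caller passes the scratch, so two callers with
 * separate contexts never alias each other's data. */
const unsigned char *read_into(scratch_t *s, const char *src, size_t len) {
    assert(s != NULL && s->buf != NULL);
    if (len > s->cap) return NULL; /* caller's buffer is too small */
    memcpy(s->buf, src, len);
    return s->buf;
}
```

With `read_shared`, two results obtained back to back point at the same bytes even on one thread; with `read_into` and one `scratch_t` per worker, each result stays valid for as long as its owner keeps the buffer.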
Phase the work: first make the read path reentrant by passing scratch as parameters (CLAUDE.md §5), then shard.
[0,nord) over workers.
WEAVEApply path.
GRAFBlobAtCommit/GRAFTreeStep a scratch-context arg and add a KEEPGetPacked variant taking caller-owned buf1..4; each worker u8bMaps its own ABC_BASS (already _Thread_local).
packs/puppies registries, and the pure DAG readers (DAGLookup/DAGCommitTree/DAGParents).
<threads.h> + Threads::Threads on graflib, atomic work-counter; add a graf/bench/ blame benchmark; Amdahl est. ~6–9× (≈1.16s → 0.15–0.25s), memory-bandwidth capped.
None yet.
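For reference, the Amdahl figures quoted in the plan can be sanity-checked with a few lines of arithmetic. This sketch assumes 16 cores and the 1.16 s baseline from the text; the function names are illustrative. With those inputs, a 6–9× speedup corresponds to a parallel fraction of roughly 0.89–0.95 and a runtime of about 0.13–0.19 s, while the quoted 0.15–0.25 s corresponds to roughly 4.6–7.7×, plausibly reflecting the memory-bandwidth cap.

```c
#include <assert.h>

/* Amdahl's law: speedup on n cores when a fraction p of the runtime
 * parallelizes perfectly and the remaining (1 - p) stays serial. */
double amdahl_speedup(double p, int n) {
    assert(p >= 0.0 && p <= 1.0 && n > 0);
    return 1.0 / ((1.0 - p) + p / n);
}

/* Inverse: the parallel fraction needed to reach speedup s on n cores
 * (meaningful for n > 1 and 1 <= s <= n). */
double amdahl_fraction(double s, int n) {
    assert(s >= 1.0 && n > 1);
    return (1.0 - 1.0 / s) / (1.0 - 1.0 / n);
}

/* Projected runtime for a given baseline, parallel fraction and core count. */
double amdahl_runtime(double baseline_s, double p, int n) {
    return baseline_s / amdahl_speedup(p, n);
}
```

For example, `amdahl_runtime(1.16, amdahl_fraction(6.0, 16), 16)` gives about 0.19 s, and the same call with `9.0` gives about 0.13 s.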