
docs(readme): canonicalize benchmark blog link to uffs.io #482

Open
githubrobbi wants to merge 20 commits into main from docs/canonicalize-blog-url-uffs-io


Conversation

@githubrobbi

Collaborator

Canonicalizes the public benchmark-blog link in the main README to the uffs.io front door.

Change

  • README.md (~line 80): the "story behind these numbers" benchmark-blog link https://skyllc-ai.github.io/blog/benchmarking-against-everything/ becomes https://uffs.io/blog/benchmarking-against-everything/ (target verified live, HTTP 200).

Scope notes

  • Only the public-facing canonical link changes. The remaining skyllc-ai.github.io references live in gitignored docs/dev/marketing/ artifacts and are legitimate (the GitHub Pages repo name, the www CNAME target, and historical migration notes), so they are intentionally left as-is.
  • CHANGELOG.md line 1154 keeps githubrobbi/UltraFastFileSearch on purpose: it is a historical fact about the pre-org-move fork.

Part of the cross-repo link-canonicalization pass (the three pinned repos were already updated). This was the last remaining item; it was blocked only by an unrelated red lint gate that is now green.

🤖 Generated with Claude Code

githubrobbi and others added 20 commits June 26, 2026 10:35
…lta)

Phase 0 of the two-tier index project. The CSR indexes (trigram /
children / ext) are immutable read-optimized layouts, so "incremental
maintenance" is an LSM/Lucene-segment redesign — immutable base CSR +
mutable delta overlay + tombstones, queried as base ∪ delta minus
tombstones, with the existing full rebuild demoted to an occasional
compaction step. Turns apply from O(total records) into O(changed).

The doc specifies: architecture + per-op semantics, the search-path
integration choke points (trigram_search / children_of / records_with_ext),
phased delivery (trigram-first for the ~80% win), the mandatory oracle test
(base+delta must be observationally identical to a full rebuild, and
byte-identical after compaction), a baseline + timing-regression gate, the
removable IDXDELTA dev-instrumentation convention (build-id, per-apply /
per-search timing) mirroring USNFIX, the WIN dev test-script
(idx-delta-verify.rs), and a tracking table. Junior-dev-executable.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…y timing

Scaffolding for the incremental-index-maintenance work (design:
docs/architecture/incremental-index-maintenance.md) — measure first, build
later. All dev-only, marked IDXDELTA, removable in Phase 5.

Which-build stamp: a uffs-daemon build.rs emits UFFS_GIT_SHA (short commit +
-dirty); startup logs `IDXDELTA build active version=… git=…`. The WIN
test-script fails fast if the running daemon lacks it — closing the
stale-binary trap we hit during USN testing.
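
A minimal sketch of how a build script can hand such a stamp to the compiler, assuming the standard Cargo `cargo:rustc-env=` directive. The helper name `stamp_directive` and its arguments are hypothetical; how the real build.rs obtains the short SHA and the dirty flag is not shown in this commit message.

```rust
/// Hypothetical helper: formats the line a build.rs would print so that
/// `env!("UFFS_GIT_SHA")` resolves at compile time in the daemon crate.
fn stamp_directive(short_sha: &str, dirty: bool) -> String {
    // Mirrors the "short commit + -dirty" shape described above.
    let suffix = if dirty { "-dirty" } else { "" };
    format!("cargo:rustc-env=UFFS_GIT_SHA={short_sha}{suffix}")
}
```

In a build script this would be used as `println!("{}", stamp_directive(&sha, dirty));`, with `sha` and `dirty` supplied by whatever git query the script runs.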

Fine-grained per-apply timing (each meaningful step, not just the rebuild):
whole-body CLONE (shard.rs — the Arc-swap copies the entire index, the big
cost the rebuild timing alone misses and the one base+delta shrinks most),
per-change LOOP (the O(changed) mutation, timed apart), and REBUILD
(children / paths / trigram / ext, each separately). Logged in whole
microseconds (`*_us`, integers) — uffs-core denies float arithmetic, so this
respects that policy (and keeps sub-ms loop precision) rather than allow-ing
around it; the WIN script renders ms.
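
The integer-microsecond convention can be sketched as follows. This is an illustration only: `timed_us` and `render_ms` are hypothetical names, and the real IDXDELTA log format is not reproduced here.

```rust
use std::time::Instant;

/// Runs `step` and returns its result plus the elapsed time in whole
/// microseconds (an integer, so no float arithmetic is involved).
fn timed_us<T>(step: impl FnOnce() -> T) -> (T, u64) {
    let start = Instant::now();
    let out = step();
    // as_micros() returns u128; saturate rather than panic on overflow.
    let us = u64::try_from(start.elapsed().as_micros()).unwrap_or(u64::MAX);
    (out, us)
}

/// Renders integer microseconds as "ms.mmm" using only integer
/// division and remainder, the way a report script might.
fn render_ms(us: u64) -> String {
    format!("{}.{:03}", us / 1_000, us % 1_000)
}
```

Keeping the stored value in microseconds preserves sub-millisecond precision for the per-change loop while leaving the millisecond rendering to the consumer.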

Refactor: the rebuild + IDXDELTA timing + batch-summary log move to a new
compact_loader/rebuild.rs submodule (cohesive O(n) step; keeps
compact_loader.rs under the 800-LOC policy; houses the temp timing for
Phase-5 removal). No behaviour change.

scripts/windows/idx-delta-verify.rs: the WIN rig (mirrors usn-verify.rs).
Confirms the build, drives escalating create bursts + a rename/delete smoke,
extracts the IDXDELTA-TIMING lines, writes _run/baseline.txt for regression
detection in later phases.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Phase-0 baseline (build 629966b, live C: = 3.89M records) overturned the
doc's original assumption that trigram was the ~80% win. Measured per-apply:

  compute_path_lengths 623ms   <- #1, bigger than trigram
  trigram rebuild      378ms
  whole-body clone     166ms   <- hidden by rebuild-only timing
  ext / loop / children 84/62/54ms
  FULL APPLY        ~1367ms   (not the ~600ms guessed)
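
As a quick cross-check, the six measured steps do add up to the reported full-apply figure. The snippet below only restates the numbers above with integer arithmetic; the constant and function names are hypothetical.

```rust
/// Per-apply step costs in milliseconds, copied from the baseline above.
const STEPS_MS: [(&str, u32); 6] = [
    ("compute_path_lengths", 623),
    ("trigram rebuild", 378),
    ("whole-body clone", 166),
    ("ext", 84),
    ("loop", 62),
    ("children", 54),
];

/// Sum of all measured steps.
fn total_ms() -> u32 {
    STEPS_MS.iter().map(|&(_, ms)| ms).sum()
}

/// Share of the total in whole percent (integer division, rounds down).
fn share_percent(ms: u32) -> u32 {
    ms * 100 / total_ms()
}
```

With these inputs the path-length pass accounts for roughly 45% of an apply and the trigram rebuild for roughly 27%, which is why the phases were re-sequenced.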

Re-sequenced §4 phases by measured cost (biggest lever first):
  1. incremental compute_path_lengths (per-record + renamed-subtree Δ; NOT a
     base+delta overlay) — full §5.5 junior-dev guide added
  2. trigram delta   3. Arc-share the clone   4. ext+children delta
  5. unify + re-tune interval   6. remove IDXDELTA dev helpers

Adds the captured numbers as
docs/architecture/baselines/incremental-index-2026-06-26.json (the §8
regression reference) and marks the done Phase-0 items (build stamp, timing,
WIN rig) in §11.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Replace the per-apply O(total-records) compute_path_lengths BFS (623ms,
the #1 cost in the measured baseline) with an O(changed) per-record
update for normal USN poll batches.

- compact.rs: PathChange{idx, subtree} + update_path_lengths_incremental
  + path_len_from_parent + shift_subtree_path_len (iterative DFS over the
  children CSR, propagating a directory-rename's length delta to the whole
  subtree, clamped to u16).
- apply_create / apply_rename thread &mut Vec<PathChange>; create/file-
  rename push a single O(1) record, directory-rename pushes subtree:true.
- rebuild.rs: rebuild children CSR FIRST (so the subtree walk sees current
  adjacency), then gate incremental-vs-full path update on a 50k batch
  threshold; cold loads (empty change set) still take the full BFS.
- Oracle gate (compact_loader_path_oracle_tests.rs): the incremental
  path_len must be byte-identical to a from-scratch compute_path_lengths
  rebuild across a batch of dir-rename + create + file-rename. Passes.
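
The subtree shift can be sketched roughly as below. This is not the code from compact.rs: `ChildrenCsr` is a simplified stand-in for the children CSR, the function signatures are guesses, and the `parent + separator + name` formula in `path_len_from_parent` is an assumption.

```rust
/// Hypothetical, simplified children CSR: the children of parent `p`
/// are `items[offsets[p]..offsets[p + 1]]`.
struct ChildrenCsr {
    offsets: Vec<u32>,
    items: Vec<u32>,
}

impl ChildrenCsr {
    fn children_of(&self, parent: u32) -> &[u32] {
        let p = parent as usize;
        &self.items[self.offsets[p] as usize..self.offsets[p + 1] as usize]
    }
}

/// Assumed formula: parent's path length + one separator + own name length.
fn path_len_from_parent(parent_len: u16, name_len: u16) -> u16 {
    parent_len.saturating_add(1).saturating_add(name_len)
}

/// Iterative DFS that adds `delta` to every descendant of `root`
/// (not `root` itself), clamping each result to the u16 range.
fn shift_subtree_path_len(path_len: &mut [u16], children: &ChildrenCsr, root: u32, delta: i32) {
    let mut stack: Vec<u32> = children.children_of(root).to_vec();
    while let Some(idx) = stack.pop() {
        let shifted = i32::from(path_len[idx as usize]) + delta;
        path_len[idx as usize] = shifted.clamp(0, i32::from(u16::MAX)) as u16;
        stack.extend_from_slice(children.children_of(idx));
    }
}
```

An explicit stack avoids recursion depth limits on deep directory trees, and the walk relies on the children CSR already reflecting the current adjacency, which matches the "rebuild children CSR FIRST" ordering above.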

IDXDELTA-TIMING now reports paths_us for the incremental path so the WIN
rig can confirm the 623ms -> ~0 win against the committed baseline.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Incremental compute_path_lengths landed (9806bc3); path-len oracle
gate is green. Phase 2 (trigram delta) is next.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add the mutable delta-overlay type that the trigram / ext / children
base+delta search will read through (incremental-index-maintenance
§5.1). compact/delta.rs:

- IndexDelta { trigram, ext, children: FxHashMap<_, Vec<u32>>,
  tombstones: FxHashSet<u32>, touched_records }.
- add_record (sorted+deduped binary-search insert across all three
  posting maps; root u32::MAX parent adds no child posting),
  tombstone (idempotent), is_tombstoned, len/is_empty (compaction
  trigger), and the per-key postings accessors.
- The sorted/deduped posting invariant is what makes the eventual
  base∪delta merge a linear pass.
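
A reduced sketch of the overlay's insert discipline, under stated assumptions: it uses std `HashMap` / `HashSet` instead of the Fx variants to stay dependency-free, omits the ext map and `touched_records`, and the `add_record` signature is hypothetical.

```rust
use std::collections::{HashMap, HashSet};

/// Sentinel parent for the root record (u32::MAX, as described above).
const ROOT_PARENT: u32 = u32::MAX;

#[derive(Default)]
struct IndexDeltaSketch {
    trigram: HashMap<u32, Vec<u32>>,
    children: HashMap<u32, Vec<u32>>,
    tombstones: HashSet<u32>,
}

/// Binary-search insert that keeps the posting sorted and free of duplicates.
fn insert_sorted_dedup(posting: &mut Vec<u32>, idx: u32) {
    if let Err(pos) = posting.binary_search(&idx) {
        posting.insert(pos, idx);
    }
}

impl IndexDeltaSketch {
    fn add_record(&mut self, idx: u32, trigrams: &[u32], parent: u32) {
        for &t in trigrams {
            insert_sorted_dedup(self.trigram.entry(t).or_default(), idx);
        }
        // The root has no parent, so it adds no child posting.
        if parent != ROOT_PARENT {
            insert_sorted_dedup(self.children.entry(parent).or_default(), idx);
        }
    }

    /// Idempotent: tombstoning the same record twice has no extra effect.
    fn tombstone(&mut self, idx: u32) {
        self.tombstones.insert(idx);
    }

    fn is_tombstoned(&self, idx: u32) -> bool {
        self.tombstones.contains(&idx)
    }
}
```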

Unit-tested (sorted/dedup insert, root sentinel, idempotent tombstone,
rename-as-two-touches). The base∪delta sorted-merge primitive itself
lands in the Phase-2 commit wired directly into trigram_search, so it
is never dead scaffolding. No DriveCompactIndex field yet — that is
added in Phase 2 where each of the ~20 construction sites is touched
once, with the change that gives the field meaning.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…se 2

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The first WIN run exposed a 0.5 s-per-apply regression on small batches:
two applies (8 and 1 changes, loop_us=0) hit paths_us≈507 ms while every
create/rename batch was 1-18 µs. Those were delete-only batches — a
delete tombstones its record and pushes no PathChange, so path_changes is
empty, and the gate wrongly fell back to the full O(total)
compute_path_lengths BFS.

apply_usn_patch is never the cold-load path (build_compact_index does the
cold BFS directly), so an empty path_changes during apply means "no
record's path_len changed" → the correct work is none. A delete never
shifts any surviving record's path_len. Drop the is_empty() arm; the only
apply-time full-recompute fallback is now a >50k pathological batch.
update_path_lengths_incremental is already a no-op over an empty slice.
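
The corrected gate reduces to a single size check. A minimal sketch, assuming the threshold is a strict "greater than 50k" comparison; the enum and function names are hypothetical.

```rust
/// Assumed value of the 50k batch threshold mentioned above.
const FULL_RECOMPUTE_THRESHOLD: usize = 50_000;

#[derive(Debug, PartialEq)]
enum PathUpdate {
    /// O(changed) update; a no-op when there are no path changes.
    Incremental,
    /// Full O(total) BFS, reserved for pathological batches.
    FullRecompute,
}

/// An empty change set (for example a delete-only batch) no longer
/// falls back to the full recompute.
fn choose_path_update(path_changes: usize) -> PathUpdate {
    if path_changes > FULL_RECOMPUTE_THRESHOLD {
        PathUpdate::FullRecompute
    } else {
        PathUpdate::Incremental
    }
}
```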

Oracle: add
delete_only_batch_leaves_path_lengths_correct_without_full_recompute.
The shared assert now compares LIVE records only: a tombstoned
record's path_len is meaningless (incremental leaves it stale, a full BFS
recomputes it as a root); that divergence is correct and excluded.

Expected effect: mean paths drops from 145 ms to sub-ms; full_apply
~800 ms -> ~640 ms (trigram ~390 ms now dominant -> Phase 2).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…s HEAD

Kills the stale-binary trap for good. The rig now, before anything else:

- BIN SYNC: resolves the release dir cargo *actually* uses via
  `cargo metadata` target_directory (honours CARGO_TARGET_DIR /
  .cargo/*.toml build.target-dir; override with UFFS_RELEASE_DIR), then
  copies uffs/uffsd (+ uffs-broker/uffsmcp if built) into ~/bin, printing
  each binary's build mtime. Required bin missing → bail "build first".
- BUILD-ID MATCH GUARD: build-confirmation now extracts git="<sha>" from
  the IDXDELTA marker and asserts it equals `git rev-parse --short HEAD`;
  a resident daemon from an older build → hard fail with the fix.

So the WIN loop is just: build → run. No manual
`copy C:\rust-target\...\release\* ~/bin`. The target_directory JSON parse
is a focused hand-scan
(no serde) that unescapes Windows `C:\\..` paths; unit-checked locally.
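
A hand-scan of this kind might look like the sketch below. It is an illustration, not the rig's code: `extract_target_directory` is a hypothetical name, it takes the first `"target_directory":` key it finds, and it handles only the `\\`, `\"` and `\/` escapes.

```rust
/// Pulls the string value of "target_directory" out of `cargo metadata`
/// JSON without a JSON library, unescaping backslashes along the way.
fn extract_target_directory(json: &str) -> Option<String> {
    let key = "\"target_directory\":";
    let start = json.find(key)? + key.len();
    let rest = json[start..].trim_start();
    let mut chars = rest.strip_prefix('"')?.chars();
    let mut out = String::new();
    while let Some(c) = chars.next() {
        match c {
            // Unescaped closing quote: the value is complete.
            '"' => return Some(out),
            '\\' => match chars.next()? {
                '\\' => out.push('\\'),
                '"' => out.push('"'),
                '/' => out.push('/'),
                // Other escapes (\n, \uXXXX, ...) are out of scope here.
                _ => return None,
            },
            other => out.push(other),
        }
    }
    None // input ended before the closing quote
}
```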

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…mbing)

Routes every trigram caller through one DriveCompactIndex::trigram_search
that reads the base ∪ delta overlay. No behavior change yet: the delta is
always None until apply populates it (Phase 2b), so trigram_search is a
zero-overhead delegate to the base TrigramIndex::search.

- DriveCompactIndex gains `delta: Option<IndexDelta>` (None on fresh /
  compacted / cache-loaded; never serialized). All ~20 construction sites
  updated to `delta: None`.
- trigram_search: when a delta is present, merge per needle-trigram the
  base posting with the delta posting (delta::merge_postings), intersect
  (trigram::intersect_in_place, now pub(crate)), then resolve tombstones
  on the FINAL candidate set — keeping a tombstoned record only if it was
  re-added under a name covering every needle trigram. This is what lets a
  renamed file appear under its new name yet vanish from its old one;
  filtering per posting list would wrongly hide the re-added record.
- trigram.rs: extract the shared needle->trigram packing into
  needle_trigrams(); expose get_posting + intersect_in_place as pub(crate).
- delta.rs: merge_postings sorted-union (no tombstone — see above).
- Migrate the 3 trigram callers (tree, prefix_search, query) to
  trigram_search; each previously passed drive.fold, so behavior-identical.

Tests (compact_trigram_delta_tests.rs) pin the overlay semantics with a
manually-populated delta: create-visible, rename-visible-under-new-name +
gone-from-old, delete-invisible, short-needle None. 867/867 green.
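
The overlay semantics described in this commit can be sketched as follows. Trigrams are represented as plain `u32` ids and both indexes as hash maps, which is a simplification; the function bodies are an illustration and not the code in delta.rs or trigram.rs.

```rust
use std::cmp::Ordering;
use std::collections::{HashMap, HashSet};

type Postings = HashMap<u32, Vec<u32>>;

fn posting(map: &Postings, key: u32) -> &[u32] {
    match map.get(&key) {
        Some(v) => v.as_slice(),
        None => &[],
    }
}

/// Sorted union of two sorted, deduplicated postings (no tombstone logic).
fn merge_postings(base: &[u32], delta: &[u32]) -> Vec<u32> {
    let mut out = Vec::with_capacity(base.len() + delta.len());
    let (mut i, mut j) = (0, 0);
    while i < base.len() && j < delta.len() {
        match base[i].cmp(&delta[j]) {
            Ordering::Less => { out.push(base[i]); i += 1; }
            Ordering::Greater => { out.push(delta[j]); j += 1; }
            Ordering::Equal => { out.push(base[i]); i += 1; j += 1; }
        }
    }
    out.extend_from_slice(&base[i..]);
    out.extend_from_slice(&delta[j..]);
    out
}

fn intersect_in_place(acc: &mut Vec<u32>, other: &[u32]) {
    acc.retain(|x| other.binary_search(x).is_ok());
}

fn trigram_search(base: &Postings, delta: &Postings, tombstones: &HashSet<u32>, needle: &[u32]) -> Vec<u32> {
    let mut acc: Option<Vec<u32>> = None;
    for &t in needle {
        let merged = merge_postings(posting(base, t), posting(delta, t));
        acc = Some(match acc.take() {
            None => merged,
            Some(mut a) => { intersect_in_place(&mut a, &merged); a }
        });
    }
    let mut hits = acc.unwrap_or_default();
    // Tombstones are resolved on the FINAL candidate set: a tombstoned record
    // survives only if the delta re-added it under every needle trigram.
    hits.retain(|idx| {
        !tombstones.contains(idx)
            || needle.iter().all(|&t| posting(delta, t).binary_search(idx).is_ok())
    });
    hits
}
```

Filtering tombstones after the intersection is what allows a renamed record to match its new name while disappearing from searches for the old one.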

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…dules

Pure, behavior-preserving split of the oversized compact.rs into cohesive
compact/ submodules. Every public item is re-exported so the canonical
`crate::compact::X` paths used across the workspace are unchanged — no call
site outside the module moved.

  compact/record.rs     CompactRecord + NTFS metafile-name allowlist   (189)
  compact/children.rs   ChildrenIndex (CSR parent→children)            (111)
  compact/extension.rs  ExtensionIndex (CSR ext_id→records)            (102)
  compact/path_len.rs   compute_path_lengths + Phase-1 incremental fns (214)
  compact/builder.rs    build_compact_index + ADS/links/shrink/upcase  (422)
  compact.rs            DriveCompactIndex + HeapReport + impl + re-exports (385)

compact.rs drops off the file-size exception list (was "13 over"; now 385,
well under 800). 867/867 uffs-core tests pass unchanged (identical count
pre/post — proves a pure move); clippy -D warnings, rustdoc -D warnings,
lint-ci-windows all clean.

This also tidies the tree for Phase 2b: the compact() method + apply
delta-population drop cleanly into builder.rs / a slim compact.rs rather
than a 1363-line file.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… rebuild

The 338 ms per-apply trigram rebuild is gone. apply_usn_patch now overlays
each batch onto the IndexDelta instead of rebuilding the base:

- DriveCompactIndex::apply_trigram_delta(adds, tombstones): adds each
  created/renamed record's new-name trigrams to the delta and masks the
  deleted/renamed-away/reused-slot records via tombstones. Folds back to a
  fresh base (compact_base) only when the delta crosses
  TRIGRAM_COMPACT_THRESHOLD (50k touched records) — so trigram_us is ~0 on
  normal applies, a one-off full build on the occasional compaction tick.
- compact_loader/apply.rs: the per-change mutation cluster (StagedCreate,
  stage_create, overwrite_slot, apply_{delete,create,rename}) extracted to
  a submodule; each apply fn now also collects the trigram tombstone set
  (deletes, renames, FRS-reuse overwrites). path_changes doubles as the
  trigram-ADD set. compact_loader.rs 826 -> 592 LOC.
- rebuild.rs: replace the TrigramIndex::build call with apply_trigram_delta;
  IDXDELTA-TIMING gains a `compacted` flag.

End-to-end oracle (compact_loader_trigram_oracle_tests.rs): a real
apply_usn_patch batch (create + rename + delete), then assert trigram_search
through base + delta returns IDENTICAL candidates to a compacted rebuild —
across created, renamed (new + old name), deleted, and untouched files.
868/868 green; clippy/rustdoc/file-size all clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Per the request to stress beyond 1k files:
- BURSTS 10/100/1000 → 1000/10000/100000. The 100k burst crosses
  TRIGRAM_COMPACT_THRESHOLD (50k) so it also exercises a delta compaction
  (full trigram refold) under load; the smaller bursts measure steady-state
  delta-overlay apply (trigram_us ≈ 0).
- Replace the fixed-sleep freshness probe with poll_until_visible: polls a
  per-round filename prefix until that burst's `count` is search-visible (or
  a size-scaled budget elapses), so the report shows true creation
  throughput (files/s) AND apply-to-searchable latency, and flags an apply
  backlog instead of silently measuring the settle constant.
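
A polling probe of this shape could be sketched as below. The signature is hypothetical: the real rig presumably runs a search inside the probe and scales the budget by burst size, whereas here both are passed in by the caller.

```rust
use std::time::{Duration, Instant};

/// Polls `count_visible` until it reports at least `expected` hits.
/// Returns the apply-to-searchable latency, or `None` if `budget`
/// elapses first (which a report would flag as an apply backlog).
fn poll_until_visible(
    mut count_visible: impl FnMut() -> usize,
    expected: usize,
    budget: Duration,
    interval: Duration,
) -> Option<Duration> {
    let start = Instant::now();
    loop {
        if count_visible() >= expected {
            return Some(start.elapsed());
        }
        if start.elapsed() >= budget {
            return None;
        }
        std::thread::sleep(interval);
    }
}
```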

Also marks Phases 1/2a/2b + the compact.rs decomposition done in the
design-doc tracking table.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…n threshold

A burst larger than TRIGRAM_COMPACT_THRESHOLD (e.g. the verify rig's 100k
create) would populate the delta with 100k postings only to discard them at
the post-population compaction check — pure wasted work. apply_trigram_delta
now checks `pending_delta + batch_size > threshold` up front and, if so,
refolds the base directly via compact_base (the records already reflect every
change in the batch). This also catches the accumulation case where a small
batch tips an already-large delta over the line.

Reduces to compact_base (oracle-proven equivalent to a full rebuild), so the
end-of-fn compaction branch is now unreachable and removed. Trigram + path
oracles green.
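
The up-front check amounts to one comparison. A minimal sketch, assuming the threshold constant equals the 50k figure quoted earlier; the function name is hypothetical.

```rust
/// Assumed value (50k touched records, per the earlier commit).
const TRIGRAM_COMPACT_THRESHOLD: usize = 50_000;

/// True when the batch should skip delta population and refold the
/// base directly: `pending_delta + batch_size > threshold`.
fn should_refold_base(pending_delta: usize, batch_size: usize) -> bool {
    pending_delta.saturating_add(batch_size) > TRIGRAM_COMPACT_THRESHOLD
}
```

The same predicate covers both cases named above: a single oversized burst, and a small batch that tips an already large delta over the line.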

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ounter

- Bin sync: a running uffs-broker (LocalSystem service) holds its exe open,
  so the best-effort optional copy hit os error 32 and aborted the whole run
  before any measurement. Optional-bin copy failures now warn ("skip …") and
  continue; only uffs + uffsd (the rig's actual dependencies) hard-fail.
- Remove the now-unused `total_created` accumulator (each burst polls its own
  per-round count) that tripped unused_variables/unused_assignments.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The stale-daemon guard compared the daemon's git SHA to HEAD verbatim, so a
HEAD that advanced purely through scripts/ or docs/ (e.g. a verify-rig tweak)
falsely flagged a current binary as stale and aborted the run. The guard now
diffs the daemon SHA against HEAD and only bails when a build-affecting path
changed (crates/**, Cargo.toml, Cargo.lock, rust-toolchain*); a non-source
advance prints "binary is current" and proceeds. Fail-safe: assumes stale if
git can't answer.
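
The path filter and the fail-safe can be sketched as follows. How the rig obtains the changed paths is not shown; here they arrive as a slice, with `None` standing in for "git could not answer". Both function names, and the exact matching rules, are assumptions.

```rust
/// Assumed interpretation of the build-affecting set:
/// crates/**, Cargo.toml, Cargo.lock, rust-toolchain*.
fn is_build_affecting(path: &str) -> bool {
    path.starts_with("crates/")
        || path == "Cargo.toml"
        || path == "Cargo.lock"
        || path.starts_with("rust-toolchain")
}

/// `changed` lists the paths that differ between the daemon's SHA and
/// HEAD, or is `None` when the diff could not be computed.
fn binary_is_stale(changed: Option<&[&str]>) -> bool {
    match changed {
        // Fail-safe: assume stale when git gives no answer.
        None => true,
        Some(paths) => paths.iter().any(|p| is_build_affecting(p)),
    }
}
```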

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The old mutate round searched `idx_0_1` to check a deleted file — but that
substring matches 111 bulk `idx_0_1*` files, so "expect 0" was a false
signal (the live run showed 111, which was correct-by-accident). Replace it
with `idxmutate_*` sentinels that no bulk file name contains as a substring, and
poll-until-applied (visible / absent) instead of a fixed sleep:

- rename idxmutate_src → idxmutate_renamed: expect 'idxmutate_renamed' >= 1
- delete idxmutate_del: expect 'idxmutate_del' → 0
- old name idxmutate_src → 0 (renamed away)

Each now gives a clean pass/fail with the real apply latency. Drops the
now-unused SETTLE constant (every probe polls).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The daemon clones the whole DriveCompactIndex before each apply (lock-free
COW snapshot for readers). That deep-copied the immutable inverted indexes
— the trigram CSR alone is ~hundreds of MB on a multi-million-record drive.

Make trigram / children / ext_index `Arc<…>`. The apply path never mutates
them in place (it overlays on the delta and only ever *replaces* the whole
index at compaction/rebuild), so Arc + replace-the-pointer is a perfect fit:
the per-apply clone now pointer-clones these bases (a refcount bump) and only
deep-copies records + names + the small delta. Read sites are unchanged —
Arc derefs transparently through `.search()` / `.get_posting()` / `&drive.children`.
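
The effect of the change can be seen in a reduced model. This sketch uses `std::sync::Arc` (the same type that `alloc::sync::Arc` names) and a hypothetical two-field struct in place of DriveCompactIndex.

```rust
use std::sync::Arc;

#[derive(Clone)]
struct DriveIndexSketch {
    /// Immutable base index: cloning bumps a refcount, nothing is copied.
    trigram: Arc<Vec<u32>>,
    /// Mutable body: cloning is still a deep copy.
    records: Vec<u32>,
}

/// Compaction never mutates the shared base in place; it swaps in a
/// freshly built one, so existing readers keep their old snapshot.
fn compact(index: &mut DriveIndexSketch, rebuilt: Vec<u32>) {
    index.trigram = Arc::new(rebuilt);
}
```

A snapshot held by readers keeps the old base alive until its last reference is dropped, so replacing the pointer is safe without any locking on the read side.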

- compact.rs: field types → Arc; compact_base wraps the refold in Arc::new.
- rebuild.rs: the per-apply children/ext rebuilds wrap in Arc::new.
- builder.rs / compact_cache.rs / fixtures: construction sites wrap in Arc::new.
- New code uses alloc::sync::Arc (workspace lint convention); each touched
  file keeps its existing Arc import.

Expect `clone_us` to drop materially on the WIN baseline (the CSR portion of
the ~135ms clone). 868/868 uffs-core + 333/333 daemon green; clippy -D
warnings, rustdoc, lint-prod, lint-ci-windows, file-size all clean.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rebuild)

Drop the ~58 ms per-apply ExtensionIndex rebuild. `--ext` queries now read
through DriveCompactIndex::records_with_ext (base ∪ delta):

- records_with_ext(ext_id) -> Cow<[u32]>: zero-alloc borrow of the base CSR
  slice when delta is None; otherwise merges base + delta postings and
  validates each candidate against the live records (keep iff
  records[idx].extension_id == ext_id && name_len != 0). That records check
  is what makes a renamed extension (foo.log -> foo.pdf) and a delete correct
  WITHOUT an ext tombstone — a stale base posting just fails the check.
- apply_trigram_delta renamed apply_index_delta; it now always adds the
  ext + children postings (only the trigram postings stay gated on name >= 3
  chars), so a short-named create/rename is never missed by --ext.
- compact_base refolds the ext base too; rebuild.rs drops the ext rebuild
  (ext_us now ~0 in IDXDELTA-TIMING).
- Migrate the 3 ext readers (path_sorted / numeric / path_only top-N) to
  records_with_ext; the 3 post-apply ext unit tests assert through it.
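
The read path can be sketched as below. It is a simplified free function rather than the DriveCompactIndex method: `Rec` carries only the two fields the check needs, and the base and delta postings are passed in directly.

```rust
use std::borrow::Cow;

/// Hypothetical minimal record: just the fields the validation reads.
struct Rec {
    extension_id: u16,
    name_len: u16,
}

fn records_with_ext<'a>(
    base: &'a [u32],
    delta: Option<&[u32]>,
    records: &[Rec],
    ext_id: u16,
) -> Cow<'a, [u32]> {
    match delta {
        // No overlay: borrow the base slice, no allocation.
        None => Cow::Borrowed(base),
        Some(d) => {
            let mut merged = base.to_vec();
            merged.extend_from_slice(d);
            merged.sort_unstable();
            merged.dedup();
            // Validate against the live records: a stale base posting
            // (renamed extension or deleted record) fails this check,
            // so no ext tombstone is needed.
            merged.retain(|&i| {
                let r = &records[i as usize];
                r.extension_id == ext_id && r.name_len != 0
            });
            Cow::Owned(merged)
        }
    }
}
```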

Oracle extended: records_with_ext through the overlay equals the compacted
rebuild for every ext id, across create / rename / delete. 868/868 core +
333/333 daemon; clippy -D warnings, rustdoc, lint-prod, file-size all clean.

Children stays full-rebuilt — Phase 4b moves it onto the overlay (higher
care: it feeds FastPathResolver + the Phase-1 subtree walk).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The site migrated from skyllc-ai.github.io to the canonical uffs.io domain
(old URL now 301-redirects). Point the benchmark story link at uffs.io directly.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>