Merged
13 changes: 13 additions & 0 deletions .claude/board/EPIPHANIES.md
@@ -1805,6 +1805,19 @@ Why this is the right move, not just a bug patch:
3. **Flexibility + the one cost.** A node mixes in up to 16 family adjacencies (huge flexibility, any-to-any within 256). The named limitation is **mixin dependency**: a referenced family must exist or the slot is a dangling adapter (skipped). That is the honest trade — and it is cheap, because a missing family is a render no-op, not a corruption.

The general rule for graph edges on this substrate: **resolve to the stable grouping (family), not the volatile leaf (member)** — unless a richer flavor (8×16-bit, 32×4 residue, member→member second-hop) is measured to be needed. Cross-ref: `E-ANCHOR-IS-A-HEAD-FIELD-NOT-A-VALUE-TYPE` (the static dual), `E-GUID-IS-THE-GRAPH`, the operator's deferred helix-basin-anchor (CLAM ⇄ Louvain turbovec edge residue) as the eventual richer flavor; `aiwar.rs` (the POC: 221 aiwar entities → 60 category family hubs).
## 2026-06-20 — E-CPP-PARITY-6 — the UNICHARSET `direction` + `mirror` columns are byte-identical to libtesseract; the sixth leaf, and the first to read PAST the bbox CSV into the multi-column tail

**Status:** FINDING (in-env, real trained data). `lance_graph_contract::unicharset::UniCharSet::{get_direction, get_mirror}` dump the `eng.lstm-unicharset` per-id bidi direction codes and mirror ids **byte-identical to tesseract's own `get_direction` / `get_mirror`, 112/112 each** (same self-validating oracle, `direction` + `mirror` modes). Sixth + seventh proven accessor surfaces.

**Why this was the "multi-tier parser" leaf — and why it turned out simple.** `direction`/`mirror` sit two/three columns past the script, after the bbox+stats CSV. Tesseract places them via a 5-tier `istringstream` fallback (`unicharset.cpp:833-868`). But the bbox+stats group is always a SINGLE whitespace token (comma-separated, no spaces), so on a whitespace split the columns land at fixed offsets regardless of tier: `script`, `other_case`, `direction`, `mirror` are simply the 1st/2nd/3rd/4th tokens after the optional CSV. Continuing the existing per-line token walk one and two positions past `other_case` reads them; a tier without the columns leaves the walk exhausted → defaults. No bespoke tier detector needed — the token walk IS the tier collapse. (The float stats inside the CSV still need decimal parsing; that's the remaining sub-leaf.)
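
The token walk described above can be sketched roughly as follows. This is a minimal illustration and not the contract crate's actual parser: `Tail`, `parse_tail`, the comma test used to spot the bbox+stats token, and the sample lines are all assumptions made for the example.

```rust
/// Columns that follow the unichar and property tokens on one line.
/// `None` means the line's tier did not carry that column.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
struct Tail<'a> {
    script: Option<&'a str>,
    other_case: Option<i32>,
    direction: Option<i32>,
    mirror: Option<i32>,
}

/// Walk the whitespace tokens of one line and pick up the tail columns.
fn parse_tail(line: &str) -> Tail<'_> {
    let mut tokens = line.split_whitespace().peekable();
    tokens.next(); // the unichar itself
    tokens.next(); // the property bits

    // Assumption: the bbox+stats group is the only token containing a comma,
    // so a single peek is enough to decide whether to skip it.
    if tokens.peek().is_some_and(|t| t.contains(',')) {
        tokens.next();
    }

    // From here the columns sit at fixed offsets, whatever the tier.
    let script = tokens.next();
    let mut next_id = || tokens.next().and_then(|t| t.parse::<i32>().ok());
    let other_case = next_id();
    let direction = next_id();
    let mirror = next_id();

    Tail { script, other_case, direction, mirror }
}

fn main() {
    // Hypothetical sample lines, not copied from a real unicharset file.
    let with_csv = "( 10 0,255,0,255,0,0,0,0,0,0 Common 5 10 5 (";
    let without_csv = "a 3 Latin 7 0 7";
    let short_tier = "a 3";

    println!("{:?}", parse_tail(with_csv));
    println!("{:?}", parse_tail(without_csv));
    println!("{:?}", parse_tail(short_tier));
}
```

A tier that lacks the columns simply exhausts the iterator, so every later field comes back as `None` and the caller applies its defaults. Returning `Option` here keeps the default policy out of the walk itself.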

**Two transcode subtleties the oracle pinned (read-the-truth-first, again).** (1) `direction`'s load default is `U_LEFT_TO_RIGHT` (0) for an absent column, but `get_direction`'s OUT-OF-RANGE return is `U_OTHER_NEUTRAL` (10) — two different "defaults" for two different conditions (`unicharset.h:712-714`). (2) `mirror` is clamped at load exactly like `other_case` (`>= size` → self) and returns `INVALID_UNICHAR_ID` (-1) out of range. The oracle confirmed direction is genuinely varied on eng (55× LTR=0, 33× OTHER_NEUTRAL=10, plus 2/3/4/6 for digit-class chars) and mirror has 10 real pairs (bracket/paren/brace mirrors, e.g. `(`↔`)`), so this exercises the parse, not just the defaults.
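
The two "defaults" are easy to conflate, so here is a small sketch of how they might be kept apart. It is a stand-in and not the real `UniCharSet`: the `Columns` type, its `load` signature, and the `lookup` helper are hypothetical, and treating an absent `mirror` column as "self" is an assumption carried over from the `other_case` behaviour.

```rust
const U_LEFT_TO_RIGHT: i32 = 0; // load default for an absent direction column
const U_OTHER_NEUTRAL: i32 = 10; // get_direction's out-of-range return
const INVALID_UNICHAR_ID: i32 = -1; // get_mirror's out-of-range return

/// Hypothetical holder for the two per-id columns.
struct Columns {
    directions: Vec<i32>,
    mirrors: Vec<i32>,
}

impl Columns {
    /// `raw[id]` is `(direction, mirror)` as parsed; `None` = column absent.
    fn load(raw: &[(Option<i32>, Option<i32>)]) -> Self {
        let size = raw.len() as i32;
        let directions = raw
            .iter()
            .map(|(d, _)| d.unwrap_or(U_LEFT_TO_RIGHT))
            .collect();
        let mirrors = raw
            .iter()
            .enumerate()
            .map(|(id, (_, m))| match *m {
                Some(v) if v < size => v, // in range: keep as parsed
                _ => id as i32,           // `>= size` (or absent, by assumption) folds to self
            })
            .collect();
        Columns { directions, mirrors }
    }

    fn get_direction(&self, id: i32) -> i32 {
        lookup(&self.directions, id).unwrap_or(U_OTHER_NEUTRAL)
    }

    fn get_mirror(&self, id: i32) -> i32 {
        lookup(&self.mirrors, id).unwrap_or(INVALID_UNICHAR_ID)
    }
}

/// Bounds-checked read; negative or too-large ids yield `None`.
fn lookup(column: &[i32], id: i32) -> Option<i32> {
    if id < 0 {
        return None;
    }
    column.get(id as usize).copied()
}

fn main() {
    let cols = Columns::load(&[
        (Some(10), Some(1)), // id 0: neutral, mirrors id 1
        (None, Some(0)),     // id 1: direction column absent
        (Some(2), Some(99)), // id 2: mirror out of range at load
        (None, None),        // id 3: both columns absent
    ]);
    for id in -1..5 {
        println!("{id}: dir={} mirror={}", cols.get_direction(id), cols.get_mirror(id));
    }
}
```

The point of the split is that the load default (0) describes a valid id whose column was missing, while the accessor fallback (10 or -1) describes an id that does not exist at all.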

**Pattern holds (E-CPP-KEYSTONE-1).** +2 accessors + 2 dumps + one `diff` each, no new architecture, no Core gap. +3 contract tests (26 unicharset total). Consumed by `tesseract-core::CharSet::{get_direction,get_mirror}`. Reproducible via the committed `examples/unicharset_dump.rs {direction,mirror}`.
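
For readers without the oracle at hand, the "one `diff` each" check amounts to something like the sketch below. The `parity` and `byte_identical` helpers and the tab-separated dump format are assumptions for illustration; the actual probe compares the output of `examples/unicharset_dump.rs` against the tesseract-side dump.

```rust
/// Count how many lines agree between the Rust dump and the oracle dump.
/// Returns (matching lines, oracle line count), e.g. the "112/112" figure.
fn parity(rust_dump: &str, oracle_dump: &str) -> (usize, usize) {
    let total = oracle_dump.lines().count();
    let matching = rust_dump
        .lines()
        .zip(oracle_dump.lines())
        .filter(|(ours, theirs)| ours == theirs)
        .count();
    (matching, total)
}

/// Strict check: the two dumps must agree byte for byte,
/// including line endings and any trailing newline.
fn byte_identical(rust_dump: &str, oracle_dump: &str) -> bool {
    rust_dump.as_bytes() == oracle_dump.as_bytes()
}

fn main() {
    // Hypothetical dump format: one "id<TAB>value" pair per line.
    let oracle = "0\t0\n1\t10\n2\t2\n";
    let ours = "0\t0\n1\t10\n2\t2\n";

    let (matching, total) = parity(ours, oracle);
    println!("{matching}/{total} lines match");
    println!("byte-identical: {}", byte_identical(ours, oracle));
}
```

The per-line count is only a diagnostic; the pass/fail criterion in the finding is the strict byte comparison.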

**Tooling note (TECH_DEBT filed):** the contract crate is NOT fmt-gated in CI (`style.yml` checks only `lance-graph` + `deepnsm`), so merged symbiont/SoA PRs left rustfmt-1.9.0 drift in `hhtl.rs`/`nan_projection.rs`/`soa_graph.rs`. My leaf files are fmt-clean; I did not reformat others' merged files. See TECH_DEBT.

Cross-ref: `E-CPP-PARITY-1..5` (the prior leaves), `E-CPP-KEYSTONE-1`, `.claude/knowledge/core-first-transcode-doctrine.md`. Branch `claude/happy-hamilton-0azlw4`, lance-graph + tesseract-rs.

---

4 changes: 4 additions & 0 deletions .claude/board/LATEST_STATE.md
@@ -36,6 +36,8 @@ Membrane consumers can now pull BOTH halves of a render `classid` BBB-safely fro

---

> **2026-06-20 — branch work (`claude/happy-hamilton-0azlw4`)** — **UNICHARSET `direction` + `mirror` transcoded + byte-parity proven (E-CPP-PARITY-6), the sixth leaf — first to read PAST the bbox CSV.** `UniCharSet` now parses the two columns after `other_case` into `directions: Vec<i32>` + `mirrors: Vec<i32>` by continuing the per-line token walk (the bbox+stats group is one whitespace token, so columns land at fixed offsets regardless of the 5-tier fallback — no bespoke tier detector). `get_direction` (`unicharset.h:712`, load default `U_LEFT_TO_RIGHT` 0, out-of-range → `U_OTHER_NEUTRAL` 10) + `get_mirror` (`unicharset.h:721`, clamped like other_case, out-of-range → -1) + `dump_direction`/`dump_mirror`. **Byte-identical 112/112 each** on real `eng.lstm-unicharset` (self-validating oracle; direction varied: 55× LTR / 33× OTHER_NEUTRAL / 2·3·4·6 for digit chars; mirror has 10 bracket/paren pairs). Additive, zero-dep; +3 contract tests (26 unicharset total), my files clippy + fmt clean; reproducible via `examples/unicharset_dump.rs {direction,mirror}`. Consumed by `tesseract-core::CharSet::{get_direction,get_mirror}`. No Core gap. Remaining UNICHARSET sub-leaf: the float stats (bbox ints + width/bearing/advance) inside the CSV. EPIPHANIES `E-CPP-PARITY-6`; TECH_DEBT (contract crate not fmt-gated in CI).
>
> **2026-06-20 — IN PR (`claude/jirak-math-theorems-harvest-rfii13`)** — **kanban×Rubicon SoA value tenant + per-tenant counters (capstone S1 green).** NEW `ValueTenant::Kanban = 9` at value-slab `[112,120)` (8 B: `phase|exec|reserved|cycle`), added to `ValueSchema::{Cognitive,Full}` — reserve-don't-reclaim, **layout-preserving** (Full 112→120 B, stride 512 untouched, no version bump). `KanbanTenant` Copy view + `NodeRow::{kanban,set_kanban}` (owner-gated write / surreal read-only / Rubicon); `KanbanColumn`/`ExecTarget` `from_u8`. **Subsumes the envelope-pointer G1** — the node carries its own phase+cycle, pinning SoA↔kanban in the LE blob (a `FixedSizeBinary(512)` store reads kanban zero-copy at any version). NEW `tenant_counter` module + feature `tenant-counters` (default OFF, zero-cost no-op; one relaxed atomic/tenant-write when on) — the capstone NaN-census instrument; `set_kanban` is the first wired cascade point. Decisions kept (I-VSA-IDENTITIES + AGI-glove): thinking-style is ClassView+`Meta`, NOT a 128-bit tenant; plan-shape ClassView-derived; MUL flow-trigger is a function, not a tenant. Contract lib **714**/715(tenant-counters)/720(guid-v2-tail), clippy `-D warnings` + fmt clean all three. Refs: AGENT_LOG (cont.¹⁷), EPIPHANIES `E-KANBAN-IS-A-VALUE-TENANT-SUBSUMES-G1`, plan `capstone-cognitive-loop-wiring-nan-census-v1` (S1 green).
>
> **2026-06-20 — IN PR (`claude/jirak-math-theorems-harvest-rfii13`)** — **Zero-copy SoA read contract: `node_rows_from_le_bytes` (the surrealdb "second brain" primitive).** The inverse of `NodeRowPacket::as_le_bytes` (WRITE) — `canonical_node::node_rows_from_le_bytes(&[u8]) -> Option<&[NodeRow]>`, a CHECKED zero-copy cast (`len % 512 == 0` AND `ptr % 64 == 0`, else `None` → caller copies, no UB; empty→Some(empty)). This IS the LE contract a backing store satisfies so its bytes ARE the SoA the cognitive shader reads in place. **Brutal verdict:** lance-graph side now zero-copy-ready end-to-end; surrealdb's kv-lance does NOT qualify as scaffolded (`val: DataType::Binary` variable-length → needs `FixedSizeBinary(512)`), and value zero-copy holds only if stored UNcompressed (key/address always zero-copy). 712 contract lib green, clippy `-D warnings` both configs + fmt clean. Refs: AGENT_LOG 2026-06-20 (cont.¹⁴), EPIPHANIES `E-SURREALDB-SECOND-BRAIN-IS-ZERO-COPY-IFF-FIXEDSIZEBINARY`.
@@ -160,6 +162,8 @@ Membrane consumers can now pull BOTH halves of a render `classid` BBB-safely fro

> **2026-06-18 — ADDED (D-DO-ARM-1, the OGAR DO arm)**: `lance_graph_contract::action::{ActionState, StateGuard, ActionDef, ClassActions, actions_for, effective_actions, ActionInvocation}` — the Perdurant DO arm completing the OGAR IR (the action-axis sibling of `codegen_manifest`'s `MethodSig`/THINK). Both the 4-agent `sale_order` AR→DO probe (runtime-archaeologist) AND the merged cross-repo PR survey (ruff/OGAR/lance-graph/openproject/tesseract) agreed this was the ONE missing wire: the THINK arm (`classid → ClassView`, `has_function → MethodSig`) is converged + merged; the DO-arm `ActionInvocation`/`ActionDef` type was ABSENT. **`ActionDef`** (static, `const`-constructible, all `&'static`/`Copy`): `predicate` (= harvested `has_function` method), `object_class` (classid), `exec` (`ExecTarget` incl `SurrealQl`), `guard` (`StateGuard` = KausalSpec field==value), `required_role` (RBAC), `overrides` (OGAR `classid→ClassView` inheritance). **`ClassActions`+`actions_for`** (zero-fallback) mirror `ClassMethods`/`methods_for`. **`effective_actions(parent, child)`** = OGAR inheritance on the action axis (child overrides parent by predicate). **`ActionInvocation`** (dynamic, `Copy`): lifecycle `ActionState{Pending→Committed|Failed|Cancelled}` (sticky terminals), S2.5 `cycle` stamp, idempotency/trace keys, HLC `emitted_at_millis`. **`ActionInvocation::commit(def, actor, impact, now)`** is the gated egress — RBAC FIRST (`auth::ActorContext` must hold `required_role` or be admin → else `Failed`), THEN MUL impact (`mul::GateDecision`: `Flow→Committed`+stamped, `Hold→`Pending/escalate, `Block→Cancelled`). This IS "commit to the external consumer (odoo/openproject/woa/tesseract) after the cycle decides sound." Dispatched via `UnifiedStep`/`ExecTarget`, NOT a per-crate endpoint. Additive, zero-dep. +5 tests green. Consumer reference: `docs/OGAR_CONSUMER_API.md`. Branch `claude/soa-write-deinterlace-inc2`.

> **2026-06-20 — ADDED (D-UNICHARSET-DIR-MIRROR, the bidi-direction + mirror leaf)**: `lance_graph_contract::unicharset::UniCharSet` gained `get_direction(id) -> i32` + `get_mirror(id) -> i32` + `dump_direction()` + `dump_mirror()`, backed by `directions: Vec<i32>` + `mirrors: Vec<i32>`. The two columns after `other_case`, read by continuing the per-line token walk (the bbox+stats CSV is one whitespace token → fixed offsets across all 5 column tiers; no bespoke tier detector). `direction` = ICU `UCharDirection` code, load default `U_LEFT_TO_RIGHT` 0, out-of-range → `U_OTHER_NEUTRAL` 10 (`unicharset.h:712`). `mirror` clamped like other_case, out-of-range → -1 (`unicharset.h:721`). **Byte-identical 112/112 each** vs tesseract's own `get_direction`/`get_mirror` on real `eng.lstm-unicharset` (self-validating oracle; direction 6 distinct values, mirror 10 bracket pairs). Additive, zero-dep. +3 tests (26 unicharset total). Consumed by `tesseract-core::CharSet::{get_direction,get_mirror}`. EPIPHANIES `E-CPP-PARITY-6`; sixth leaf of `PROBE-OGAR-ADAPTER-UNICHARSET`; first to read past the bbox CSV. Remaining sub-leaf: the float stats inside the CSV. Branch `claude/happy-hamilton-0azlw4`.

> **2026-06-20 — ADDED (D-UNICHARSET-OTHERCASE, the case-pair leaf)**: `lance_graph_contract::unicharset::UniCharSet` gained `get_other_case(id) -> i32` + `dump_other_case()`, backed by `other_cases: Vec<i32>`. The case-paired unichar id (`'C'`→`'c'`), parsed as the token after the script and clamped at load (`unicharset.cpp:901`: a value `>= size`, and the absent default = size, fold to the id itself). Out-of-range id → `INVALID_UNICHAR_ID` -1 (`unicharset.h:703`). **Byte-identical 112/112** vs tesseract's own `get_other_case` on real `eng.lstm-unicharset` (self-validating oracle `other_case` mode; 60 self / 52 pairs). Additive, zero-dep. +4 tests (23 unicharset total). Consumed by `tesseract-core::CharSet::get_other_case`. EPIPHANIES `E-CPP-PARITY-5`; fifth leaf of `PROBE-OGAR-ADAPTER-UNICHARSET`; the last field reachable by token-offset (direction/mirror/bbox need the multi-tier parser). Branch `claude/happy-hamilton-0azlw4`.

> **2026-06-20 — ADDED (D-UNICHARSET-SCRIPT, the script-table leaf)**: `lance_graph_contract::unicharset::UniCharSet` gained `get_script(id) -> i32` / `get_script_table_size()` / `script_from_script_id(sid) -> Option<&str>` / `script_of(id) -> Option<&str>` / `dump_script()`, backed by new `script_ids: Vec<i32>` + an interned `scripts: Vec<String>`. The first leaf to transcode an **interning side-table** (`add_script`, `unicharset.cpp:1063`): `null_script` "NULL" seeded at sid 0 (the `unichar_insert` set_script, `unicharset.cpp:680` → `null_sid_ == 0`), real scripts intern from 1 in id order. Script name = token after the optional bbox/stats CSV (mixed-tier safe). Out-of-range → `null_sid_` 0 (`unicharset.h:681`). **Byte-identical 112/112** vs tesseract's own `get_script` on real `eng.lstm-unicharset` (self-validating oracle `script` mode; table `["NULL","Common","Latin"]`). Additive, zero-dep, behaviour-preserving on the bijection. +4 tests (19 unicharset total). Consumed by `tesseract-core::CharSet::{get_script,script_of}`. EPIPHANIES `E-CPP-PARITY-4`; fourth leaf of `PROBE-OGAR-ADAPTER-UNICHARSET`. Branch `claude/happy-hamilton-0azlw4`.
16 changes: 16 additions & 0 deletions .claude/board/TECH_DEBT.md
@@ -52,6 +52,22 @@ enum, not a local rename. Canonical = the contract. Surfaced while grounding
Deferred: cross-crate dep addition, out of scope for the convergence-probe
increment. Same class as the resolved `CausalEdge64` shadow.

### TD-CONTRACT-NOT-FMT-GATED — `lance-graph-contract` is not fmt-checked in CI (2026-06-20)

**Open.** `.github/workflows/style.yml` runs `cargo fmt --check` only on
`crates/lance-graph/` and `crates/deepnsm/` — NOT on `crates/lance-graph-contract/`.
Consequence: merged PRs (symbiont / SoA work) have left rustfmt-1.9.0 drift in
contract files — observed in `hhtl.rs:682`, `nan_projection.rs:125`, and
`soa_graph.rs:{178,248,326,413}` (all whitespace/wrapping, behaviour-preserving).
A local `cargo fmt -p lance-graph-contract -- --check` is therefore red on `main`
even when a given PR's own files are clean. This has been re-discovered 3×
(class_view.rs, nan_projection.rs, now hhtl/soa_graph) — recording it so the next
session doesn't rediscover it a 4th time. **Pay by** either adding a contract-crate `cargo fmt
--check` step to `style.yml` (and a one-shot `cargo fmt -p lance-graph-contract`
normalization commit), OR a deliberate decision to leave the contract crate
ungated. Until then: leaf PRs keep their OWN files fmt-clean and do not reformat
others' merged files (avoids muddied diffs + conflicts with in-flight PRs).

### TD-ONTOLOGY-LINT — `lance-graph-ontology` pre-existing clippy (12) on toolchain 1.95 (2026-06-18)

`cargo clippy -p lance-graph-ontology -- -D warnings` exits 101 with 12 errors on the pinned 1.95 toolchain — all **pre-existing on `main`** (e.g. `odoo_blueprint/op_emitter.rs:182` is byte-identical on `origin/main`), in `hydrators/owl.rs` (2), `odoo_blueprint/op_emitter.rs` (1), `ttl_parse.rs` (3), + others. Mostly mechanical (`iter_cloned_collect` → `.to_vec()`, etc.). The crate is not in the CI clippy sweep ("CI tests 4 of ~30 crates"), so the debt accumulated un-gated. Surfaced while wiring `class_id_for_guid` (E-OGAR-ONTOLOGY-WIRED-1; `registry.rs` itself is clippy-clean + fmt-clean). Fix is a focused lint pass, out of scope for the wiring increment. Same class as `TD-CAUSAL-EDGE-LINT`.
14 changes: 9 additions & 5 deletions crates/lance-graph-contract/examples/unicharset_dump.rs
@@ -1,7 +1,7 @@
-//! Dump a `.unicharset`'s id→unichar table (default), its per-id property bits
-//! (`properties` mode), its per-id script ids (`script` mode), or its per-id
-//! case-pair ids (`other_case` mode) — the Rust side of the byte-parity probe
-//! `PROBE-OGAR-ADAPTER-UNICHARSET`.
+//! Dump a `.unicharset`'s id→unichar table (default) or a per-id column:
+//! `properties` (category bits), `script` (script ids), `other_case` (case-pair
+//! ids), `direction` (bidi codes), `mirror` (mirror ids) — the Rust side of the
+//! byte-parity probe `PROBE-OGAR-ADAPTER-UNICHARSET`.
 //!
 //! ```sh
 //! # on a box with libtesseract + libleptonica installed:
@@ -33,7 +33,9 @@ use lance_graph_contract::unicharset::UniCharSet;
 
 fn main() -> ExitCode {
     let Some(path) = std::env::args().nth(1) else {
-        eprintln!("usage: unicharset_dump <path/to/eng.unicharset> [properties|script|other_case]");
+        eprintln!(
+            "usage: unicharset_dump <path/to/eng.unicharset> [properties|script|other_case|direction|mirror]"
+        );
         return ExitCode::FAILURE;
     };
     let mode = std::env::args().nth(2).unwrap_or_default();
@@ -43,6 +45,8 @@ fn main() -> ExitCode {
         "properties" => print!("{}", unicharset.dump_properties()),
         "script" => print!("{}", unicharset.dump_script()),
         "other_case" => print!("{}", unicharset.dump_other_case()),
+        "direction" => print!("{}", unicharset.dump_direction()),
+        "mirror" => print!("{}", unicharset.dump_mirror()),
         _ => print!("{}", unicharset.dump()),
     }
     ExitCode::SUCCESS