Add line-diff, char-diff, and unified diff rendering#5
Conversation
Add line-level and character-level diffing functions that split strings before comparing. Add unified diff format renderer with proper hunk headers and automatic multi-hunk splitting using 3 lines of context.
There was a problem hiding this comment.
Build & Tests
CI passes on both macOS and Ubuntu — all 22 tests pass (13 existing + 9 new). No Carp compiler on this machine to build locally.
Findings
1. unified emits insertions before deletions (minor, non-blocking)
Standard diff -u output places deletions (- lines) before insertions (+ lines). This implementation emits them in the order the diff algorithm produces, which is typically insertion-first. For example, unified &(line-diff "old" "new") outputs +new\n-old instead of the conventional -old\n+new.
This is technically valid unified diff format — parsers should handle either order — but it looks non-standard. Human readers and stricter tools expect deletions first. The issue originates in the diff algorithm (which returns Insertion before Deletion when no common prefix exists), not in unified itself, so fixing it would mean either:
- Sorting within
unified(reorder adjacent Insertion/Deletion groups), or - Changing the
diffbase algorithm's output order
Not blocking since the format is parseable, but worth documenting in the unified docstring.
2. Trailing newline semantics (observation, non-blocking)
String.split-by "foo\n" &[\newline] produces ["foo", ""], so a trailing newline in the input becomes a real empty-string line in the diff. This means line-diff "a\n" "a\n" produces [Eq ["a" ""]] instead of [Eq ["a"]]. This is actually reasonable (it preserves trailing-newline information), but might surprise callers. Worth a note in the line-diff docstring.
3. Code complexity of unified (observation, non-blocking)
The function at diff.carp:90-197 has 14 mutable variables and deeply nested if/let-do blocks. This is inherent to implementing a stateful unified-diff renderer in Carp's imperative style — I don't see a simpler way to do it within the language's constraints. The logic is correct after careful tracing through all test cases and additional edge cases (pure insertion, pure deletion, single-line change, exactly-6-line gap between changes). Just noting it as future maintenance context.
4. Correctness verification
I manually traced unified through all 5 test cases and 3 additional edge cases:
- Single-hunk change: header counts correct (
@@ -1,3 +1,3 @@) - Change at start (no leading context): correctly starts hunk at line 1
- Change at end (no trailing context): correctly emits final hunk on loop exit
- Multi-hunk split: 7+ equal lines between changes correctly triggers hunk split. Trailing context (3 lines) and leading context for next hunk (3 lines from end) are computed correctly
- Identical strings: returns empty string (correct)
- Pure insertion/deletion: counts correctly reflect added/removed lines
- Hunk boundary at exactly 2*ctx (6 lines): correctly kept as single hunk
emit-hunk (diff.carp:70-85) correctly builds the header and joins body lines with newlines. The String.join + String.concat pattern is the right Carp idiom here.
line-diff and char-diff are trivial wrappers around diff — correct and well-typed.
Verdict: merge
Solid addition that fills a real gap in the library. The code is correct, well-tested (9 new tests covering line-diff, char-diff, and unified rendering with edge cases), and fits the style of the existing codebase. The insertion-before-deletion ordering is the only notable quirk, and it's a property of the underlying diff algorithm rather than a bug in unified.
Summary
line-diff: Splits two strings by newlines and diffs the resulting line arrays.char-diff: Splits two strings into individual characters and diffs them.unified: Renders a(Array (Diff (Array String)))(typically fromline-diff) in standard unified diff format with@@ -start,count +start,count @@hunk headers. Automatically splits into multiple hunks when changes are separated by more than 6 lines of context (3 before + 3 after).Note:
strandprnare already auto-derived by Carp for theDiffsum type and work correctly for all inner types (tested viaassert-equalwhich requiresstr).9 new tests covering line-diff, char-diff, and unified rendering (single-hunk, multi-hunk, edge cases).
Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.