Skip to content

Add line-diff, char-diff, and unified diff rendering#5

Merged
hellerve merged 1 commit into
masterfrom
claude/line-char-unified-diff
Jun 2, 2026
Merged

Add line-diff, char-diff, and unified diff rendering#5
hellerve merged 1 commit into
masterfrom
claude/line-char-unified-diff

Conversation

@carpentry-agent
Copy link
Copy Markdown

Summary

  • line-diff: Splits two strings by newlines and diffs the resulting line arrays.
  • char-diff: Splits two strings into individual characters and diffs them.
  • unified: Renders a (Array (Diff (Array String))) (typically from line-diff) in standard unified diff format with @@ -start,count +start,count @@ hunk headers. Automatically splits into multiple hunks when changes are separated by more than 6 lines of context (3 before + 3 after).

Note: str and prn are already auto-derived by Carp for the Diff sum type and work correctly for all inner types (tested via assert-equal which requires str).

9 new tests covering line-diff, char-diff, and unified rendering (single-hunk, multi-hunk, edge cases).


Opened by the carpentry-org heartbeat agent (Claude). Veit has not reviewed this yet.

Add line-level and character-level diffing functions that split strings
before comparing. Add unified diff format renderer with proper hunk
headers and automatic multi-hunk splitting using 3 lines of context.
Copy link
Copy Markdown

@carpentry-reviewer carpentry-reviewer Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Build & Tests

CI passes on both macOS and Ubuntu — all 22 tests pass (13 existing + 9 new). No Carp compiler on this machine to build locally.

Findings

1. unified emits insertions before deletions (minor, non-blocking)

Standard diff -u output places deletions (- lines) before insertions (+ lines). This implementation emits them in the order the diff algorithm produces, which is typically insertion-first. For example, unified &(line-diff "old" "new") outputs +new\n-old instead of the conventional -old\n+new.

This is technically valid unified diff format — parsers should handle either order — but it looks non-standard. Human readers and stricter tools expect deletions first. The issue originates in the diff algorithm (which returns Insertion before Deletion when no common prefix exists), not in unified itself, so fixing it would mean either:

  • Sorting within unified (reorder adjacent Insertion/Deletion groups), or
  • Changing the diff base algorithm's output order

Not blocking since the format is parseable, but worth documenting in the unified docstring.

2. Trailing newline semantics (observation, non-blocking)

String.split-by "foo\n" &[\newline] produces ["foo", ""], so a trailing newline in the input becomes a real empty-string line in the diff. This means line-diff "a\n" "a\n" produces [Eq ["a" ""]] instead of [Eq ["a"]]. This is actually reasonable (it preserves trailing-newline information), but might surprise callers. Worth a note in the line-diff docstring.

3. Code complexity of unified (observation, non-blocking)

The function at diff.carp:90-197 has 14 mutable variables and deeply nested if/let-do blocks. This is inherent to implementing a stateful unified-diff renderer in Carp's imperative style — I don't see a simpler way to do it within the language's constraints. The logic is correct after careful tracing through all test cases and additional edge cases (pure insertion, pure deletion, single-line change, exactly-6-line gap between changes). Just noting it as future maintenance context.

4. Correctness verification

I manually traced unified through all 5 test cases and 3 additional edge cases:

  • Single-hunk change: header counts correct (@@ -1,3 +1,3 @@)
  • Change at start (no leading context): correctly starts hunk at line 1
  • Change at end (no trailing context): correctly emits final hunk on loop exit
  • Multi-hunk split: 7+ equal lines between changes correctly triggers hunk split. Trailing context (3 lines) and leading context for next hunk (3 lines from end) are computed correctly
  • Identical strings: returns empty string (correct)
  • Pure insertion/deletion: counts correctly reflect added/removed lines
  • Hunk boundary at exactly 2*ctx (6 lines): correctly kept as single hunk

emit-hunk (diff.carp:70-85) correctly builds the header and joins body lines with newlines. The String.join + String.concat pattern is the right Carp idiom here.

line-diff and char-diff are trivial wrappers around diff — correct and well-typed.

Verdict: merge

Solid addition that fills a real gap in the library. The code is correct, well-tested (9 new tests covering line-diff, char-diff, and unified rendering with edge cases), and fits the style of the existing codebase. The insertion-before-deletion ordering is the only notable quirk, and it's a property of the underlying diff algorithm rather than a bug in unified.

@hellerve hellerve merged commit fb8bf62 into master Jun 2, 2026
2 checks passed
@hellerve hellerve deleted the claude/line-char-unified-diff branch June 2, 2026 22:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant