Add JSON encode fuzz runner #1

Open
membphis wants to merge 15 commits into master from codex/fuzz-encode-v0.7.2
membphis commented May 29, 2026

Summary

  • add a Lua-side JSON encode fuzz runner with make fuzz
  • generate schema-guided and pure recursive-random Lua tables with a default 1:2 mix
  • validate encoded JSON for decodeability, key ordering, object/array metadata, scalar paths, and reproducible failures
  • support configurable duration, interval, seed, sort_keys, and multi-worker runs

Verification

  • /Users/yuanshengwang/.luarocks/bin/busted -> 114 successes
  • make fuzz LUA=luajit DURATION=2 INTERVAL=1 WORKERS=1 SEED=123
  • make fuzz LUA=luajit DURATION=2 INTERVAL=1 WORKERS=2 SEED=123

Summary by CodeRabbit

  • New Features

    • Added comprehensive fuzzing infrastructure for JSON encoding validation with deterministic seeding, parallel worker pools, and multiple predefined schema templates for realistic test scenarios
    • Includes interval-based progress reporting, detailed failure diagnostics with seed/case/schema tracking, configurable timeout-driven test execution, and key ordering validation
  • Chores

    • Updated .gitignore to exclude RapidJSON build artifacts and documentation directories

coderabbitai Bot commented May 29, 2026

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8118e561-9275-46e0-a0c3-7f9177a0735a

📥 Commits

Reviewing files that changed from the base of the PR and between 16f843b and 51d17fa.

📒 Files selected for processing (4)
  • Makefile
  • spec/fuzz_encode_lib_spec.lua
  • tools/fuzz_encode.lua
  • tools/fuzz_encode_lib.lua
Walkthrough

This PR introduces a complete fuzzing framework for lua-rapidjson that generates diverse JSON payloads via schema-guided or recursive-random modes, encodes them with RapidJSON, validates sorted-key ordering and structural correctness, and runs workers concurrently with progress reporting and failure diagnostics.

Changes

RapidJSON Fuzzing Framework

| Layer / File(s) | Summary |
| --- | --- |
| Configuration & RNG Foundations (tools/fuzz_encode_lib.lua, spec/fuzz_encode_lib_spec.lua) | Configuration parsing from environment variables with defaults and coercion (including SORT_KEYS=0 special case), deterministic PRNG with seed normalization, and summary formatting utilities; comprehensive tests validate config behavior, argument parsing, RNG determinism, and output format. |
| Random Payload Generation & Schemas (tools/fuzz_encode_lib.lua, spec/fuzz_encode_lib_spec.lua) | Schema definition pools and random scalar/null/string/array/object generators with recursive depth limiting; five concrete schema builders (LLM response, GitHub issue, social feed, paginated list, metadata config) that populate expected metadata; metadata tracking helpers and case generation that rotates between deterministic schema-guided and recursive-random modes; extensive tests verify generated shapes, null round-tripping, and recursive metadata collection. |
| JSON Validation (tools/fuzz_encode_lib.lua, spec/fuzz_encode_lib_spec.lua) | JSONPath-like path lookup through decoded objects with error diagnostics; JSON text scanning for offset tracking; object member enumeration and path tokenization; key-order validation against rapidjson-encoded member names; comprehensive validate_encoded_case that checks kind, object key counts/order, array lengths, and scalar equality including null; tests confirm sorted-key enforcement at top level and nested paths, plus decode error handling. |
| Error Reporting & Diagnostics (tools/fuzz_encode_lib.lua, spec/fuzz_encode_lib_spec.lua) | Cycle-safe value pretty-dumping with truncation and depth limits; multi-line FUZZ FAILURE report format including seed/worker/case/schema/reason/value/json fields; tests verify deterministic and comprehensive failure output. |
| Fuzzing Worker Script (tools/fuzz_encode.lua) | Standalone worker that loads library, parses config from CLI/environment with fallbacks, seeds RNG, and runs time-bounded encode/validate loop until deadline; increments counters, validates results, and on encode/validation failure writes formatted failure to stderr and exits; periodically and finally prints progress summaries. |
| Build Configuration & Orchestration (Makefile, .gitignore) | Makefile defines configurable fuzz target spawning concurrent workers with unique seeds per worker, temp-dir marker files for completion/failure coordination, cleanup on exit, and signal handling; .gitignore adds RapidJSON build artifacts (rapidjson.so, rapidjson.so.dSYM/) and /docs/superpowers/ directory. |

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| E2e Test Quality Review | ⚠️ Warning | Tests are unit-only (no E2E); the fuzz runner itself is never invoked. SEED arithmetic fails on non-numeric input; the polling loop can hang after INT/TERM; the sort_keys=false validation bug has no regression test. | Add an E2E test for make fuzz; fix the Makefile SEED fallback and INT/TERM handling; pass sort_keys to validate_encoded_case and add a regression test. |

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title 'Add JSON encode fuzz runner' directly and clearly summarizes the main change: adding a new fuzzing tool for JSON encoding. It accurately reflects the primary purpose of the PR. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Security Check | ✅ Passed | Fuzz runner for lua-rapidjson library. Generates synthetic test data only—no real credentials, API keys, or sensitive data. No database, auth, TLS, or secret resolution code found. |

coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@Makefile`:
- Line 20: The Makefile's fuzz recipe must coerce SEED to a numeric fallback
before doing shell arithmetic; modify the recipe to compute a numeric SEED value
(e.g., set SEED_NUM from SEED if it matches an integer, otherwise use the
current epoch like date +%s) and then compute per-worker seed using that numeric
SEED_NUM (replace the existing seed=$$(( $(SEED) + $$worker - 1 )) with
arithmetic based on SEED_NUM). Reference the fuzz recipe, the SEED variable, the
per-worker seed calculation, and tools/fuzz_encode_lib.lua::normalize_seed() so
the behavior matches Lua's fallback to os.time().
- Around line 15-17: The polling loop in the fuzz target can hang after
stop_workers() calls cleanup() and removes $$tmpdir; update the loop that waits
on ls "$$tmpdir"/fail.* / done.* (the while :; do ... done loop) so it breaks if
the temp directory is gone (e.g., add a check if [ ! -d "$$tmpdir" ]; then
break; fi) or also check for a sentinel file; alternatively, have stop_workers()
create a sentinel outside $$tmpdir before calling cleanup() and make the loop
exit when that sentinel is present. Reference: stop_workers(), cleanup, the fuzz
target while loop, $$tmpdir, fail.*, and done_count.

In `@tools/fuzz_encode_lib.lua`:
- Around line 1005-1009: validate_key_order is always enforcing alphabetical
member order because validate_encoded_case (in tools/fuzz_encode_lib.lua)
doesn’t receive the sort_keys flag; when cfg.sort_keys is false we should skip
order validation. Change the signature of M.validate_encoded_case to accept a
sort_keys boolean (e.g. function M.validate_encoded_case(rapidjson, case, json,
sort_keys)) and thread cfg.sort_keys from the caller in tools/fuzz_encode.lua
when calling fuzz.validate_encoded_case; inside
validate_encoded_case/validate_key_order, if sort_keys is false, skip or
short-circuit the member-order check (the parts that use entry.keys /
string_keys()), so generated JSON encoded with sort_keys=false won’t be
rejected. Also add a regression test covering SORT_KEYS=0 to ensure order
validation is skipped.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 5f2ed8d4-d991-4b7c-a57e-258d52176b8e

📥 Commits

Reviewing files that changed from the base of the PR and between ac827bd and 16f843b.

📒 Files selected for processing (5)
  • .gitignore
  • Makefile
  • spec/fuzz_encode_lib_spec.lua
  • tools/fuzz_encode.lua
  • tools/fuzz_encode_lib.lua

Comment thread Makefile
Comment on lines +15 to +17
stop_workers() { for pid in $$pids; do kill "$$pid" 2>/dev/null || true; done; cleanup; }; \
trap cleanup EXIT; \
trap stop_workers INT TERM; \
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify trap + loop dependency on tmpdir markers.
rg -n -C2 'stop_workers\(\)|trap stop_workers|tmpdir|done_count|fail\.\*|done\.\*' Makefile

Repository: api7/lua-rapidjson

Length of output: 954


Stop the fuzz polling loop from continuing after INT/TERM cleanup
In the Makefile’s fuzz target, stop_workers() (lines ~15–17) calls cleanup(), which runs rm -rf on $$tmpdir, but the parent polling loop (while :; do ..., lines ~39–50) keeps waiting on ls "$$tmpdir"/fail.* and ls "$$tmpdir"/done.*. Once $$tmpdir has been removed, no fail.* marker can appear and done_count stays at 0, so the loop can sleep indefinitely.

Suggested fix
-	stop_workers() { for pid in $$pids; do kill "$$pid" 2>/dev/null || true; done; cleanup; }; \
+	stop_workers() { \
+		for pid in $$pids; do kill "$$pid" 2>/dev/null || true; done; \
+		exit 130; \
+	}; \
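
The hang and the fix can be tried outside make. The following is a minimal POSIX sh sketch, not the PR's actual recipe: count_markers and wait_for_markers are hypothetical helper names, the two background subshells stand in for fuzz workers, and the directory check is the alternative guard mentioned in the review prompt. Inside a Makefile recipe every `$` would need to be doubled.

```shell
#!/bin/sh
# Hypothetical helper: count marker files named "$2".* inside directory "$1".
count_markers() {
    n=0
    for f in "$1"/"$2".*; do
        # An unmatched glob stays literal, so test that the file exists.
        [ -e "$f" ] && n=$((n + 1))
    done
    echo "$n"
}

# Hypothetical helper: poll until every worker is done (returns 0), any
# worker failed (returns 1), or the marker directory disappeared (returns 2).
wait_for_markers() {
    dir=$1
    expected=$2
    while :; do
        [ -d "$dir" ] || return 2
        [ "$(count_markers "$dir" fail)" -gt 0 ] && return 1
        [ "$(count_markers "$dir" done)" -ge "$expected" ] && return 0
        sleep 1
    done
}

tmpdir=$(mktemp -d) || exit 1
pids=""
cleanup() { rm -rf "$tmpdir"; }
stop_workers() {
    for pid in $pids; do kill "$pid" 2>/dev/null || true; done
    exit 130    # exiting ends the polling loop; the EXIT trap runs cleanup
}
trap cleanup EXIT
trap stop_workers INT TERM

# Stand-in workers: each sleeps briefly and then writes its done marker.
for worker in 1 2; do
    ( sleep 1; : > "$tmpdir/done.$worker" ) &
    pids="$pids $!"
done

wait_for_markers "$tmpdir" 2
echo "pool status: $?"    # 0 once every worker has written its done marker
```

Because the handler exits instead of calling cleanup directly, the temporary directory is removed only by the EXIT trap, after the loop can no longer run. The directory guard is a second line of defence in case the directory is removed some other way.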
Comment thread Makefile
trap stop_workers INT TERM; \
worker=1; \
while [ "$$worker" -le "$(WORKERS)" ]; do \
seed=$$(( $(SEED) + $$worker - 1 )); \
⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Verify mismatch between Makefile arithmetic requirement and fuzz lib seed normalization.
rg -n -C2 'seed=\$\$\(\(|SEED' Makefile
rg -n -C3 'normalize_seed|tonumber\(|SEED' tools/fuzz_encode_lib.lua

Repository: api7/lua-rapidjson

Length of output: 1770


🏁 Script executed:

#!/bin/bash
# Inspect Makefile around SEED and seed arithmetic.
sed -n '1,60p' Makefile

# Find other SEED usage in repo.
rg -n 'SEED\b' Makefile tools -S

Repository: api7/lua-rapidjson

Length of output: 1750


🏁 Script executed:

#!/bin/bash
# Inspect how SEED is passed into Lua and normalized.
sed -n '1,120p' tools/fuzz_encode.lua
sed -n '1,120p' tools/fuzz_encode_lib.lua

Repository: api7/lua-rapidjson

Length of output: 4798


Coerce SEED to numeric (matching Lua) before Make arithmetic
The fuzz recipe computes per-worker seeds with shell arithmetic (seed=$$(( $(SEED) + $$worker - 1 ))), which requires $(SEED) to be numeric; SEED=abc fails before Lua starts. Lua’s tools/fuzz_encode_lib.lua normalize_seed() already falls back to os.time() for non-numeric inputs, so the Makefile should apply the same numeric fallback (e.g., use date +%s when SEED isn’t an integer) instead of failing early.
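
One possible shape for that fallback, written as plain POSIX sh rather than as the PR's recipe: normalize_seed and worker_seed here are hypothetical shell helpers (the former only borrows the name of the Lua function), and treating anything other than a run of decimal digits as invalid is an assumption that may be stricter than the Lua side. Inside a Makefile recipe every `$` would need to be doubled.

```shell
#!/bin/sh
# Hypothetical helper: echo the seed if it is a non-negative decimal integer,
# otherwise echo the current epoch in seconds (the shell analogue of os.time()).
normalize_seed() {
    case "$1" in
        ''|*[!0-9]*)
            # Empty, or contains a non-digit character: fall back to the clock.
            date +%s ;;
        *)
            # Strip leading zeros so shell arithmetic does not read it as octal.
            printf '%s\n' "$1" | sed 's/^0*\([0-9]\)/\1/' ;;
    esac
}

# Hypothetical helper: seed for a 1-based worker index, given a numeric base.
worker_seed() {
    echo $(( $1 + $2 - 1 ))
}

# Resolve the fallback once so that all workers derive from the same base.
base=$(normalize_seed "${SEED:-}")
for worker in 1 2 3; do
    echo "worker $worker seed $(worker_seed "$base" "$worker")"
done
```

Normalizing once before the loop matters for the fallback case: if each worker read the clock separately, a second boundary between two reads could produce colliding or non-consecutive seeds.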

Comment thread tools/fuzz_encode_lib.lua
Comment on lines +1005 to +1009
local ordered, order_err = validate_key_order(rapidjson, json, entry)
if not ordered then
return false, order_err
end
end
⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# Encode call should use cfg.sort_keys; validate call should currently omit it.
rg -nP -C3 '\bvalidate_encoded_case\s*\(' --type=lua
rg -nP -C2 'sort_keys\s*=\s*cfg\.sort_keys' --type=lua

Repository: api7/lua-rapidjson

Length of output: 3573


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Inspect validate_encoded_case and validate_key_order usage + relevant logic
sed -n '930,1060p' tools/fuzz_encode_lib.lua | nl -ba | sed -n '1,200p'

Repository: api7/lua-rapidjson

Length of output: 105


🏁 Script executed:

#!/bin/bash
set -euo pipefail
# Inspect validate_encoded_case and key-order check logic
sed -n '920,1060p' tools/fuzz_encode_lib.lua | nl -ba

Repository: api7/lua-rapidjson

Length of output: 105


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Validate key-order logic and where it's called
sed -n '940,1025p' tools/fuzz_encode_lib.lua | cat -n

echo "----"
sed -n '1025,1105p' tools/fuzz_encode_lib.lua | cat -n

echo "----"
# show the exact snippet around the originally referenced lines
sed -n '990,1025p' tools/fuzz_encode_lib.lua | cat -n

Repository: api7/lua-rapidjson

Length of output: 6863


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# Locate and inspect validate_key_order and string_keys
rg -n --hidden --no-mmap "function validate_key_order|local function validate_key_order|string_keys\s*\(" tools/fuzz_encode_lib.lua
echo "----"
rg -n --hidden --no-mmap "local function string_keys|function string_keys" tools/fuzz_encode_lib.lua
echo "----"

# Print the relevant regions around the definitions
# (Use sed ranges based on the match lines; adjust after seeing first pass.)
python3 - <<'PY'
import re, subprocess, textwrap, os, sys, json, pathlib
import subprocess
p=subprocess.run(["rg","-n","--hidden","--no-mmap","function validate_key_order|local function validate_key_order|string_keys\\s*\\(","tools/fuzz_encode_lib.lua"],capture_output=True,text=True)
print(p.stdout)
PY

Repository: api7/lua-rapidjson

Length of output: 653


🏁 Script executed:

#!/bin/bash
set -euo pipefail

sed -n '270,360p' tools/fuzz_encode_lib.lua | cat -n
echo "----"
sed -n '770,900p' tools/fuzz_encode_lib.lua | cat -n
echo "----"
sed -n '900,980p' tools/fuzz_encode_lib.lua | cat -n

Repository: api7/lua-rapidjson

Length of output: 9789


🏁 Script executed:

#!/bin/bash
set -euo pipefail

rg -n --hidden --no-mmap -S "SORT_KEYS|sort_keys\s*=" -g'*.lua' .

Repository: api7/lua-rapidjson

Length of output: 1600


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- tools/fuzz_encode.lua around validate call and failure handling ----"
sed -n '1,140p' tools/fuzz_encode.lua | cat -n

echo "---- tools/fuzz_encode_lib.lua config defaults / env parsing for sort_keys ----"
sed -n '1,120p' tools/fuzz_encode_lib.lua | cat -n

echo "---- spec/fuzz_encode_lib_spec.lua around SORT_KEYS=0 ----"
sed -n '1,120p' spec/fuzz_encode_lib_spec.lua | cat -n

Repository: api7/lua-rapidjson

Length of output: 11320


🏁 Script executed:

#!/bin/bash
set -euo pipefail

echo "---- validate_encoded_case usages in spec/fuzz_encode_lib_spec.lua ----"
rg -n "validate_encoded_case" spec/fuzz_encode_lib_spec.lua

echo "---- sort_keys usages in spec/fuzz_encode_lib_spec.lua ----"
rg -n "sort_keys" spec/fuzz_encode_lib_spec.lua

Repository: api7/lua-rapidjson

Length of output: 961


🏁 Script executed:

#!/bin/bash
set -euo pipefail
rg -n --hidden --no-mmap -S "\bvalidate_encoded_case\s*\(" .

Repository: api7/lua-rapidjson

Length of output: 855


Fix key-order validation to respect sort_keys=false

tools/fuzz_encode.lua encodes with sort_keys = cfg.sort_keys, but fuzz.validate_encoded_case/validate_key_order always enforces member order against entry.keys (built from string_keys() which sorts keys alphabetically). When SORT_KEYS=0 makes cfg.sort_keys false, validation can still reject the generated JSON and the worker aborts via os.exit(1).

Skip the order check when sorting is disabled (and thread the flag into validate_encoded_case):

🐛 Proposed fix
-    local ordered, order_err = validate_key_order(rapidjson, json, entry)
-    if not ordered then
-      return false, order_err
-    end
+    if sort_keys ~= false then
+      local ordered, order_err = validate_key_order(rapidjson, json, entry)
+      if not ordered then
+        return false, order_err
+      end
+    end

Update signature in tools/fuzz_encode_lib.lua:

function M.validate_encoded_case(rapidjson, case, json, sort_keys)

Pass from the worker in tools/fuzz_encode.lua:

local valid, reason =
  fuzz.validate_encoded_case(rapidjson, generated_case, json_or_err, cfg.sort_keys)

Add a regression test for the SORT_KEYS=0/sort_keys=false path: existing validate_encoded_case coverage encodes with { sort_keys = true } and doesn’t cover “encoded without sorting, so don’t enforce sorted member order”.
