Carve deleted rows out of a SQLite database without trusting it, without writing to it, and without re-surfacing a single live row.
Measured against independent third-party ground truth, the SQLite Forensic Corpus (Nemetz, Schmitt & Freiling, DFRWS-EU 2018, CC0), whose authors shipped a per-row deleted answer key, and reported as a reproducible per-database confusion matrix (`docs/recovery-comparison.md`). The honest headline: precision is the highest of any tool measured (it never re-surfaces a live row as "deleted", and emits only a small low-confidence phantom class), and freeblock-aware reconstruction recovers ~75 % of in-page deletions on the cleanest category. That essentially matches fqlite (~80 %) with roughly one fifth as many false rows, and Pareto-dominates it on overflow. We report the true numbers, not the fixture-flattering ones.
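A per-database confusion matrix comes down to set arithmetic over three row sets: what the carver emitted, the answer key's deleted rows, and the answer key's live rows. The sketch below is illustrative only. It is not the `nemetz_metrics.rs` harness, the `Confusion` type and `confusion` function are hypothetical, and rows are matched by an abstract hashable key because this README does not describe how the real harness matches carved rows to the answer key.

```rust
use std::collections::HashSet;
use std::hash::Hash;

/// Hypothetical per-database confusion counts for a deleted-row carver.
#[derive(Debug, PartialEq)]
pub struct Confusion {
    pub true_pos: usize,     // carved, and in the deleted answer key
    pub false_pos: usize,    // carved, but not in the deleted answer key
    pub false_neg: usize,    // in the deleted answer key, but not carved
    pub live_rereads: usize, // false positives that are actually live rows
}

/// Compare carved rows against an answer key. `K` is whatever identifies a row
/// (an assumption here, e.g. a content fingerprint).
pub fn confusion<K: Eq + Hash>(
    carved: &HashSet<K>,
    deleted_truth: &HashSet<K>,
    live_truth: &HashSet<K>,
) -> Confusion {
    let true_pos = carved.intersection(deleted_truth).count();
    let live_rereads = carved
        .difference(deleted_truth)
        .filter(|row| live_truth.contains(*row))
        .count();
    Confusion {
        true_pos,
        false_pos: carved.len() - true_pos,
        false_neg: deleted_truth.len() - true_pos,
        live_rereads,
    }
}

impl Confusion {
    /// TP / (TP + FP); None when nothing was carved.
    pub fn precision(&self) -> Option<f64> {
        ratio(self.true_pos, self.true_pos + self.false_pos)
    }

    /// TP / (TP + FN); None when the answer key has no deleted rows.
    pub fn recall(&self) -> Option<f64> {
        ratio(self.true_pos, self.true_pos + self.false_neg)
    }
}

fn ratio(num: usize, den: usize) -> Option<f64> {
    if den == 0 {
        None
    } else {
        Some(num as f64 / den as f64)
    }
}
```

With the figures reported below for category `0C` (79 of 101 deleted rows recovered), `recall()` comes out at 79/101, about 0.78.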
Every browser history, every chat app, every mobile artifact is a SQLite file — and the forensically interesting rows are usually the deleted ones. The standard sqlite3/rusqlite path cannot see them: it reads the live b-tree and stops. sqlite-forensic reads the raw file format itself — freelist pages, in-page free blocks, dropped-table pages, and an uncheckpointed WAL overlay — and recovers what the live query cannot, as severity-graded, confidence-scored observations.
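Reading the raw file format means decoding SQLite's own encodings by hand. Two primitives every carver leans on are the variable-length integer (1 to 9 bytes, big-endian, the high bit a continuation flag for the first eight bytes, all eight bits used in the ninth) and the record header (a header-length varint followed by one serial-type varint per column). Below is a minimal, panic-free sketch that follows the SQLite file-format specification. The function names are hypothetical and this is not `sqlite-core`'s code.

```rust
/// Decode a SQLite varint at the start of `buf`: returns (value, bytes used),
/// or None if `buf` ends first. Never panics on short or hostile input.
pub fn read_varint(buf: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for i in 0..8 {
        let byte = *buf.get(i)?;
        value = (value << 7) | u64::from(byte & 0x7f);
        if byte & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    // The ninth byte, if reached, contributes all eight of its bits.
    let byte = *buf.get(8)?;
    Some(((value << 8) | u64::from(byte), 9))
}

/// Content size in bytes for a record serial type; None for reserved types 10 and 11.
pub fn serial_type_size(serial: u64) -> Option<u64> {
    match serial {
        0 | 8 | 9 => Some(0),  // NULL, integer constants 0 and 1
        1..=4 => Some(serial), // 1- to 4-byte big-endian integers
        5 => Some(6),          // 6-byte integer
        6 | 7 => Some(8),      // 8-byte integer, IEEE 754 float
        10 | 11 => None,
        n if n % 2 == 0 => Some((n - 12) / 2), // BLOB
        n => Some((n - 13) / 2),               // TEXT
    }
}

/// Decode a record header into its serial types plus the implied body length.
pub fn decode_record_header(rec: &[u8]) -> Option<(Vec<u64>, u64)> {
    let (header_len, mut pos) = read_varint(rec)?;
    let header_len = usize::try_from(header_len).ok()?;
    if header_len < pos || header_len > rec.len() {
        return None;
    }
    let mut serials = Vec::new();
    let mut body_len: u64 = 0;
    while pos < header_len {
        let (serial, used) = read_varint(&rec[pos..header_len])?;
        body_len = body_len.checked_add(serial_type_size(serial)?)?;
        serials.push(serial);
        pos += used;
    }
    Some((serials, body_len))
}
```

For example, the record bytes `[4, 0, 1, 17, 42, b'h', b'i']` decode to serial types `[0, 1, 17]` (NULL, 1-byte integer, 2-byte text) and a 3-byte body.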
This is a Rust library workspace (two crates, no CLI yet). Point the analyzer at the file bytes and get graded findings plus carved deleted records:
```rust
use sqlite_core::Database;
use sqlite_forensic::{audit, carve_all_deleted_records};

// `?` needs a fallible context; this assumes the crates' error types implement
// std::error::Error so they convert into Box<dyn Error>.
fn main() -> Result<(), Box<dyn std::error::Error>> {
    let db = Database::open(std::fs::read("History")?)?; // read-only, owns the bytes

    // 1. Graded header / freelist / WAL anomalies
    for anomaly in audit(&db) {
        println!("[{:?}] {}: {}", anomaly.severity, anomaly.code, anomaly.kind.note());
    }

    // 2. Deleted rows carved from free space; column count inferred per record
    for rec in carve_all_deleted_records(&db) {
        println!(
            "recovered rowid {} from page {} (allocated: {})",
            rec.rowid, rec.page, rec.allocated
        );
    }
    Ok(())
}
```

The reader (`sqlite-core`) answers "what does this file actually contain?"; the analyzer (`sqlite-forensic`) grades the forensically notable parts and recovers the deleted ones.
| | sqlite-forensic | rusqlite / sqlite3 |
|---|---|---|
| Read live rows | ✅ | ✅ |
| Read-only on the evidence file | ✅ | ✅ (with care) |
| Recover deleted rows from freelist pages | ✅ | ❌ |
| Recover deleted rows from in-page free blocks | ✅ | ❌ |
| Recover dropped-table rows (column count inferred) | ✅ | ❌ |
| Read uncheckpointed WAL overlay as a separate view | ✅ | applied silently |
| Graded, confidence-scored anomaly findings | ✅ | ❌ |
| Refuses to ever re-surface a live row as "deleted" | ✅ | n/a |
| `forbid(unsafe)`, panic-free on hostile input | ✅ | C / FFI |
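Keeping the WAL as a separate view means indexing it rather than applying it: for each page, find the newest frame whose transaction actually committed. Below is a minimal sketch of that indexing under the WAL layout in the SQLite file-format documentation (a 32-byte WAL header, 24-byte frame headers, and a nonzero "database size after commit" field marking a commit frame). `committed_wal_pages` is a hypothetical name, not `Database::open_with_wal`, and frame checksums are deliberately not verified here.

```rust
use std::collections::HashMap;

const WAL_HEADER_LEN: usize = 32;
const FRAME_HEADER_LEN: usize = 24;

/// Big-endian u32 at `off`, or None past the end of `buf`.
fn be32(buf: &[u8], off: usize) -> Option<u32> {
    let b = buf.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([b[0], b[1], b[2], b[3]]))
}

/// Hypothetical: map each page number to the byte offset (within `wal`) of its
/// newest committed page image. Stops at the first frame whose salts do not match
/// the header; frames after the last commit frame are ignored.
/// Frame checksums are NOT verified in this sketch.
pub fn committed_wal_pages(wal: &[u8]) -> Option<HashMap<u32, usize>> {
    let magic = be32(wal, 0)?;
    if magic != 0x377f_0682 && magic != 0x377f_0683 {
        return None;
    }
    let page_size = be32(wal, 8)? as usize;
    if !(512..=65_536).contains(&page_size) || !page_size.is_power_of_two() {
        return None;
    }
    let (salt1, salt2) = (be32(wal, 16)?, be32(wal, 20)?);
    let frame_len = FRAME_HEADER_LEN + page_size;

    let mut committed = HashMap::new();
    let mut pending = Vec::new(); // (page, image offset) of the open transaction
    let mut off = WAL_HEADER_LEN;
    while off + frame_len <= wal.len() {
        if be32(wal, off + 8)? != salt1 || be32(wal, off + 12)? != salt2 {
            break; // stale frame from an earlier WAL generation
        }
        pending.push((be32(wal, off)?, off + FRAME_HEADER_LEN));
        if be32(wal, off + 4)? != 0 {
            // Commit frame: every frame of this transaction becomes visible,
            // later frames overwriting earlier images of the same page.
            committed.extend(pending.drain(..));
        }
        off += frame_len;
    }
    Some(committed)
}
```

Because the result is only an index into the `-wal` bytes, the main-file view stays untouched and the two can be compared page by page.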
This is one workspace (sqlite-forensic) with two members, following the fleet reader/analyzer split:
| Crate | Role | Entry points |
|---|---|---|
| `sqlite-core` | The raw, read-only, panic-free file-format reader: header parse, b-tree walk, freelist + overflow chains, and a read-only WAL overlay. No findings. | `Database::open`, `Database::open_with_wal`, `freelist_pages`, `read_table`, `carve_free_regions`, `live_rowids` |
| `sqlite-forensic` | The anomaly auditor + deleted-record carver: grades observations into `forensicnomicon::report::Findings` and recovers deleted rows. Depends on `sqlite-core`. | `audit`, `audit_findings`, `carve_all_deleted_records`, `carve_deleted_records` |
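The freelist that `freelist_pages` exposes is a chain of trunk pages defined by the file format: header offset 32 holds the first trunk page number, and each trunk page begins with the next trunk's page number, a leaf count, and that many 4-byte leaf page numbers. The sketch below walks that chain defensively (leaf counts bounded by the page, a visited set against cycles) in the spirit of the crate's panic-free reader. `freelist_page_numbers` is a hypothetical standalone function, not `sqlite-core`'s implementation.

```rust
use std::collections::HashSet;

fn be16(b: &[u8], off: usize) -> Option<u16> {
    let s = b.get(off..off.checked_add(2)?)?;
    Some(u16::from_be_bytes([s[0], s[1]]))
}

fn be32(b: &[u8], off: usize) -> Option<u32> {
    let s = b.get(off..off.checked_add(4)?)?;
    Some(u32::from_be_bytes([s[0], s[1], s[2], s[3]]))
}

/// Page size from header offset 16 (the stored value 1 means 65536).
fn page_size(db: &[u8]) -> Option<usize> {
    match be16(db, 16)? {
        1 => Some(65_536),
        n if n >= 512 && n.is_power_of_two() => Some(usize::from(n)),
        _ => None,
    }
}

/// Hypothetical: every freelist page number (trunks and leaves), in chain order.
/// A truncated chain yields the pages found so far; a cycle stops the walk.
pub fn freelist_page_numbers(db: &[u8]) -> Option<Vec<u32>> {
    let ps = page_size(db)?;
    let max_leaves = (ps - 8) / 4; // leaf slots after the 8-byte trunk header
    let mut pages = Vec::new();
    let mut seen = HashSet::new();
    let mut trunk = be32(db, 32)?; // header offset 32: first trunk page (0 = empty)
    while trunk != 0 && seen.insert(trunk) {
        // Pages are numbered from 1; page n starts at byte (n - 1) * page_size.
        let base = match usize::try_from(trunk - 1).ok().and_then(|i| i.checked_mul(ps)) {
            Some(b) => b,
            None => break,
        };
        let (next, count) = match (be32(db, base), be32(db, base + 4)) {
            (Some(n), Some(c)) => (n, c),
            _ => break, // trunk lies past EOF (truncated file)
        };
        pages.push(trunk);
        let count = usize::try_from(count).unwrap_or(usize::MAX).min(max_leaves);
        for i in 0..count {
            match be32(db, base + 8 + 4 * i) {
                Some(leaf) => pages.push(leaf),
                None => break,
            }
        }
        trunk = next;
    }
    Some(pages)
}
```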
sqlite-forensic accepts an in-memory Database (built from &[u8]) — it is medium-agnostic and has no dependency on any image format or container layer. Findings flow into the shared forensicnomicon::report model, so a SQLite database's anomalies aggregate uniformly with the partition / container / filesystem layers in a triage report.
audit() emits stable, scheme-prefixed codes (a published contract — never re-spelled). Each is an observation ("consistent with …"), graded for severity; the examiner draws the conclusion.
| Code | Severity | What it observes |
|---|---|---|
| `SQLITE-DELETED-RECORD-RECOVERED` | Medium | A record-shaped cell recovered from unallocated space, consistent with a deleted row not yet overwritten. Carries page / offset / rowid provenance. |
| `SQLITE-FREELIST-NONEMPTY` | Low | The database holds free pages, consistent with prior deletions (DELETE without VACUUM); those pages may retain recoverable rows. |
| `SQLITE-WAL-UNCHECKPOINTED` | Medium | A `-wal` sidecar carries committed page versions the main file does not reflect, so the main file alone under-reports the true state. |
| `SQLITE-PAGECOUNT-MISMATCH` | High | The in-header page count disagrees with the count implied by file length, consistent with truncation, carving, or out-of-band modification. |
| `SQLITE-RESERVED-SPACE-NONZERO` | Low | The header reserves bytes at the end of each page. This is unusual (the default is 0) and consistent with a page-level extension such as encryption (SQLCipher/SEE) or a checksum VFS. |
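Three of these codes can be derived from the 100-byte database header alone. The sketch below shows plausible trigger conditions; the exact conditions `audit()` applies are not documented here, so `header_anomaly_codes` is hypothetical and returns bare code strings without severity grading. Per the file-format spec, the in-header page count (offset 28) is only trusted when it is nonzero and the change counter (offset 24) equals the version-valid-for number (offset 92).

```rust
/// Hypothetical header-only checks for three of the published codes.
/// Returns no codes for input that is not a SQLite 3 database header.
pub fn header_anomaly_codes(db: &[u8]) -> Vec<&'static str> {
    let mut codes = Vec::new();
    let h = match db.get(..100) {
        Some(h) if h.starts_with(b"SQLite format 3\0") => h,
        _ => return codes,
    };
    // Big-endian u32 at a fixed header offset (always in bounds: h.len() == 100).
    let be32 = |off: usize| u32::from_be_bytes([h[off], h[off + 1], h[off + 2], h[off + 3]]);

    // Page size at offset 16; the stored value 1 means 65536.
    let page_size = match u16::from_be_bytes([h[16], h[17]]) {
        1 => 65_536u64,
        n => u64::from(n),
    };
    if page_size < 512 || !page_size.is_power_of_two() {
        return codes;
    }

    let header_pages = be32(28);
    let header_pages_trusted = header_pages != 0 && be32(24) == be32(92);
    let implied_pages = db.len() as u64 / page_size;
    if header_pages_trusted && u64::from(header_pages) != implied_pages {
        codes.push("SQLITE-PAGECOUNT-MISMATCH");
    }
    if h[20] != 0 {
        // Offset 20: bytes reserved at the end of every page.
        codes.push("SQLITE-RESERVED-SPACE-NONZERO");
    }
    if be32(36) != 0 {
        // Offset 36: total number of freelist pages.
        codes.push("SQLITE-FREELIST-NONEMPTY");
    }
    codes
}
```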
The `AnomalyKind` enum is `#[non_exhaustive]`: new codes can be added without a breaking change, so downstream `match` expressions must include a wildcard `_` arm.
A carver that over-reports is worse than useless on an evidence database — it manufactures rows that were never deleted. The design goal of this carver is therefore precision over recall, enforced structurally rather than by inspection:
- Read-only, panic-free, `forbid(unsafe)`: `Database::open` owns a `Vec<u8>` and never writes back to the artifact; the whole workspace denies `unsafe` at compile time and reads every length/offset through bounds-checked helpers, so a malformed, attacker-controlled database cannot reach a raw-pointer path or panic.
- Measured against independent third-party ground truth: recall and precision are computed per database against the SQLite Forensic Corpus (Nemetz, Schmitt & Freiling, DFRWS-EU 2018, CC0), whose authors shipped a per-row deleted-record answer key, so the truth set is theirs, not ours. The harness (`forensic/tests/nemetz_metrics.rs`) emits a reproducible confusion matrix; the full table is in `docs/recovery-comparison.md`.
- High precision, structurally; never a live-row re-read: our carver carves only the complement of the live cell extents on a page, then drops any carved record whose rowid is currently live. Across the Nemetz recall corpus it produces 0 live-re-reads (verified against the answer key's live rows), with only a small, low-confidence phantom class (all-empty/NULL records that the inferred carver matches on a run of zero bytes). The reference oracles exhibit two over-reporting failure modes on no-deletion databases, re-reading live cells and re-surfacing a stale byte-copy of a live row; our carver exhibits neither.
- Strong in-page recall via freeblock reconstruction, reported honestly: on the cleanest category (`0C`: records deleted in place, `secure_delete=0`, no overwrite, so every deleted row's bytes survive) the carver recovers 79 of 101 rows (~78 %). When SQLite frees a cell, it overwrites the cell's first four bytes (typically the payload-length and rowid varints, `header_len`, and the leading serial type) with the 4-byte freeblock header (a 2-byte next-freeblock offset and a 2-byte block size). `reconstruct_freeblock_records` rebuilds each record from its surviving serial-type tail plus a schema template derived from a live cell on the same page, surfacing the destroyed rowid as unknown. This recovers fqlite-level recall (fqlite ~80 %) at higher precision and with 0 live-re-reads. The earlier "163 of 163, 0 false positives" claim was specific to our own whole-freed-page fixture and is retracted as not representative.
- Secondary checks stay labelled as such: the undark/fqlite differential (`docs/validation.md`) is inter-tool concordance (the oracles disagree with each other, so it measures agreement, not correctness), and the DC3 `sqlite_dissect` corpus is a no-false-positive regression set (its `expected_rows` are live content, not a deleted set), never a recall oracle.
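Before a freed cell can be reconstructed it has to be located. On a b-tree page, page-header bytes 1-2 hold the offset of the first freeblock, and every freeblock starts with a 2-byte next-freeblock offset and a 2-byte size that includes those four bytes; that 4-byte header is what overwrites a freed cell's leading bytes. The hypothetical, panic-free walk below follows the file-format specification. It is not `carve_free_regions` or `reconstruct_freeblock_records`, and the template-based reconstruction step itself is not shown.

```rust
/// Hypothetical: walk the freeblock chain of one b-tree page image and return
/// (offset, size) for each freeblock, offsets relative to the page start.
/// `hdr_off` is 100 for page 1 (which begins with the database header), else 0.
pub fn freeblocks(page: &[u8], hdr_off: usize) -> Vec<(usize, usize)> {
    // Big-endian u16 at `off`, or None past the end of the page.
    let be16 = |off: usize| {
        page.get(off..off + 2)
            .map(|b| usize::from(u16::from_be_bytes([b[0], b[1]])))
    };
    let mut out = Vec::new();
    // Page-header bytes 1-2: offset of the first freeblock (0 = none).
    let mut next = be16(hdr_off + 1).unwrap_or(0);
    let mut prev = 0;
    while next != 0 {
        // SQLite keeps freeblocks in ascending offset order; a backward or
        // repeated link means corruption or a loop, so stop there.
        if next <= prev {
            break;
        }
        let (following, size) = match (be16(next), be16(next + 2)) {
            (Some(f), Some(s)) => (f, s),
            _ => break,
        };
        // Size includes the 4-byte freeblock header and must fit inside the page.
        if size < 4 || next + size > page.len() {
            break;
        }
        out.push((next, size));
        prev = next;
        next = following;
    }
    out
}
```

The surviving record tail of each freeblock is `&page[off + 4..off + size]`; that slice is what a template-based reconstruction has to work from.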
Carved records remain confidence-graded observations ("consistent with a deleted row"), never a verdict. The honest summary: a strict precision discipline confirmed against independent ground truth, and a documented in-page recall gap — not a claim of perfect recall or proof of correctness.
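The structural precision rule from the list above (carve only the complement of the live cell extents, then drop any record whose rowid is live) comes down to two small operations. This is a sketch under stated assumptions: extents are `(start, len)` byte ranges, the caller passes the page header and cell-pointer array as allocated extents too, and `Carved` is a hypothetical stand-in for the crate's record type. Records whose rowid was destroyed (`None`) cannot be checked against the live set and are kept here, in line with surfacing that rowid as unknown.

```rust
use std::collections::HashSet;

/// Half-open byte ranges `[start, end)` of a page not covered by any allocated extent.
/// Extents are `(start, len)`, may overlap, and are clamped to the page.
pub fn unallocated_ranges(page_len: usize, allocated: &[(usize, usize)]) -> Vec<(usize, usize)> {
    let mut spans: Vec<(usize, usize)> = allocated
        .iter()
        .map(|&(start, len)| (start.min(page_len), start.saturating_add(len).min(page_len)))
        .collect();
    spans.sort_unstable();
    let mut gaps = Vec::new();
    let mut cursor = 0;
    for (start, end) in spans {
        if start > cursor {
            gaps.push((cursor, start));
        }
        cursor = cursor.max(end);
    }
    if cursor < page_len {
        gaps.push((cursor, page_len));
    }
    gaps
}

/// Hypothetical carved candidate; `rowid` is None when reconstruction lost it.
pub struct Carved {
    pub page: u32,
    pub rowid: Option<i64>,
}

/// Drop every candidate whose rowid is currently live.
pub fn drop_live(carved: Vec<Carved>, live_rowids: &HashSet<i64>) -> Vec<Carved> {
    carved
        .into_iter()
        .filter(|c| c.rowid.map_or(true, |r| !live_rowids.contains(&r)))
        .collect()
}
```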
Honest gaps (tracked, not hidden): there is no CI workflow and no line-coverage gate in this repo yet, and the carver is not yet fuzzed — all three are planned to bring it level with the Paranoid-Gatekeeper bar the rest of the fleet enforces. The safety lints (unsafe_code = forbid, unwrap_used/expect_used = deny) and the cargo-deny supply-chain gate are enforced today.
- `docs/validation.md`: the Doer-Checker differential (how the carver was reconciled against undark and fqlite, page-level divergence diagnosis, build recipes).
- `docs/recovery-comparison.md`: the measured per-database recall/precision confusion matrix against independent Nemetz ground truth, with the undark/fqlite concordance and the DC3 no-FP regression set as secondary checks.
- `docs/corpus-catalog.md`: every test fixture with its verbatim generator command and MD5.
- `tests/data/README.md`: the committed synthetic fixtures, co-located.
sqlite-forensic is the SQLite file-format parser in the RapidTriage DFIR toolkit:
| Crate | Artifact family |
|---|---|
| sqlite-forensic | SQLite databases (b-tree, freelist, WAL, deleted-record carving) |
| browser-forensic | Chrome / Firefox / Safari |
| winevt-forensic | Windows Event Logs (EVTX) |
| srum-forensic | Windows SRUM / ESE |
| memory-forensic | Process memory, page tables |
| forensicnomicon | Artifact catalog, format constants, report model |
© 2026 Security Ronin Ltd