Skip to content

Chunk queries to avoid going over the sql parameter limit#72

Merged
hahn-kev merged 5 commits into
mainfrom
prevent-too-many-parameters
Jun 12, 2026
Merged

Chunk queries to avoid going over the sql parameter limit#72
hahn-kev merged 5 commits into
mainfrom
prevent-too-many-parameters

Conversation

@hahn-kev

@hahn-kev hahn-kev commented Jun 11, 2026

Copy link
Copy Markdown
Collaborator

fixes sillsdev/languageforge-lexbox#2339

Summary by CodeRabbit

  • Bug Fixes
    • Fixes filtering and sync behavior when very large batches exceed database parameter limits, preventing missed or duplicated records.
  • Chores
    • Improved performance and reliability when processing large numbers of snapshot updates.
  • Tests
    • Added tests that validate correct behavior under constrained database parameter limits.

@hahn-kev hahn-kev requested a review from myieye June 11, 2026 03:55
@coderabbitai

coderabbitai Bot commented Jun 11, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

📝 Walkthrough

Walkthrough

Queries that used unbounded Contains/IN were changed to use EF.Parameter(...) for both snapshot and commit filtering; tests were added that lower SQLite’s variable limit to verify behavior when many IDs/entities are involved.

Changes

SQLite parameterization fixes

Layer / File(s) Summary
UpdateSnapshots: parameterized snapshot lookup
src/SIL.Harmony/DataModel.cs
Initializes snapshotLookup and, when commitsToApply.Count > 10, builds distinct entity IDs and queries repo.CurrentSnapshots() using EF.Parameter(entityIds).Contains(s.EntityId) to populate the lookup.
FilterExistingCommits: parameterized commit filter
src/SIL.Harmony/Db/CrdtRepository.cs
Precomputes commitIds and uses EF.Parameter(commitIds).Contains(c.Id) in the exclusion WHERE clause instead of inlining commits.Select(...).Contains(...).
SyncTests: SQLite-limit integration test
src/SIL.Harmony.Tests/SyncTests.cs
Adds using directives for Microsoft.Data.Sqlite/EF and a [Fact] that lowers SQLite's variable limit on the connection, creates commits whose total entity changes exceed that limit, runs AddRangeFromSync, and asserts final table counts.
RepositoryTests: FilterExistingCommits limit test
src/SIL.Harmony.Tests/RepositoryTests.cs
Adds Microsoft.Data.Sqlite using and a test that reduces SQLite's limit, inserts many commits, calls FilterExistingCommits, and asserts the count of filtered commits and oldest change identity.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

  • sillsdev/harmony#56: Modified UpdateSnapshots previously; this PR further adjusts snapshot-query behavior to avoid SQLite parameter overflow.

Suggested reviewers

  • myieye

Poem

🐰 I nibbled on binds and found a fix,
Batching IDs so queries play tricks,
SQLite's limits now softly yield,
Commits and snapshots safely sealed,
Hoppity happy — no more SQL mix!

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and concisely describes the main change: implementing logic to avoid exceeding SQLite's parameter limit in queries.
Linked Issues check ✅ Passed The PR successfully addresses issue #2339 by implementing parameter limiting in both DataModel.UpdateSnapshots and CrdtRepository.FilterExistingCommits using EF.Parameter() to handle unbounded entity/commit ID lists.
Out of Scope Changes check ✅ Passed All changes are directly related to fixing the SQLite parameter limit issue: DataModel and CrdtRepository were modified to use EF.Parameter(), and comprehensive tests were added to validate the fix.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch prevent-too-many-parameters

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

myieye added 4 commits June 11, 2026 11:46
Appears to be significantly faster as the snapshot table grows. Only marginally slower on an empty DB.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/SIL.Harmony/Db/CrdtRepository.cs (1)

116-116: 💤 Low value

Consider materializing commitIds for clarity.

The deferred execution of commitIds is safe here because commits is an ICollection<Commit> (already materialized) and isn't modified before the query executes. However, materializing the projection explicitly would make the intent clearer and guard against subtle bugs if the code is refactored later.

♻️ Optional materialization for clarity
-        var commitIds = commits.Select(c => c.Id);
+        var commitIds = commits.Select(c => c.Id).ToArray();
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/SIL.Harmony/Db/CrdtRepository.cs` at line 116, The projection assigned to
commitIds uses deferred LINQ (var commitIds = commits.Select(c => c.Id));
materialize it immediately to make intent explicit and guard against future
refactoring by replacing the assignment with an eagerly-evaluated collection
(e.g., commitIds = commits.Select(c => c.Id).ToList() or ToArray()); update the
code in CrdtRepository.cs where commitIds is defined so downstream usage
consumes the materialized list/array instead of an IEnumerable deferred query.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/SIL.Harmony/Db/CrdtRepository.cs`:
- Line 116: The projection assigned to commitIds uses deferred LINQ (var
commitIds = commits.Select(c => c.Id)); materialize it immediately to make
intent explicit and guard against future refactoring by replacing the assignment
with an eagerly-evaluated collection (e.g., commitIds = commits.Select(c =>
c.Id).ToList() or ToArray()); update the code in CrdtRepository.cs where
commitIds is defined so downstream usage consumes the materialized list/array
instead of an IEnumerable deferred query.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a9f376aa-aed7-446d-b899-404023c37a7d

📥 Commits

Reviewing files that changed from the base of the PR and between 3f4e685 and dbe2d55.

📒 Files selected for processing (4)
  • src/SIL.Harmony.Tests/RepositoryTests.cs
  • src/SIL.Harmony.Tests/SyncTests.cs
  • src/SIL.Harmony/DataModel.cs
  • src/SIL.Harmony/Db/CrdtRepository.cs

@myieye myieye left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude benchmarked your chunking fix vs just telling EF to emit a single parameter.

  • On an empty DB they're more or less equal (which is arguably the most common scenario that will run into this)
  • A single parameter (i.e. EF.Parameter) quickly becomes faster as the project gains content. E.g. with only 1,000 snapshots/commits it's already twice as fast and its advantage grows linearly with the size of the tables.

So, I refactored to EF.Parameter, which seemed to be the overall winner and is also a much smaller change.

Claude also noticed another spot that would have the same problem on big projects, so I fixed that too.

@hahn-kev

Copy link
Copy Markdown
Collaborator Author

Awesome, I was wondering why it wasn't using json already, I didn't realize you could force it to do that using Parameter.

@hahn-kev hahn-kev merged commit 7df9a3d into main Jun 12, 2026
6 checks passed
@hahn-kev hahn-kev deleted the prevent-too-many-parameters branch June 12, 2026 01:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Project download exception: 'too many SQL variables'

2 participants