Unify SolrIndexSearcher paths with CollectorManager integration#4487
Open
markrmiller wants to merge 1 commit into
Open
Unify SolrIndexSearcher paths with CollectorManager integration#4487markrmiller wants to merge 1 commit into
markrmiller wants to merge 1 commit into
Conversation
…ased entry point Replaces the dual sequential/parallel branches in getDocListNC and getDocListAndSetNC with a single private searchAndCollect helper that internally selects between single-slice and multi-slice execution. The top-level methods no longer fork on MultiThreadedSearcher.allowMT. Key changes: - Extracts TopDocsCM, MaxScoreCM, DocSetCM, FixedBitSetCollector, SearchResult, TopDocsResult, and MaxScoreResult into a new SolrCollectorManagers utility. Adds PostFilterCM<T> (single-slice; setLastDelegate per newCollector, complete() at reduce) and DocSetCollectorCM (preserves the SortedIntDocSet sparse optimization). Multi-slice paths keep the FixedBitSet-based DocSetCM. - Deletes MultiThreadedSearcher; allowMT is inlined as the canParallelize check inside searchAndCollect. - Migrates the collector chain (segment-terminate-early, max-hits early termination, optional post-filter, optional cancellation) to a CollectorManager-based ChainCM applied per newCollector(). Single-slice vs multi-slice is chosen by ChainCM.requiresSingleSlice() OR the caller's parallelEligible hint. - Replaces the outer CancellableCollector with a shared AtomicBoolean polled by every per-slice CancellableCollector, so a single tracker.cancel(queryID) is observed across all parallel slices. - ChainCM.getScoreMode() returns the chain-wrapped collector's score mode so a Solr post-filter override (e.g. CollapsingPostFilter forcing COMPLETE) reaches populateScoresIfNeeded. Guaranteed non-null at the call site: Lucene's IndexSearcher.search calls newCollector() unconditionally before its empty-index short-circuit, so the call site uses Objects.requireNonNull rather than an unreachable fallback. - TopDocsCM.reduce skips TopDocs.merge for a single input, preserving the underlying TopDocsCollector's EQUAL_TO / GREATER_THAN_OR_EQUAL_TO relation exactly. - Aligns SolrMultiCollectorManager with Lucene's MultiCollector: per-child CollectionTerminatedException handling, per-child setMinCompetitiveScore aggregation (MinCompetitiveScoreAwareScorable), and removal of the EarlyTerminatingCollector wrap (ChainCM owns it, with a shared LongAdder across slices to preserve cross-slice maxHitsAllowed semantics). Clamps EarlyTerminatingCollector chunkSize to >= 1 to avoid divide-by-zero when maxDocsToCollect < 10. A few hot-path alignments with Lucene's MultiCollector pattern that together avoid a regression on heavy multi-term boolean OR queries: - searchAndCollect skips SolrMultiCollectorManager entirely when only one inner CollectorManager is built (top-N without fl=score and without docSet — the common case), routing through a new SolrCollectorManagers.wrapSingle adapter. Mirrors Lucene's own MultiCollectorManager.wrap() short-circuit for the one-collector case. - SolrMultiCollectorManager$Collectors$LeafCollectors.collect now matches MultiCollector.MultiLeafCollector#collect exactly: the per-iteration body is load-slot / null-check / call, and "all children terminated" is detected lazily inside the rare CollectionTerminatedException catch via allLeavesTerminated(). - QueryCancelledException and EarlyTerminatingCollectorException override fillInStackTrace() to return this, matching CollectionTerminatedException — these are control-flow exceptions and should not pay for stack-trace fill. No behavior change: numFound, ranked doc_ids, maxScore, numFoundExact and start match the parent commit on every probed (query, mode, multiThreaded) cell, including post-filter, STE, and post-filter + STE combinations. Verified with the full org.apache.solr.search.* suite plus targeted runs for TestMultiThreadedSearcher, TestEarlyTerminatingQueries, TestQueryLimits, CursorPagingTest, TestRankQueryPlugin, TestCollapseQParserPlugin, TestRandomCollapseQParserPlugin, and TestReRankQParserPlugin.
dsmiley
reviewed
Jun 2, 2026
Contributor
dsmiley
left a comment
There was a problem hiding this comment.
Very impressive... this is really an overhaul of multithreaded collection/search, not just a unification.
Glad to see benchmarks! But I'm surprised to see that solr/benchmark (JMH) wasn't used... why not? I'm also having trouble interpreting what you posted.
100 segments is a lot; not very realistic
| } | ||
| } | ||
|
|
||
| static class FixedBitSetCollector extends SimpleCollector { |
Contributor
There was a problem hiding this comment.
could use a bit more docs/comments to explain what's going on
Contributor
There was a problem hiding this comment.
surprisingly a net increase in LOC ... but more documentation so that's good
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Replaces the dual sequential/parallel branches in getDocListNC and getDocListAndSetNC with a single private searchAndCollect helper that internally selects between single-slice and multi-slice execution. The top-level methods no longer fork on MultiThreadedSearcher.allowMT. Also makes canceling a query when doing parallel segment search actually do something.
Performance — this commit vs parent
1.5M docs / 150 segments, 300 samples per cell per build, combined from two independent round-robin sessions (parent and this commit runs alternated within each session against a single shared
solr-home). 30 warmup + 30 timed runs per cell per round.sudo powermetrics --samplers cpu_power,thermalran in parallel at 1s cadence for both sessions; thermal verification below.Negative delta = this commit faster than parent.
sudo powermetrics --samplers cpu_power,thermal -i 1000ran in parallel for both sessions (~13 min each).No
Fair/Serious/Criticalevents for any sample in any run across either session.