Reduce regular CI test suite runtime#4495
Conversation
…subclass with full 10 iterations
…class with full 10 iterations
…tly subclass with full 3 iterations
…add nightly subclass with full 5 iterations
… not meaningful at that scale
…ow for regular CI
…c via if (!TEST_NIGHTLY) return
This is an integration test with heavy ZooKeeper interaction and multiple cluster restarts. Move to nightly-only to reduce regular CI time.
Integration test requiring embedded Kafka cluster — inherently slow and resource-intensive. Move to nightly-only to reduce regular CI overhead.
Full GCS backup/restore integration test — slow by nature. Move to nightly to reduce regular CI run time.
Full S3 backup/restore integration test using an embedded S3Mock — slow by nature. Move to nightly to reduce regular CI run time.
Integration test that progressively degrades a live cluster. Slow and resource-intensive by design. Move to nightly-only CI.
Integration test for per-replica ZK state management — creates multiple clusters per test method. Even the file's Javadoc notes it would be faster with ZK simulation. Move to nightly-only CI.
Has @repeat(30) on testCreateDelete and exercises pull replica replication across multiple test methods — one of the heaviest cloud tests. Move to nightly-only CI where the high repeat count is appropriate.
There was a problem hiding this comment.
Pull request overview
This PR aims to reduce regular CI runtime by (1) restructuring an expensive distributed test to reuse cluster setup, (2) lowering @Repeat iteration counts in regular CI while preserving full coverage via @Nightly subclasses, and (3) moving inherently slow integration/stress tests to nightly CI only.
Changes:
- Consolidate
DistributedCombinedQueryComponentTestcoverage into a single test method to reduce repeated cluster lifecycle overhead. - Reduce
@Repeatiterations for regular CI and add@Nightlysubclasses that run the original iteration counts in nightly CI. - Annotate multiple slow integration/stress tests with
@Nightlyso they run only in nightly CI.
Reviewed changes
Copilot reviewed 19 out of 19 changed files in this pull request and generated 2 comments.
Show a summary per file
| File | Description |
|---|---|
| solr/solrj/src/test/org/apache/solr/common/cloud/PerReplicaStatesIntegrationTest.java | Mark per-replica-states integration test as @Nightly to keep regular CI faster. |
| solr/solrj-streaming/src/test/org/apache/solr/client/solrj/io/stream/BadClusterTest.java | Move slow/cluster-degrading streaming test to nightly via @Nightly. |
| solr/modules/s3-repository/src/test/org/apache/solr/s3/S3IncrementalBackupTest.java | Run S3 incremental backup integration test only in nightly. |
| solr/modules/gcs-repository/src/test/org/apache/solr/gcs/GCSIncrementalBackupTest.java | Run GCS incremental backup integration test only in nightly. |
| solr/cross-dc-manager/src/test/org/apache/solr/crossdc/manager/SolrAndKafkaIntegrationTest.java | Move embedded-Kafka integration test to nightly via @Nightly. |
| solr/core/src/test/org/apache/solr/search/TestSolr4Spatial2.java | Reduce regular CI repeat count for testLLPDecodeIsStableAndPrecise. |
| solr/core/src/test/org/apache/solr/search/TestSolr4Spatial2Nightly.java | Add nightly subclass restoring the original repeat count. |
| solr/core/src/test/org/apache/solr/search/facet/SpatialHeatmapFacetsTest.java | Reduce regular CI repeat count for testPng. |
| solr/core/src/test/org/apache/solr/search/facet/SpatialHeatmapFacetsNightlyTest.java | Add nightly subclass restoring the original repeat count. |
| solr/core/src/test/org/apache/solr/handler/tagger/RandomizedTaggerTest.java | Reduce regular CI repeat count at class level. |
| solr/core/src/test/org/apache/solr/handler/tagger/RandomizedTaggerNightlyTest.java | Add nightly subclass restoring the original repeat count. |
| solr/core/src/test/org/apache/solr/handler/component/DistributedCombinedQueryComponentTest.java | Merge multiple test methods into one to reduce repeated distributed cluster setup. |
| solr/core/src/test/org/apache/solr/cloud/UnloadDistributedZkTest.java | Move distributed unload test to nightly via @Nightly. |
| solr/core/src/test/org/apache/solr/cloud/TestPullReplica.java | Move heavy pull-replica test class to nightly via @Nightly. |
| solr/core/src/test/org/apache/solr/cloud/SyncSliceTest.java | Move leader-election/peer-sync style test to nightly via @Nightly. |
| solr/core/src/test/org/apache/solr/cloud/RollingRestartTest.java | Move rolling restart stress test to nightly via @Nightly. |
| solr/core/src/test/org/apache/solr/cloud/RecoveryZkTest.java | Move recovery stress test to nightly via @Nightly. |
| solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderTest.java | Reduce regular CI repeat count for testCreepThenBite. |
| solr/core/src/test/org/apache/solr/cloud/CloudExitableDirectoryReaderNightlyTest.java | Add nightly subclass restoring the original repeat count. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…ponentTest Replaces escaped string concatenations with Java text blocks, making the JSON structure readable. Also fixes a latent bug in facetQuery where a missing comma between "offset":1 and "fields" caused noggit to silently ignore the fields parameter.
|
More of a comment, but I wonder if we need to change from a |
| + "\"lexical2\":{\"lucene\":{\"query\":\"id:(4^1 OR 5^2 OR 7^3 OR 10^2)\"}}}," | ||
| + "\"fields\":[\"id\",\"score\",\"title\"]," | ||
| + "\"params\":{\"combiner\":true,\"combiner.query\":[\"lexical1\",\"lexical2\"]}}", | ||
| """ |
There was a problem hiding this comment.
I really like the use of """ as well... Kind of wish we had one sweep through our code base for all this!
There was a problem hiding this comment.
I like the idea of a swiping modernization for large complex strings.
And we could also do a swiping change to identify candidates for class->record
|
|
||
| /** Nightly variant of {@link RandomizedTaggerTest} that runs the full 10 iterations. */ | ||
| @Nightly | ||
| @Repeat(iterations = 10) |
There was a problem hiding this comment.
I thought I had seem a pattern where you would change the number of iterations based on some system property? Versus ahving seperate Java files?
There was a problem hiding this comment.
Yea, that works some places, but this annotation requires a constant, so we need the subclass override hack. Tried to find a better way...
| import software.amazon.awssdk.regions.Region; | ||
|
|
||
| // Backups do checksum validation against a footer value not present in 'SimpleText' | ||
| @LuceneTestCase.Nightly |
There was a problem hiding this comment.
So we can't do anything to make this faster instead?
This PR reduces regular CI test suite runtime through two strategies:
@Repeatiteration counts for regular vs nightly CI@NightlyTests annotated
@Nightlycontinue to run in the dedicated nightly CI job (-Ptests.nightly=true) with no loss of coverage. Regular PR/branch CI becomes significantly faster.Strategy 1 — Fix inefficient test structure
DistributedCombinedQueryComponentTest(~80s → ~10s, 10x speedup)Root cause: The test had 6 separate
@Testmethods, all operating on an identical document set.BaseDistributedSearchTestCaseuses a method-level@Rule(ShardsRepeatRule) — not a@ClassRule— so it creates and destroys the full distributed cluster for each test method. With 6 + 2 extra methods, the setup/teardown overhead (≈9s each) dominated the 80s runtime.Fix: Merged the 6 same-dataset methods into a single
testCombinedQueries()method, reducing cluster lifecycles from 8 to 3. All assertions for single-lexical matching, multi-lexical matching, sorting, pagination, faceting, and facet+highlighting are preserved — just executed within one cluster lifecycle.Strategy 2 — Calibrate
@Repeatcounts (regular CI vs nightly)@Repeatrequires a compile-time constant soTEST_NIGHTLY ? N : Mcannot be used directly in the annotation. The solution uses the subclass pattern: reduce the count in the base class for regular CI, then create a one-liner*NightlyTestsubclass annotated@Nightly @Repeat(originalCount)that inherits all tests and runs with the full count nightly.This preserves all framework semantics: each iteration gets a distinct random seed, independent setup/teardown, separate failure reporting, and unique test naming — benefits that would be lost by converting to a plain loop.
RandomizedTaggerTestRandomizedTaggerNightlyTestTestSolr4Spatial2(testLLPDecodeIsStableAndPrecise)TestSolr4Spatial2NightlySpatialHeatmapFacetsTest(testPng)SpatialHeatmapFacetsNightlyTestCloudExitableDirectoryReaderTest(testCreepThenBite)CloudExitableDirectoryReaderNightlyTestStrategy 3 — Move 10 tests to
@NightlyThese tests are slow not because of a fixable design issue, but because they are inherently integration/stress tests that exercise complex distributed behavior, external infrastructure, or require many repetitions to catch race conditions. They belong in nightly CI.
RollingRestartTestsolr:coreSyncSliceTestsolr:coreRecoveryZkTestsolr:coreif (!TEST_NIGHTLY)branch also reveals it was written with nightly in mind.UnloadDistributedZkTestsolr:coreSolrAndKafkaIntegrationTestsolr:cross-dc-managerEmbeddedKafkaCluster) alongside a full SolrCloud cluster. The external broker startup/shutdown alone makes this integration-only.GCSIncrementalBackupTestsolr:modules:gcs-repositoryS3IncrementalBackupTestsolr:modules:s3-repositoryS3MockRule. Full backup lifecycle per test method.BadClusterTestsolr:solrj-streamingPerReplicaStatesIntegrationTestsolr:solrjTestPullReplicasolr:core@Repeat(30)ontestCreateDelete— 30 full collection create/delete cycles. Multiple other methods exercise pull replica replication, which requires waiting for index replication to complete. One of the heaviest cloud tests in the suite.