Skip to content

[client] Fix tiering hang on first_row merge engine empty batches#3242

Merged
luoyuxia merged 1 commit into
apache:mainfrom
Kaixuan-Duan:issue-2371-tiering-stuck
Jun 5, 2026
Merged

[client] Fix tiering hang on first_row merge engine empty batches#3242
luoyuxia merged 1 commit into
apache:mainfrom
Kaixuan-Duan:issue-2371-tiering-stuck

Conversation

@Kaixuan-Duan
Copy link
Copy Markdown
Contributor

Purpose

Linked issue: close #2371

Brief change log

  • fluss-client
    • ScanRecords: add a Map<TableBucket, Long> nextLogOffsets field with a new two-arg constructor (legacy single-arg constructor preserved for backwards compatibility); expose two new accessors:
      • polledBuckets() — union of buckets that produced records and buckets that only advanced their next fetch offset.
      • nextLogOffset(bucket) — exclusive upper bound of consumed offsets in this poll round, or null if the bucket was not polled.
    • LogFetchCollector#collectFetch: always record the advanced nextFetchOffset per polled bucket, even when the materialized record list is empty, and pack it into the new ScanRecords constructor.
  • fluss-flink-common
    • TieringSplitReader#forLogRecords: iterate scanRecords.polledBuckets() instead of scanRecords.buckets(). Determine end-of-range by comparing the scanner-reported nextLogOffset against the bucket's stoppingOffset (with the legacy lastRecord.logOffset() >= stoppingOffset - 1 check kept as a fallback for callers that don't supply nextLogOffset). Tolerate splits that finish with no real record observed by falling back to UNKNOWN_BUCKET_TIMESTAMP when computing the finish timestamp.

Tests

./mvnw -pl fluss-client,fluss-flink/fluss-flink-common \
-Dtest='ScanRecordsTest,LogFetchCollectorTest,TieringSplitReaderTest#testTieringFirstRowMergeEngineFinishes' \
-DfailIfNoTests=false test

API and Format

Documentation

@Kaixuan-Duan
Copy link
Copy Markdown
Contributor Author

@luoyuxia @zuston Hi, I have tried to resolve issue #2371. PTAL

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR addresses a tiering hang for tables using the FIRST_ROW merge engine where the log offset can advance via “empty” WAL batches (recordCount=0) even when no ScanRecord is materialized, causing the tiering layer to repeatedly poll the same range (Issue #2371).

Changes:

  • Extend fluss-client ScanRecords to carry per-bucket nextLogOffsets, and expose polledBuckets() + nextLogOffset(bucket) so callers can observe offset progress even when no records were produced.
  • Update LogFetchCollector to always record nextFetchOffset for each polled bucket and construct ScanRecords with these offsets.
  • Update Flink TieringSplitReader#forLogRecords to iterate polledBuckets() and determine end-of-range using nextLogOffset (with fallback to last-record checks), plus add a regression test reproducing Issue #2371.

Reviewed changes

Copilot reviewed 6 out of 6 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/tiering/source/TieringSplitReaderTest.java Adds regression test to ensure tiering completes under FIRST_ROW with duplicate keys/empty batches.
fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/tiering/source/TieringSplitReader.java Uses polledBuckets() and nextLogOffset to finish splits even when only empty batches occur.
fluss-client/src/test/java/org/apache/fluss/client/table/scanner/log/ScanRecordsTest.java Adds unit tests for legacy constructor behavior and new polledBuckets()/nextLogOffset semantics.
fluss-client/src/test/java/org/apache/fluss/client/table/scanner/log/LogFetchCollectorTest.java Verifies empty/filtered responses still expose offset advancement via polledBuckets()/nextLogOffset.
fluss-client/src/main/java/org/apache/fluss/client/table/scanner/log/ScanRecords.java Introduces nextLogOffsets, new constructor, polledBuckets(), and nextLogOffset(bucket).
fluss-client/src/main/java/org/apache/fluss/client/table/scanner/log/LogFetchCollector.java Always records advanced nextFetchOffset per polled bucket and returns it in ScanRecords.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@Kaixuan-Duan
Copy link
Copy Markdown
Contributor Author

@luoyuxia Hi, I have addressed the feedback. PTAL

@luoyuxia
Copy link
Copy Markdown
Contributor

luoyuxia commented May 8, 2026

@Kaixuan-Duan Thanks for the pr. Will take a look when i got some time

Copy link
Copy Markdown
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kaixuan-Duan Thanks for the pr. Left minor comments. PTAL

@Kaixuan-Duan Kaixuan-Duan force-pushed the issue-2371-tiering-stuck branch from 6c434d4 to 996de58 Compare May 10, 2026 13:24
@Kaixuan-Duan
Copy link
Copy Markdown
Contributor Author

@luoyuxia Thank you for the review. Addressed, PTAL

@luoyuxia
Copy link
Copy Markdown
Contributor

Thanks. WIll have a review later

@luoyuxia
Copy link
Copy Markdown
Contributor

cc @beryllw

@luoyuxia
Copy link
Copy Markdown
Contributor

Will have a review today. Hope we can merge it in next week.

Copy link
Copy Markdown
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kaixuan-Duan Thanks for that. And sorry for delay review. Left some comments again.

@Kaixuan-Duan Kaixuan-Duan force-pushed the issue-2371-tiering-stuck branch 2 times, most recently from ac5ac25 to 656b27c Compare May 31, 2026 12:34
@Kaixuan-Duan
Copy link
Copy Markdown
Contributor Author

@luoyuxia Thank you for the review. Addressed, PTAL.

Copy link
Copy Markdown
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Kaixuan-Duan Thanks. LGTM. Only one minor comment.

@Kaixuan-Duan Kaixuan-Duan force-pushed the issue-2371-tiering-stuck branch from 3518359 to 9e9080a Compare June 4, 2026 17:45
@Kaixuan-Duan
Copy link
Copy Markdown
Contributor Author

@luoyuxia Hi, I’ve resolved the merge conflicts and reverted the commit you pushed. Please take a look when you have a chance. Thanks!

Copy link
Copy Markdown
Contributor

@luoyuxia luoyuxia left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1

@luoyuxia luoyuxia merged commit 0afc21a into apache:main Jun 5, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Tiering may stuck for first row merge engine table

3 participants