[lake/paimon] TieringSourceReader adjust to scan arrow record batch and write arrow record batch to lake#3430
Merged
luoyuxia merged 3 commits on Jun 5, 2026
Pull request overview
This PR updates the Flink tiering pipeline to consume log data as Arrow record batches (for ARROW log tables) and write those batches directly into Paimon, reducing row-by-row conversion overhead and improving tiering efficiency for append-only tables.
Changes:
- Add an Arrow record-batch tiering path in `TieringSplitReader`, including split completion logic based on the last scanned offset.
- Extend the Paimon lake writer and append-only writer to accept `RecordBatch` writes and persist Arrow batches via a dedicated helper.
- Enhance Arrow batch utilities (`ArrowBatchData`, `ArrowRecordBatch`) and adjust tiering tests to align with batched writes.
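The split completion logic mentioned in the first item can be illustrated with a minimal sketch. This is not the code from the PR: the class `SplitProgress`, its method names, and the convention that the stopping offset is exclusive are all assumptions made for illustration. The idea is that each scanned batch advances the last scanned offset by its record count, and the split is considered finished once that offset reaches the last offset the split is supposed to cover.

```java
/**
 * Hypothetical illustration of offset-based split completion.
 * Names and the exclusive stopping-offset convention are assumptions,
 * not taken from TieringSplitReader.
 */
final class SplitProgress {
    private final long stoppingOffset; // exclusive upper bound (assumed convention)
    private long lastScannedOffset = -1L; // -1 means nothing scanned yet

    SplitProgress(long stoppingOffset) {
        this.stoppingOffset = stoppingOffset;
    }

    /**
     * Records a batch covering offsets [baseOffset, baseOffset + recordCount)
     * and reports whether the split is now complete.
     */
    boolean onBatch(long baseOffset, int recordCount) {
        if (recordCount > 0) {
            lastScannedOffset = baseOffset + recordCount - 1;
        }
        return isFinished();
    }

    /** The split is done once the last scanned offset reaches stoppingOffset - 1. */
    boolean isFinished() {
        return lastScannedOffset >= stoppingOffset - 1;
    }

    long lastScannedOffset() {
        return lastScannedOffset;
    }

    public static void main(String[] args) {
        SplitProgress progress = new SplitProgress(10L);
        System.out.println(progress.onBatch(0L, 4)); // offsets 0..3 scanned
        System.out.println(progress.onBatch(4L, 6)); // offsets 4..9 scanned
    }
}
```

Tracking completion per batch rather than per row means a single comparison is made for each polled batch, which fits the batched write path described above.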
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| fluss-lake/fluss-lake-paimon/src/test/java/org/apache/fluss/lake/paimon/tiering/PaimonTieringITCase.java | Adjusts alter-table tiering test to write log-table rows in one larger write call. |
| fluss-lake/fluss-lake-paimon/src/main/java/org/apache/paimon/arrow/converter/Arrow2PaimonVectorConverter.java | Adds Arrow-to-Paimon column vector conversion utilities (copied/adapted). |
| fluss-lake/fluss-lake-paimon/src/main/java/org/apache/fluss/lake/paimon/tiering/PaimonLakeWriter.java | Implements SupportsRecordBatchWrite and routes Arrow batches to append-only writer. |
| fluss-lake/fluss-lake-paimon/src/main/java/org/apache/fluss/lake/paimon/tiering/append/AppendOnlyWriter.java | Adds lazy Arrow-batch writing support and ensures helper resources are closed. |
| fluss-lake/fluss-lake-paimon/src/main/java/org/apache/fluss/lake/paimon/tiering/append/AppendOnlyArrowBatchHelper.java | New helper that enriches batches with system columns and writes via Paimon ArrowBundleRecords. |
| fluss-lake/fluss-lake-paimon/pom.xml | Adds paimon-arrow and Arrow dependencies as provided for runtime batch writing. |
| fluss-flink/fluss-flink-common/src/test/java/org/apache/fluss/flink/tiering/source/TieringSplitReaderTest.java | Updates tests to pass a user classloader into the reader. |
| fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/tiering/source/TieringSplitReader.java | Adds Arrow batch polling/handling path and refactors shared log-processing workflow. |
| fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/tiering/source/TieringSourceReader.java | Wires user code classloader into TieringSplitReader construction. |
| fluss-common/src/main/java/org/apache/fluss/record/ArrowBatchData.java | Adds batch size estimation and a truncate/ownership-transfer helper. |
| fluss-common/src/main/java/org/apache/fluss/lake/batch/ArrowRecordBatch.java | Wraps ArrowBatchData and makes it closeable for safe resource management. |
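The last two rows describe a closeable batch wrapper and an ownership-transfer helper. The sketch below shows that general pattern in plain Java, under stated assumptions: `NativeBuffer` and `BatchHandle` are hypothetical stand-ins, not the actual `ArrowBatchData` or `ArrowRecordBatch` classes, and no Arrow API is used. The point is that after a transfer, closing the original handle must not release memory that the new owner still needs.

```java
/** Hypothetical stand-in for off-heap memory backing an Arrow batch. */
final class NativeBuffer implements AutoCloseable {
    private boolean released;

    boolean isReleased() {
        return released;
    }

    @Override
    public void close() {
        released = true;
    }
}

/** Hypothetical closeable wrapper that owns a buffer until ownership is transferred. */
final class BatchHandle implements AutoCloseable {
    private NativeBuffer buffer; // null once transferred or closed

    BatchHandle(NativeBuffer buffer) {
        this.buffer = buffer;
    }

    /** Hands the buffer to a new handle; this handle will no longer release it. */
    BatchHandle transferOwnership() {
        if (buffer == null) {
            throw new IllegalStateException("buffer already transferred or closed");
        }
        BatchHandle next = new BatchHandle(buffer);
        buffer = null;
        return next;
    }

    /** Releases the buffer only if this handle still owns it; safe to call twice. */
    @Override
    public void close() {
        if (buffer != null) {
            buffer.close();
            buffer = null;
        }
    }

    public static void main(String[] args) {
        NativeBuffer raw = new NativeBuffer();
        BatchHandle downstream;
        try (BatchHandle upstream = new BatchHandle(raw)) {
            downstream = upstream.transferOwnership();
        } // upstream.close() runs here but no longer owns the buffer
        System.out.println("released after upstream close: " + raw.isReleased());
        downstream.close();
        System.out.println("released after downstream close: " + raw.isReleased());
    }
}
```

Making the wrapper `AutoCloseable` lets callers use try-with-resources, so the buffer is released on every exit path unless it was explicitly handed off.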
beryllw reviewed on Jun 4, 2026
LGTM!
        throws IOException;
    }

    private static boolean checkUnshadedArrowAvailable(ClassLoader classLoader) {
Shall we also check for `ArrowBundleRecords` from paimon-arrow?
It is a provided dependency, so it may be missing at runtime and the write could fail, if I understand this relationship correctly.
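One way to act on this suggestion is to probe the classloader for every class the batch path needs before enabling it. The following is a minimal sketch, not the PR's implementation of `checkUnshadedArrowAvailable`: the helper class `ArrowAvailabilityCheck`, its method names, and the fully qualified name used for `ArrowBundleRecords` are assumptions. Only `Class.forName(String, boolean, ClassLoader)` from the JDK is relied on.

```java
/**
 * Hypothetical availability probe for optional ("provided") dependencies.
 * The class names passed in are assumptions; adjust them to the real ones.
 */
final class ArrowAvailabilityCheck {

    private ArrowAvailabilityCheck() {}

    /** Returns true if the named class can be located by the given classloader. */
    static boolean isClassAvailable(String className, ClassLoader classLoader) {
        try {
            // initialize = false: only locate the class, do not run static initializers
            Class.forName(className, false, classLoader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns true only if every required class is visible to the classloader. */
    static boolean allAvailable(ClassLoader classLoader, String... classNames) {
        for (String name : classNames) {
            if (!isClassAvailable(name, classLoader)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        ClassLoader cl = ArrowAvailabilityCheck.class.getClassLoader();
        boolean batchPathUsable =
                allAvailable(
                        cl,
                        "org.apache.arrow.vector.VectorSchemaRoot",
                        // assumed fully qualified name of the paimon-arrow class
                        "org.apache.paimon.arrow.ArrowBundleRecords");
        System.out.println("Arrow batch path usable: " + batchPathUsable);
    }
}
```

If any required class is missing, the caller could fall back to the row-based path instead of failing later with a `NoClassDefFoundError`.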
Purpose
Linked issue: close #2966
Adjust `TieringSourceReader` to scan arrow record batch and write arrow record batch to lake, as a follow-up of #2965.
Brief change log
- `TieringSplitReader`: scan Arrow record batches instead of row-based records
- `AppendOnlyArrowBatchHelper` and `Arrow2PaimonVectorConverter`: support writing Arrow batches directly to Paimon
- `ArrowBatchData`: carry Arrow batch data through the tiering pipeline
Tests
- `PaimonTieringITCase`
- `TieringSplitReaderTest`
API and Format
No.
Documentation
No.