[flink] Add union read support to datastream by polyzos · Pull Request #3432 · apache/fluss

polyzos · 2026-06-04T14:30:17Z

The DataStream FlussSource/FlussSourceBuilder never created a LakeSource; it always passed null to FlinkSource. As a result, DataStream jobs read only Fluss data, even for datalake-enabled tables. Union read worked only through the Flink SQL/Table connector.

This PR adds that missing piece.
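A union read returns the data already tiered to the lake plus the newer Fluss data that has not been tiered yet. A job started from full() can then see older records even after Fluss has expired them locally. The sketch below models this for a single log-table bucket. It is a hypothetical simplification, not Fluss code: `UnionReadModel`, `LogRecord`, and the `tieredOffset` boundary are illustrative names, and splits, partitions, parallelism, and primary-key merging are ignored.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a log-table union read for ONE bucket (not the Fluss implementation). */
public class UnionReadModel {

    /** A log record and its offset in the Fluss log (illustrative type). */
    static final class LogRecord {
        final long offset;
        final String value;

        LogRecord(long offset, String value) {
            this.offset = offset;
            this.value = value;
        }
    }

    /**
     * Emits the lake snapshot first, then only the Fluss records at or after tieredOffset,
     * so records present in both stores are not emitted twice.
     */
    static List<LogRecord> unionRead(List<LogRecord> lakeSnapshot, List<LogRecord> flussLog, long tieredOffset) {
        List<LogRecord> out = new ArrayList<>(lakeSnapshot);
        for (LogRecord r : flussLog) {
            if (r.offset >= tieredOffset) {
                out.add(r);
            }
        }
        return out;
    }

    /** Models the DataStream behaviour before this PR: no lake source, so Fluss data only. */
    static List<LogRecord> flussOnlyRead(List<LogRecord> flussLog) {
        return new ArrayList<>(flussLog);
    }

    public static void main(String[] args) {
        // Offsets 0-2 are assumed tiered to the lake; Fluss still retains offsets 2-4.
        List<LogRecord> lake = List.of(new LogRecord(0, "a"), new LogRecord(1, "b"), new LogRecord(2, "c"));
        List<LogRecord> fluss = List.of(new LogRecord(2, "c"), new LogRecord(3, "d"), new LogRecord(4, "e"));

        System.out.println(unionRead(lake, fluss, 3).size()); // 5: offsets 0-4, each once
        System.out.println(flussOnlyRead(fluss).size());      // 3: offsets 0 and 1 are missing
    }
}
```

In this model, the boundary offset does the deduplication. Without a lake source, the Fluss-only read silently drops everything that was expired after tiering, which is the gap described above.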

Copilot

Pull request overview

This PR fixes a functional gap in the Flink DataStream connector: FlussSourceBuilder now constructs and wires a LakeSource into the underlying FlinkSource when reading a datalake-enabled table in full startup mode, enabling true Fluss+Lake historical/real-time union reads for DataStream jobs (previously only available via the Flink SQL/Table connector).

Changes:

Create a LakeSource in FlussSourceBuilder.build() when the table has datalake enabled and the source starts from OffsetsInitializer.full() (snapshot mode), including projection pushdown to the lake source.
Extend FlussSource constructors to accept and forward an optional LakeSource to FlinkSource.
Add Iceberg integration tests that validate DataStream union read semantics for log tables, PK tables, and projection pushdown.
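The builder change can be modelled as a single decision: create a lake source only for a datalake-enabled table whose starting offsets are full() (the `SnapshotOffsetsInitializer` check in the diff), and push the same projection into it. The sketch below is self-contained and hypothetical. `FakeLakeSource`, `StartingOffsets`, and `maybeCreateLakeSource` stand in for the real Fluss types, and it assumes only top-level projection, where each field index becomes a one-element nested path.

```java
import java.util.Arrays;

/** Hypothetical model of the lake-source decision in FlussSourceBuilder.build(). */
public class LakeSourceDecisionSketch {

    /** Stand-in for the starting-offsets strategy; FULL plays the role of SnapshotOffsetsInitializer. */
    enum StartingOffsets { FULL, EARLIEST, LATEST, TIMESTAMP }

    /** Stand-in lake source that just records the projection pushed into it. */
    static final class FakeLakeSource {
        int[][] projection; // null means "read all columns"

        void withProject(int[][] nestedProjectedFields) {
            this.projection = nestedProjectedFields;
        }
    }

    /** Wraps each top-level field index as a one-element path, e.g. {2, 0} -> {{2}, {0}}. */
    static int[][] toNestedProjection(int[] projectedFields) {
        int[][] nested = new int[projectedFields.length][];
        for (int i = 0; i < projectedFields.length; i++) {
            nested[i] = new int[] {projectedFields[i]};
        }
        return nested;
    }

    /** Returns null (the old Fluss-only behaviour) unless the table is datalake-enabled and starts from FULL. */
    static FakeLakeSource maybeCreateLakeSource(
            boolean dataLakeEnabled, StartingOffsets startingOffsets, int[] projectedFields) {
        if (!dataLakeEnabled || startingOffsets != StartingOffsets.FULL) {
            return null;
        }
        FakeLakeSource lakeSource = new FakeLakeSource();
        if (projectedFields != null) {
            // Push the projection into the lake side so both halves of the union have the same schema.
            lakeSource.withProject(toNestedProjection(projectedFields));
        }
        return lakeSource;
    }

    public static void main(String[] args) {
        FakeLakeSource s = maybeCreateLakeSource(true, StartingOffsets.FULL, new int[] {2, 0});
        System.out.println(Arrays.deepToString(s.projection));                         // [[2], [0]]
        System.out.println(maybeCreateLakeSource(true, StartingOffsets.LATEST, null)); // null
        System.out.println(maybeCreateLakeSource(false, StartingOffsets.FULL, null));  // null
    }
}
```

Keeping the projection in sync matters because the lake records and the Fluss records are emitted by the same source and must share one row type.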

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| `fluss-lake/fluss-lake-iceberg/src/test/java/org/apache/fluss/lake/iceberg/flink/FlinkUnionReadDataStreamITCase.java` | Adds DataStream-focused union read IT coverage (Iceberg tiered data + Fluss-only data, incl. projection). |
| `fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/source/FlussSourceBuilder.java` | Creates/configures a `LakeSource` for full startup on datalake-enabled tables and passes it into the built source. |
| `fluss-flink/fluss-flink-common/src/main/java/org/apache/fluss/flink/source/FlussSource.java` | Plumbs the optional `LakeSource` through to `FlinkSource` to activate existing hybrid lake+Fluss split logic. |


fresh-borzoni

@polyzos Ty for the PR, left some minor comments, PTAL

fresh-borzoni · 2026-06-05T00:04:03Z

+        if (tableInfo.getTableConfig().isDataLakeEnabled()
+                && offsetsInitializer instanceof SnapshotOffsetsInitializer) {
+            lakeSource =
+                    LakeSourceUtils.createLakeSource(tablePath, tableInfo.getProperties().toMap());


What about filters?

fresh-borzoni · 2026-06-05T00:11:57Z

+                }
+                lakeSource.withProject(nestedProjectedFields);
+            }
+        }


setBounded() only works on a lake table started from full(). With any other configuration, the job starts and then crashes later with a message like "'lakeSource' is null in batch mode". Both bad cases simply mean that lakeSource came out null.

Should we check it explicitly and early?
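One way to fail fast is to validate in build(), before any job is submitted, and name the specific misconfiguration instead of surfacing a null lakeSource at runtime. The sketch below is hypothetical: `BoundedValidationSketch`, `Config`, and the error messages are illustrative and are not taken from the actual change.

```java
/** Hypothetical build-time check: a bounded read needs a lake source, so reject bad configs early. */
public class BoundedValidationSketch {

    /** Illustrative subset of builder state relevant to the check. */
    static final class Config {
        final boolean bounded;
        final boolean dataLakeEnabled;
        final boolean startFromFull;

        Config(boolean bounded, boolean dataLakeEnabled, boolean startFromFull) {
            this.bounded = bounded;
            this.dataLakeEnabled = dataLakeEnabled;
            this.startFromFull = startFromFull;
        }
    }

    /** Throws IllegalArgumentException for the two cases in which no lake source would be created. */
    static void validate(Config c) {
        if (!c.bounded) {
            return; // unbounded reads keep working without a lake source
        }
        if (!c.dataLakeEnabled) {
            throw new IllegalArgumentException("setBounded() requires a table with datalake enabled.");
        }
        if (!c.startFromFull) {
            throw new IllegalArgumentException("setBounded() requires starting offsets OffsetsInitializer.full().");
        }
    }

    public static void main(String[] args) {
        validate(new Config(true, true, true)); // accepted
        try {
            validate(new Config(true, true, false));
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // setBounded() requires starting offsets OffsetsInitializer.full().
        }
    }
}
```

Checking the two conditions separately costs little and gives the user a more actionable message than a single "lakeSource is null" check would.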

fresh-borzoni · 2026-06-05T00:12:44Z

+ *
+ * <p>These tests mirror the Flink SQL union-read coverage ({@code FlinkUnionReadLogTableITCase} and
+ * {@code FlinkUnionReadPrimaryKeyTableITCase}) but exercise the programmatic DataStream source.
+ * Each test asserts the three properties that make a union read meaningful:


polyzos · 2026-06-05T06:59:24Z

@fresh-borzoni Thank you for your comments, good catch. I updated the PR to address them.

fresh-borzoni

@polyzos Ty, LGTM 👍
Can you please file a follow-up about partition pruning?

[flink] add union read support to datastream

60e17a8

polyzos requested a review from Copilot June 4, 2026 14:30

Copilot started reviewing on behalf of polyzos June 4, 2026 14:30

Copilot AI reviewed Jun 4, 2026

[flink] support bounded mode in ds

257fe4e

fresh-borzoni reviewed Jun 5, 2026

[flink] address comments

6cd9644

polyzos added this to the v1.0 milestone Jun 5, 2026

fresh-borzoni approved these changes Jun 5, 2026

polyzos wants to merge 3 commits into apache:main from polyzos:datastream-api-union-read-support

3 participants
