fix: handle null sub-arrays in flatten #4822

Open
michaelmitchell-bit wants to merge 1 commit into apache:main from michaelmitchell-bit:fix-flatten-null-subarray-4788

Conversation

@michaelmitchell-bit

Contributor

Which issue does this PR close?

Closes #4788.

Rationale for this change

Spark flatten returns NULL for an output row when any nested sub-array in that row is NULL. Comet was using DataFusion's flatten, which preserves only the outer array null bitmap and can return incorrect values when an inner sub-array is null.
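To make the null rule above concrete, here is a minimal sketch of the per-row semantics using plain `Option`/`Vec` values rather than Arrow arrays. The type aliases and the function name `spark_flatten_row` are hypothetical and exist only for this illustration; this is not the code in this PR. It assumes `None` stands for SQL NULL at every nesting level and that null elements inside a non-null sub-array are carried through unchanged.

```rust
/// One row of an array<array<int>> column; `None` models SQL NULL at each level.
/// (Hypothetical alias for illustration only.)
type NestedRow = Option<Vec<Option<Vec<Option<i32>>>>>;

/// One row of the flattened array<int> result.
type FlatRow = Option<Vec<Option<i32>>>;

/// Flattens a single row with the null rule described above:
/// a NULL outer array or any NULL sub-array makes the whole output row NULL.
fn spark_flatten_row(row: &NestedRow) -> FlatRow {
    // NULL outer array -> NULL output row.
    let sub_arrays = row.as_ref()?;
    let mut out = Vec::new();
    for sub in sub_arrays {
        // Any NULL sub-array -> NULL output row, regardless of the others.
        let sub = sub.as_ref()?;
        // Null elements inside a non-null sub-array are kept as-is.
        out.extend(sub.iter().copied());
    }
    Some(out)
}
```

Under these assumptions, a row such as `[[1], NULL]` yields `NULL` rather than `[1]`, which is the case the linked issue is about.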

What changes are included in this PR?

This adds a Spark-compatible flatten implementation for Comet that preserves DataFusion's offset/value flattening while computing Spark-compatible output nulls. It also registers the new UDF and extends the existing flatten.sql coverage to compare Comet against Spark for null sub-array cases.
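As a rough illustration of "computing Spark-compatible output nulls" separately from the offset/value flattening, the sketch below derives one validity flag per output row from the outer list offsets and the two validity masks. It is not the implementation in this PR: the function name `spark_flatten_validity` is hypothetical, and it uses `bool` slices in place of Arrow's packed null buffers to stay dependency-free.

```rust
/// Hypothetical helper: computes per-row validity for the flattened output.
///
/// * `outer_offsets` has `rows + 1` entries; row `i` owns the sub-arrays
///   at indices `outer_offsets[i]..outer_offsets[i + 1]`.
/// * `outer_valid[i]` is false when the outer array in row `i` is NULL.
/// * `inner_valid[j]` is false when sub-array `j` is NULL.
fn spark_flatten_validity(
    outer_offsets: &[usize],
    outer_valid: &[bool],
    inner_valid: &[bool],
) -> Vec<bool> {
    outer_offsets
        .windows(2)
        .zip(outer_valid)
        .map(|(range, &row_valid)| {
            // A row is valid only if the outer array is non-null and
            // every sub-array it owns is non-null.
            row_valid && inner_valid[range[0]..range[1]].iter().all(|&v| v)
        })
        .collect()
}
```

The flattened offsets and values can then be produced independently, with this mask applied as the output null bitmap.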

How are these changes tested?

Ran:

cargo test -p datafusion-comet-spark-expr array_funcs::flatten
cargo build --release
./mvnw test -Prelease -Dtest=none -Dsuites="org.apache.comet.CometSqlFileTestSuite flatten" -Dscalastyle.skip=true

michaelmitchell-bit marked this pull request as ready for review July 4, 2026 02:15
michaelmitchell-bit force-pushed the fix-flatten-null-subarray-4788 branch from 8e5dac2 to 9a4a79b July 4, 2026 05:01
Development

Successfully merging this pull request may close these issues.

bug: flatten drops nulls and returns wrong results when a sub-array is null

1 participant