feat: support grouping() and grouping_id() indicator functions by andygrove · Pull Request #4815 · apache/datafusion-comet

andygrove · 2026-07-03T14:57:01Z

Which issue does this PR close?

Closes #4814.

Rationale for this change

The grouping() and grouping_id() indicator functions used with ROLLUP, CUBE, and GROUPING SETS were marked as unsupported in the expressions guide. In practice they are Unevaluable expressions that Spark's analyzer rewrites before physical planning:

grouping(col) becomes Cast(BitwiseAnd(ShiftRight(spark_grouping_id, n-1-i), 1L), ByteType), where n is the number of grouping columns and i is the zero-based position of col among them
grouping_id() becomes a reference to the spark_grouping_id virtual column
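
The bit arithmetic behind that rewrite can be sketched in plain Python. This is only an illustration, not Spark or Comet code: the helper names `grouping_id` and `grouping` are hypothetical, and it assumes the usual Spark convention that a bit is 1 when the column is aggregated away (absent from the current grouping set), with the first grouping column in the most significant position.

```python
def grouping_id(group_by_cols, grouping_set):
    """Model of the spark_grouping_id value for one grouping set.

    Assumption: bit = 1 when the column is NOT part of the grouping set,
    and the first GROUP BY column occupies the most significant bit.
    """
    gid = 0
    for col in group_by_cols:
        masked = 0 if col in grouping_set else 1
        gid = (gid << 1) | masked
    return gid


def grouping(gid, n, i):
    """Model of BitwiseAnd(ShiftRight(gid, n-1-i), 1L).

    n is the number of grouping columns, i the zero-based column position.
    The real rewrite additionally casts the result to ByteType.
    """
    return (gid >> (n - 1 - i)) & 1


if __name__ == "__main__":
    cols = ["a", "b"]
    # The three grouping sets produced by ROLLUP(a, b).
    for grouping_set in [("a", "b"), ("a",), ()]:
        gid = grouping_id(cols, grouping_set)
        bits = [grouping(gid, len(cols), i) for i in range(len(cols))]
        print(grouping_set, gid, bits)
```

For ROLLUP(a, b) this yields grouping ids 0, 1, and 3, with grouping(a) set only on the grand-total row and grouping(b) set on both the subtotal and grand-total rows.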

Every constituent piece is already supported by Comet: the ExpandExec operator that ROLLUP/CUBE/GROUPING SETS lower to, plus Cast, BitwiseAnd, ShiftRight, and Literal. As a result these queries already run natively end-to-end, so no Scala serde or Rust work is required. This PR adds the missing test coverage and updates the documentation to reflect the actual support level.

What changes are included in this PR?

New SQL file test spark/src/test/resources/sql-tests/expressions/aggregate/grouping.sql covering grouping(col) and grouping_id() with CUBE, ROLLUP, and GROUPING SETS, multi-column grouping, the explicit-column grouping_id(a, b) form, combined grouping + grouping_id usage, use in HAVING, and NULL input data.
Mark grouping and grouping_id as supported (✅) in docs/source/user-guide/latest/expressions.md.
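
For readers less familiar with the constructs the test file exercises, here is a rough Python sketch of which grouping sets ROLLUP and CUBE expand to. The helper names `rollup_sets` and `cube_sets` are hypothetical and exist only for illustration; the sketch models standard SQL semantics and is not taken from the test file itself.

```python
from itertools import combinations


def rollup_sets(cols):
    """ROLLUP(c1, ..., cn) groups by every prefix of the column list,
    from the full list down to the empty grand-total set."""
    return [tuple(cols[:k]) for k in range(len(cols), -1, -1)]


def cube_sets(cols):
    """CUBE(c1, ..., cn) groups by every subset of the column list."""
    return [
        subset
        for k in range(len(cols), -1, -1)
        for subset in combinations(cols, k)
    ]


if __name__ == "__main__":
    print(rollup_sets(["a", "b"]))
    print(cube_sets(["a", "b"]))
```

ROLLUP over n columns therefore produces n + 1 grouping sets and CUBE produces 2^n; each set corresponds to one distinct grouping id value in the expanded rows.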

This change was scaffolded using the implement-comet-expression skill.

How are these changes tested?

The new SQL file test runs each query through both Spark and Comet in the default query mode, which asserts that the whole plan executes natively on Comet (no fallback) and that results match Spark exactly. All queries pass:

./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite grouping" -Dtest=none

Add SQL file test coverage for grouping/grouping_id with ROLLUP, CUBE, and GROUPING SETS, and mark both as supported in the expressions guide. These are Unevaluable expressions that the analyzer rewrites into arithmetic over the virtual spark_grouping_id column, so the constituent operators and expressions (ExpandExec, Cast, BitwiseAnd, ShiftRight, Literal) already run natively in Comet.

andygrove added this to the 1.0.0 milestone Jul 3, 2026

andygrove wants to merge 1 commit into apache:main from andygrove:support-grouping-expression
