feat: support grouping() and grouping_id() indicator functions #4815

Open
andygrove wants to merge 1 commit into apache:main from andygrove:support-grouping-expression

@andygrove

Member

Which issue does this PR close?

Closes #4814.

Rationale for this change

The grouping() and grouping_id() indicator functions used with ROLLUP, CUBE, and GROUPING SETS were marked as unsupported in the expressions guide. In practice they are Unevaluable expressions that Spark's analyzer rewrites before physical planning:

  • grouping(col) becomes Cast(BitwiseAnd(ShiftRight(spark_grouping_id, n-1-i), 1L), ByteType), where n is the number of grouping columns and i is the zero-based position of col among them
  • grouping_id() becomes a reference to the spark_grouping_id virtual column
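
The bit arithmetic behind this rewrite can be sketched in a few lines of Python. This is an illustrative model only, not Spark or Comet code: the helper names `grouping_id` and `grouping` are hypothetical, and the sketch assumes the convention from Spark's documentation that the first grouping column occupies the most significant bit and a bit is 1 when that column is aggregated away in the current grouping set.

```python
def grouping_id(group_by_cols, grouping_set):
    """Build the bitmask for one grouping set (hypothetical helper).

    One bit per GROUP BY column, first column in the most significant bit.
    A bit is 1 when the column is NOT part of the grouping set.
    """
    gid = 0
    for col in group_by_cols:
        gid = (gid << 1) | (0 if col in grouping_set else 1)
    return gid


def grouping(gid, n, i):
    """Mirror of BitwiseAnd(ShiftRight(gid, n-1-i), 1) for column index i."""
    return (gid >> (n - 1 - i)) & 1


# ROLLUP(a, b) expands to the grouping sets (a, b), (a), and ().
cols = ["a", "b"]
rollup_sets = [("a", "b"), ("a",), ()]

for grouping_set in rollup_sets:
    gid = grouping_id(cols, grouping_set)
    bits = [grouping(gid, len(cols), i) for i in range(len(cols))]
    print(grouping_set, gid, bits)
```

For ROLLUP(a, b) this gives grouping IDs 0, 1, and 3, with grouping(a) and grouping(b) read off as the individual bits of each ID.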

Every constituent piece is already supported by Comet: the ExpandExec operator that ROLLUP/CUBE/GROUPING SETS lower to, plus Cast, BitwiseAnd, ShiftRight, and Literal. As a result these queries already run natively end-to-end, so no Scala serde or Rust work is required. This PR adds the missing test coverage and updates the documentation to reflect the actual support level.
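
As a rough mental model of why the grouping ID is available as an ordinary column, the expansion step can be pictured as follows. This is a simplified Python sketch under stated assumptions, not the ExpandExec implementation in Spark or Comet: the function `expand` and the dictionary row layout are hypothetical. Each input row is emitted once per grouping set, with the columns missing from that set replaced by NULL and a `spark_grouping_id` value attached.

```python
def expand(rows, group_by_cols, grouping_sets):
    """Emit one copy of each row per grouping set (simplified model)."""
    out = []
    for row in rows:
        for grouping_set in grouping_sets:
            projected = dict(row)
            gid = 0
            for col in group_by_cols:
                absent = col not in grouping_set
                # First grouping column ends up in the most significant bit.
                gid = (gid << 1) | int(absent)
                if absent:
                    projected[col] = None  # column is aggregated away
            projected["spark_grouping_id"] = gid
            out.append(projected)
    return out


rows = [{"a": 1, "b": 2, "v": 10}]
expanded = expand(rows, ["a", "b"], [("a", "b"), ("a",), ()])
for r in expanded:
    print(r)
```

In this model the later aggregation simply groups by a, b, and spark_grouping_id, which is why grouping() and grouping_id() reduce to plain expressions over that column.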

What changes are included in this PR?

  • New SQL file test spark/src/test/resources/sql-tests/expressions/aggregate/grouping.sql covering grouping(col) and grouping_id() with CUBE, ROLLUP, and GROUPING SETS, multi-column grouping, the explicit-column grouping_id(a, b) form, combined grouping + grouping_id usage, use in HAVING, and NULL input data.
  • Mark grouping and grouping_id as supported (✅) in docs/source/user-guide/latest/expressions.md.

This change was scaffolded using the implement-comet-expression skill.

How are these changes tested?

The new SQL file test runs each query through both Spark and Comet in the default query mode, which asserts that the whole plan executes natively on Comet (no fallback) and that results match Spark exactly. All queries pass:

./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite grouping" -Dtest=none

Add SQL file test coverage for grouping/grouping_id with ROLLUP, CUBE, and
GROUPING SETS, and mark both as supported in the expressions guide.

These are Unevaluable expressions that the analyzer rewrites into arithmetic
over the virtual spark_grouping_id column, so the constituent operators and
expressions (ExpandExec, Cast, BitwiseAnd, ShiftRight, Literal) already run
natively in Comet.

andygrove added this to the 1.0.0 milestone Jul 3, 2026