feat: support kurtosis aggregate by andygrove · Pull Request #4818 · apache/datafusion-comet

andygrove · 2026-07-03T16:11:07Z

Which issue does this PR close?

Closes #.

Rationale for this change

kurtosis is a standard SQL statistical aggregate and one of the last remaining CentralMomentAgg siblings that Comet didn't run natively (variance and stddev are already supported). Adding it lets queries using kurtosis stay in Comet's native path instead of falling back to Spark for the whole aggregate.

What changes are included in this PR?

Adds a Comet-owned Kurtosis UDAF (native/spark-expr/src/agg_funcs/kurtosis.rs) with a per-row KurtosisAccumulator. Intermediate state is [n, avg, m2, m3, m4] Float64, mirroring Spark's CentralMomentAgg (momentOrder = 4) wire format so a Spark-produced Partial and a Comet-produced Final can share bytes without conversion. Update and merge kernels are direct ports of Spark's updateExpressionsDef and mergeExpressions (Meng 2015 recurrence).
Threads nullOnDivideByZero through the proto so spark.sql.legacy.statisticalAggregate behaves the same on both engines: the default returns NULL when m2 == 0, legacy mode returns NaN.
Adds a Kurtosis protobuf message and wires it through the native planner.
Adds CometKurtosis in spark/src/main/scala/org/apache/comet/serde/aggregates.scala and registers it in QueryPlanSerde.aggrSerdeMap. supportsMixedPartialFinal is left at false to match the conservative policy already used by Variance and Stddev in the same file.
Documents the audit in docs/source/contributor-guide/expression-audits/agg_funcs.md; flips the support status in docs/source/user-guide/latest/expressions.md from planned to supported.
Window use (kurtosis(x) OVER (...)) still falls back because the Comet window path doesn't wire kurtosis today. Captured as an expect_fallback case rather than left as an implicit gap.

Scaffolding produced by the implement-comet-expression skill; the audit-comet-expression skill drove the audit and produced the extra fallback and coverage-gap tests.

How are these changes tested?

Rust unit tests for KurtosisAccumulator covering empty group, single-value divide-by-zero in both nullOnDivideByZero modes, both of Spark's own ExpressionDescription examples (-0.7014368047529627 and 0.19432323191699075), and a two-partition state merge that reproduces the single-batch result.
New Comet SQL test at spark/src/test/resources/sql-tests/expressions/aggregate/kurtosis.sql running under ConfigMatrix: parquet.enable.dictionary=false,true. Covers Spark's documented examples, GROUP BY, global aggregate, empty table, integer/decimal/float/bigint inputs, literal argument, FILTER (WHERE ...), mixed with other aggregates, and expect_fallback cases for skewness and window use of kurtosis. Numerical-stress cases (NaN, Infinity, -Infinity, 1e15 magnitudes) use spark_answer_only mode.
Separate SQL file kurtosis_legacy.sql gated with Config: spark.sql.legacy.statisticalAggregate=true exercises the NaN path for single-value and all-equal groups.
./mvnw test -Dsuites="org.apache.comet.CometSqlFileTestSuite kurtosis" -Dtest=none passes under both the default (Spark 3.5) and -Pspark-4.0 profiles.
cd native && cargo clippy --all-targets --workspace -- -D warnings passes.

Adds a Comet-owned `Kurtosis` UDAF and `CometKurtosis` serde so that Spark's excess-kurtosis aggregate runs natively. The native accumulator stores `[n, avg, m2, m3, m4]` `Float64` state to mirror Spark's `CentralMomentAgg` (momentOrder = 4) buffer, and the update/merge kernels are a direct port of Spark's `updateExpressionsDef` / `mergeExpressions` expressions. `nullOnDivideByZero` is threaded through so `spark.sql.legacy.statisticalAggregate` both branches behave the same as Spark (NULL vs NaN when `m2 == 0`). Window use falls back to Spark today: the window path doesn't wire the Comet aggregate for kurtosis. Captured as an `expect_fallback` case in the SQL test. Scaffolding produced by the `implement-comet-expression` skill; audit performed by the `audit-comet-expression` skill.

andygrove added this to the 1.0.0 milestone Jul 3, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

feat: support kurtosis aggregate#4818

feat: support kurtosis aggregate#4818
andygrove wants to merge 1 commit into
apache:mainfrom
andygrove:feat/kurtosis-native-support

andygrove commented Jul 3, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

andygrove commented Jul 3, 2026

Which issue does this PR close?

Rationale for this change

What changes are included in this PR?

How are these changes tested?

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant