Skip to content

feat: support Spark 4.1 TIME type and expressions via codegen dispatch#4821

Open
andygrove wants to merge 2 commits into
apache:mainfrom
andygrove:feat/spark-4.1-time-type
Open

feat: support Spark 4.1 TIME type and expressions via codegen dispatch#4821
andygrove wants to merge 2 commits into
apache:mainfrom
andygrove:feat/spark-4.1-time-type

Conversation

@andygrove

Copy link
Copy Markdown
Member

Summary

  • Wires the Spark 4.1 TIME data type end-to-end and adds JVM codegen dispatch for the new TIME expressions (HoursOfTime, MinutesOfTime, SecondsOfTime, SecondsOfTimeWithFraction, TimeAddInterval, SubtractTimes, TimeDiff, TimeTrunc) and casts touching TimeType. No native implementations added.
  • Teaches the batch-kernel input/output emitters, CometScalaUDFCodegen, and ArrowWriters to handle TimeNanoVector as a Long-backed vector.
  • Parquet scans still opt out of TimeType (no native decoder yet); the fallback reason is now explicit.

Details

Every new TIME expression in Spark 4.1 is a RuntimeReplaceable that rewrites to StaticInvoke(DateTimeUtils, ...) before physical planning. The shim (spark-4.1+/CometExprShim.scala) matches those StaticInvoke forms and routes them through CometScalaUDF.emitJvmCodegenDispatch, which runs Spark's own doGenCode inside the Comet batch kernel. CurrentTime needs no work — the optimizer folds it to a literal. CometCast routes any cast touching TimeType through the same dispatcher.

Existing make_time / to_time / try_to_time continue to use the native scalar-function paths already wired up.

LocalTableScan for TimeType columns now works natively via the new TimeNanoWriter in ArrowWriters. The pre-existing "falls back when schema contains TimeType" test flips to "handles TimeType column".

Known limitation: time - time (result type DayTimeIntervalType) still falls back at the projection level because Comet's generic type-support list doesn't cover DayTimeIntervalType outputs. Pre-existing, not TIME-specific — left as a checkSparkAnswer regression test in the suite.

Test plan

  • ./mvnw test -Dsuites="org.apache.spark.sql.CometTimeTypeSuite" -Dtest=none — 15/15 pass on Spark 4.1 (new suite)
  • ./mvnw test -Dsuites="org.apache.comet.exec.CometExecSuite" -Dtest=none — 131/131 pass (LocalTableScan regression check)
  • ./mvnw test -Dsuites="org.apache.comet.CometCastSuite" -Dtest=none — 161/161 pass (cast regression check)
  • ./mvnw test -Dsuites="org.apache.comet.CometCodegenSourceSuite" -Dtest=none — 54/54 pass (codegen kernel regression check)
  • ./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite" -Dtest=none — 135/135 pass

Wires the Spark 4.1 TIME data type end-to-end and adds JVM codegen
dispatch for the new TIME expressions. No native implementations are
added: everything routes through the existing emitJvmCodegenDispatch
path so the enclosing projection stays native while Spark's own
doGenCode supplies the semantics.

Adds TimeType to DataTypeSupport, teaches the batch-kernel input and
output emitters, CometScalaUDFCodegen, and ArrowWriters to handle
TimeNanoVector as a Long-backed vector, and updates the shim so
HoursOfTime, MinutesOfTime, SecondsOfTime, SecondsOfTimeWithFraction,
TimeAddInterval, SubtractTimes, TimeDiff, and TimeTrunc reach the
dispatcher via their StaticInvoke(DateTimeUtils, ...) replacement
form. Casts touching TimeType follow the same route via CometCast.
Parquet scans still opt out of TimeType (no native decoder yet); the
fallback reason is now explicit.
@andygrove andygrove marked this pull request as ready for review July 3, 2026 17:39
@andygrove andygrove marked this pull request as draft July 3, 2026 17:41
Replaces the initial CometTimeTypeSuite Scala tests with per-expression
sql-tests fixtures under expressions/datetime and expressions/cast, so
coverage runs through the same CometSqlFileTestSuite pipeline as other
datetime tests (constant folding disabled, unified reporting).

Also accept Time64(Nanosecond) in the native LongVal literal path so
constant-folded TIME literals from CometCast (cast literal string ->
TIME evaluated at planning time) round-trip through the native scan.
@andygrove andygrove marked this pull request as ready for review July 3, 2026 20:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant