feat: support Spark 4.1 TIME type and expressions via codegen dispatch#4821
Open
andygrove wants to merge 2 commits into
Open
feat: support Spark 4.1 TIME type and expressions via codegen dispatch#4821andygrove wants to merge 2 commits into
andygrove wants to merge 2 commits into
Conversation
Wires the Spark 4.1 TIME data type end-to-end and adds JVM codegen dispatch for the new TIME expressions. No native implementations are added: everything routes through the existing emitJvmCodegenDispatch path so the enclosing projection stays native while Spark's own doGenCode supplies the semantics. Adds TimeType to DataTypeSupport, teaches the batch-kernel input and output emitters, CometScalaUDFCodegen, and ArrowWriters to handle TimeNanoVector as a Long-backed vector, and updates the shim so HoursOfTime, MinutesOfTime, SecondsOfTime, SecondsOfTimeWithFraction, TimeAddInterval, SubtractTimes, TimeDiff, and TimeTrunc reach the dispatcher via their StaticInvoke(DateTimeUtils, ...) replacement form. Casts touching TimeType follow the same route via CometCast. Parquet scans still opt out of TimeType (no native decoder yet); the fallback reason is now explicit.
Replaces the initial CometTimeTypeSuite Scala tests with per-expression sql-tests fixtures under expressions/datetime and expressions/cast, so coverage runs through the same CometSqlFileTestSuite pipeline as other datetime tests (constant folding disabled, unified reporting). Also accept Time64(Nanosecond) in the native LongVal literal path so constant-folded TIME literals from CometCast (cast literal string -> TIME evaluated at planning time) round-trip through the native scan.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
TIMEdata type end-to-end and adds JVM codegen dispatch for the new TIME expressions (HoursOfTime,MinutesOfTime,SecondsOfTime,SecondsOfTimeWithFraction,TimeAddInterval,SubtractTimes,TimeDiff,TimeTrunc) and casts touchingTimeType. No native implementations added.CometScalaUDFCodegen, andArrowWritersto handleTimeNanoVectoras a Long-backed vector.TimeType(no native decoder yet); the fallback reason is now explicit.Details
Every new TIME expression in Spark 4.1 is a
RuntimeReplaceablethat rewrites toStaticInvoke(DateTimeUtils, ...)before physical planning. The shim (spark-4.1+/CometExprShim.scala) matches thoseStaticInvokeforms and routes them throughCometScalaUDF.emitJvmCodegenDispatch, which runs Spark's owndoGenCodeinside the Comet batch kernel.CurrentTimeneeds no work — the optimizer folds it to a literal.CometCastroutes any cast touchingTimeTypethrough the same dispatcher.Existing
make_time/to_time/try_to_timecontinue to use the native scalar-function paths already wired up.LocalTableScanforTimeTypecolumns now works natively via the newTimeNanoWriterinArrowWriters. The pre-existing "falls back when schema contains TimeType" test flips to "handles TimeType column".Known limitation:
time - time(result typeDayTimeIntervalType) still falls back at the projection level because Comet's generic type-support list doesn't coverDayTimeIntervalTypeoutputs. Pre-existing, not TIME-specific — left as acheckSparkAnswerregression test in the suite.Test plan
./mvnw test -Dsuites="org.apache.spark.sql.CometTimeTypeSuite" -Dtest=none— 15/15 pass on Spark 4.1 (new suite)./mvnw test -Dsuites="org.apache.comet.exec.CometExecSuite" -Dtest=none— 131/131 pass (LocalTableScan regression check)./mvnw test -Dsuites="org.apache.comet.CometCastSuite" -Dtest=none— 161/161 pass (cast regression check)./mvnw test -Dsuites="org.apache.comet.CometCodegenSourceSuite" -Dtest=none— 54/54 pass (codegen kernel regression check)./mvnw test -Dsuites="org.apache.comet.CometExpressionSuite" -Dtest=none— 135/135 pass