chore: fallback to Spark if legacy sql configurations are set#4799
chore: fallback to Spark if legacy sql configurations are set#4799comphead wants to merge 4 commits into
Conversation
| `spark.sql.legacy.timeParserPolicy`). Comet implements current Spark semantics and does **not** | ||
| reproduce these legacy behaviors. | ||
|
|
||
| By default, when Comet detects that any `spark.sql.legacy.*` config is set to `true`, it disables |
There was a problem hiding this comment.
can't we just update the serde for specific expressions to return Unsupported rather than fall back for all queries, even queries that would not be affected by a legacy config?
There was a problem hiding this comment.
we should just fall back to the codegen dispatch approach in most cases
There was a problem hiding this comment.
Let me investigate this, sometimes the conf param is vaguely explained like
val LEGACY_JAVA_CHARSETS = buildConf("spark.sql.legacy.javaCharsets")
.internal()
.doc("When set to true, the functions like `encode()` can use charsets from JDK while " +
"encoding or decoding string values. If it is false, such functions support only one of " +
"the charsets: 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', " +
"'UTF-32'.")
.version("4.0.0")
.booleanConf
.createWithDefault(false)
it refers to encode() explicitly but also to something else. Maybe I can fish out exact functions from Spark code depending on legacy params and in this case we can fallback to codegen dispatch
There was a problem hiding this comment.
Also, Spark has legacy configs that default to true, so this will make all jobs fall back by default.
val VIEW_SCHEMA_BINDING_ENABLED = buildConf("spark.sql.legacy.viewSchemaBindingMode")
.internal()
.doc("Set to false to disable the WITH SCHEMA clause for view DDL and suppress the line in " +
"DESCRIBE EXTENDED and SHOW CREATE TABLE.")
.version("4.0.0")
.booleanConf
.createWithDefault(true)
Which issue does this PR close?
Closes #4786.
Motivation
Spark's
spark.sql.legacy.*configs opt queries into pre-modern Spark semantics (e.g.spark.sql.legacy.castComplexTypesToString.enabled,spark.sql.legacy.timeParserPolicy). Comet implements current Spark semantics and does not reproduce these legacy behaviors, so running Comet with a legacy config silentlyreturns results that diverge from Spark. Users hit this as a correctness bug with no clear signal that Comet is the cause.
Summary
spark.sql.legacy.*config is set totrue, with a warning listing the offending keys.spark.comet.legacyConfFallback.enabled(defaulttrue) so users can opt out and keep Comet enabled with legacy configs, at the cost of Spark compatibility.Changes
CometConf.scala: newCOMET_LEGACY_CONF_FALLBACK_ENABLEDboolean conf.CometSparkSessionExtensions.isCometLoaded: scansSQLConffor enabledspark.sql.legacy.*keys and disables Comet with a warning; gated by the new conf.docs/source/user-guide/latest/compatibility/index.md: new "Spark legacy configs" section.