Skip to content

chore: fallback to Spark if legacy sql configurations are set#4799

Draft
comphead wants to merge 4 commits into
apache:mainfrom
comphead:chore
Draft

chore: fallback to Spark if legacy sql configurations are set#4799
comphead wants to merge 4 commits into
apache:mainfrom
comphead:chore

Conversation

@comphead

@comphead comphead commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Which issue does this PR close?

Closes #4786.

Motivation

Spark's spark.sql.legacy.* configs opt queries into pre-modern Spark semantics (e.g. spark.sql.legacy.castComplexTypesToString.enabled, spark.sql.legacy.timeParserPolicy). Comet implements current Spark semantics and does not reproduce these legacy behaviors, so running Comet with a legacy config silently
returns results that diverge from Spark. Users hit this as a correctness bug with no clear signal that Comet is the cause.

Summary

  • Fall back to Spark whenever any spark.sql.legacy.* config is set to true, with a warning listing the offending keys.
  • Add spark.comet.legacyConfFallback.enabled (default true) so users can opt out and keep Comet enabled with legacy configs, at the cost of Spark compatibility.
  • Document the behavior and the opt-out in the compatibility guide.

Changes

  • CometConf.scala: new COMET_LEGACY_CONF_FALLBACK_ENABLED boolean conf.
  • CometSparkSessionExtensions.isCometLoaded: scans SQLConf for enabled spark.sql.legacy.* keys and disables Comet with a warning; gated by the new conf.
  • docs/source/user-guide/latest/compatibility/index.md: new "Spark legacy configs" section.

`spark.sql.legacy.timeParserPolicy`). Comet implements current Spark semantics and does **not**
reproduce these legacy behaviors.

By default, when Comet detects that any `spark.sql.legacy.*` config is set to `true`, it disables

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can't we just update the serde for specific expressions to return Unsupported rather than fall back for all queries, even queries that would not be affected by a legacy config?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should just fall back to the codegen dispatch approach in most cases

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me investigate this, sometimes the conf param is vaguely explained like

  val LEGACY_JAVA_CHARSETS = buildConf("spark.sql.legacy.javaCharsets")
    .internal()
    .doc("When set to true, the functions like `encode()` can use charsets from JDK while " +
      "encoding or decoding string values. If it is false, such functions support only one of " +
      "the charsets: 'US-ASCII', 'ISO-8859-1', 'UTF-8', 'UTF-16BE', 'UTF-16LE', 'UTF-16', " +
      "'UTF-32'.")
    .version("4.0.0")
    .booleanConf
    .createWithDefault(false)

it refers to encode() explicitly but also to something else. Maybe I can fish out exact functions from Spark code depending on legacy params and in this case we can fallback to codegen dispatch

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, Spark has legacy configs that default to true, so this will make all jobs fall back by default.

  val VIEW_SCHEMA_BINDING_ENABLED = buildConf("spark.sql.legacy.viewSchemaBindingMode")
    .internal()
    .doc("Set to false to disable the WITH SCHEMA clause for view DDL and suppress the line in " +
      "DESCRIBE EXTENDED and SHOW CREATE TABLE.")
    .version("4.0.0")
    .booleanConf
    .createWithDefault(true)

@comphead comphead marked this pull request as draft July 2, 2026 21:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: null IN () returns false instead of null under legacy null-in-empty behavior

2 participants