Partial project fallback: keep a projection/filter native by evaluating an unsupported subexpression in the JVM (#4825)

_Disclosure: this issue was drafted with the help of an AI assistant._

## Is your feature request related to a problem?

Today, when a single expression in a `ProjectExec` or `FilterExec` has no native translation, Comet abandons the **whole operator**. `CometProjectExec.convert` requires every `projectList` entry to convert, and `CometFilterExec.convert` requires the predicate to convert; a single `None` makes the operator fall back to Spark. That also breaks the native island below it: a `ColumnarToRow` transition materializes the scan output, and the *supported* sibling expressions run row-wise in Spark whole-stage codegen (WSCG). One unsupported expression sitting above a native (Parquet/Iceberg/Delta) scan therefore discards the native-scan investment for the whole pipeline. This is one of the most common real-world reasons native coverage collapses.
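To make the all-or-nothing behavior concrete, here is a minimal toy model in Python. It is a sketch only: `Expr`, `NATIVE_SUPPORTED`, `convert_expr`, and `convert_project` are hypothetical names invented for illustration and do not mirror Comet's actual serde code. It only shows how a single `None` discards the native form of every sibling expression.

```python
from dataclasses import dataclass
from typing import List, Optional


# Toy expression node. An illustrative stand-in, not a Spark or Comet type.
@dataclass(frozen=True)
class Expr:
    name: str
    children: tuple = ()


# Hypothetical set of expression names that have a native translation.
NATIVE_SUPPORTED = {"col", "lit", "add", "upper"}


def convert_expr(expr: Expr) -> Optional[str]:
    """Return a native plan string, or None if any node in the tree is unsupported."""
    if expr.name not in NATIVE_SUPPORTED:
        return None
    parts = []
    for child in expr.children:
        converted = convert_expr(child)
        if converted is None:
            return None
        parts.append(converted)
    return f"{expr.name}({', '.join(parts)})"


def convert_project(project_list: List[Expr]) -> Optional[List[str]]:
    """All-or-nothing: one unconvertible entry makes the whole operator fall back."""
    converted = [convert_expr(e) for e in project_list]
    if any(c is None for c in converted):
        return None  # the supported siblings are discarded as well
    return converted
```

In this model, a projection of `add(col, lit)` alone converts, but adding one entry with an unknown name next to it makes `convert_project` return `None` for the entire list.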

## Describe the solution you'd like

Evaluate **only** the unsupported subexpression in the JVM and keep the operator — and the pipeline — native. This is the pattern Gluten ships as `ColumnarPartialProjectExec` and Blaze/Auron ship over Arrow FFI.

Comet is well positioned because the JVM-callback machinery already exists on `main`:
- `JvmScalarUdfExpr` — a DataFusion `PhysicalExpr` that exports its argument arrays over the Arrow C Data Interface and calls back into the JVM.
- `CometUdfBridge` + `CometScalaUDFCodegen` / `CometBatchKernelCodegen`, which Janino-compile Spark's own `doGenCode` into an Arrow-direct batch kernel.

What is missing is a *splitter*: a last-resort hook that routes an otherwise-unsupported projection/filter subexpression to that existing detour. Proposed shape for a first version:

- A last-resort hook in expression serde that retries any unsupported node via the existing codegen dispatcher, at expression granularity (fires at the outermost unsupported node; supported ancestors stay native and reference its output).
- Scoped to `ProjectExec` and `FilterExec` only in v1. Join keys, sort keys, aggregate/group expressions, and partitioning keep today's all-or-nothing behavior.
- Plan-time eligibility via the dispatcher's existing `canHandle` (rejects aggregates, generators, `Unevaluable`, unsupported types, and oversized trees); subquery-bearing trees fall back cleanly rather than failing in the kernel.
- Behind a new config, **default off**, while it stabilizes.
- **No proto or native changes** — reuses `JvmScalarUdfExpr`.
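
The proposed shape above can be sketched with a toy Python model. Everything here is an assumption made for illustration: `Expr`, `NATIVE_SUPPORTED`, `JVM_INELIGIBLE`, `can_handle`, `convert_with_fallback`, and the `jvm_udf[...]` rendering are hypothetical and are not Comet's API. The sketch shows only the intended control flow: the hook fires at the outermost unsupported node, supported ancestors stay native, the flag defaults to off, and an ineligible subtree falls back cleanly at plan time.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


# Toy expression node. An illustrative stand-in, not a Spark or Comet type.
@dataclass(frozen=True)
class Expr:
    name: str
    children: tuple = ()


# Hypothetical: names with a native translation.
NATIVE_SUPPORTED = {"col", "lit", "add", "upper"}
# Hypothetical stand-ins for an aggregate, a generator, and a subquery.
JVM_INELIGIBLE = {"sum", "explode", "scalar_subquery"}


def subtree_names(expr: Expr) -> Iterator[str]:
    """Yield the name of every node in the subtree rooted at expr."""
    yield expr.name
    for child in expr.children:
        yield from subtree_names(child)


def render(expr: Expr) -> str:
    """Print a subtree as name(child, ...)."""
    return f"{expr.name}({', '.join(render(c) for c in expr.children)})"


def can_handle(expr: Expr) -> bool:
    """Stand-in for the plan-time eligibility check on a JVM-bound subtree."""
    return not any(n in JVM_INELIGIBLE for n in subtree_names(expr))


def convert_with_fallback(expr: Expr, partial_fallback: bool = False) -> Optional[str]:
    """Convert top-down; the first unsupported node on a path takes its whole subtree to the JVM."""
    if expr.name in NATIVE_SUPPORTED:
        parts = []
        for child in expr.children:
            converted = convert_with_fallback(child, partial_fallback)
            if converted is None:
                return None
            parts.append(converted)
        return f"{expr.name}({', '.join(parts)})"
    # Outermost unsupported node: last-resort hook, only when enabled and eligible.
    if partial_fallback and can_handle(expr):
        return f"jvm_udf[{render(expr)}]"
    return None  # today's behavior: the caller falls the operator back to Spark
```

With the flag off, a tree such as `add(col, my_udf(upper(col)))` returns `None`, matching current behavior. With the flag on, `add` and `col` stay native and only the `my_udf(...)` subtree is wrapped. A subtree containing `scalar_subquery` is rejected by `can_handle`, so the conversion returns `None` instead of failing later.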

Results run through Spark's own `doGenCode` / `eval` inside the kernel, so they match Spark by construction. Iceberg/Delta inherit this for free since the hook lives in projection/filter conversion, above the scan.

## Describe alternatives you've considered

A dedicated `CometPartialProjectExec` operator (Gluten's shape) was rejected for v1: Comet already has an expression-granularity callback, so a dedicated operator would add plan surgery and per-operator batch recomposition while being strictly less composable than a `PhysicalExpr` that works in any expression slot.

## Additional context

A prototype implementation with serde/parity/island tests, hook fuzzing, and a microbenchmark is in progress. Early numbers on a projection/filter over a native Parquet scan show the detour keeping the island native and running ~1.2–1.3× vs. Spark where the whole operator would otherwise fall back.

