feat: import user-defined physical optimizer rules over FFI#1557
Open
timsaucer wants to merge 12 commits into
Open
feat: import user-defined physical optimizer rules over FFI#1557timsaucer wants to merge 12 commits into
timsaucer wants to merge 12 commits into
Conversation
Expose `SessionContext.add_optimizer_rule` and `SessionContext.add_analyzer_rule` symmetric with the existing `remove_optimizer_rule`. Each accepts a Python subclass of the new `datafusion.optimizer.OptimizerRule` / `AnalyzerRule` ABCs. Implementation: * New `crates/core/src/optimizer_rules.rs` wraps user Python instances in `PyOptimizerRuleAdapter` / `PyAnalyzerRuleAdapter`, which implement the upstream `OptimizerRule` / `AnalyzerRule` traits. * `OptimizerRule.rewrite(plan)` returns `None` for "no change" or a new `LogicalPlan`. The adapter maps that to `Transformed::no` / `Transformed::yes` so the upstream optimizer's fixed-point loop terminates correctly. * `AnalyzerRule.analyze(plan)` must always return a `LogicalPlan`; returning `None` surfaces a `DataFusionError::Execution` naming the offending rule. * The upstream `&dyn OptimizerConfig` / `&ConfigOptions` arguments are not surfaced to Python in this MVP; rules that need configuration should capture it at construction time (for example by holding a `SessionContext` reference) or be implemented in Rust. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the Python-defined OptimizerRule/AnalyzerRule approach with FFI-imported physical optimizer rules. The Python logical-rule approach could observe plans but not transform them: there are no Python constructors for LogicalPlan node variants, so a rule could only return None or the input plan unchanged. The audience for custom rules also overlaps strongly with people who can write Rust. DataFusion exposes no FFI bridge for the logical OptimizerRule/AnalyzerRule traits, but it does export FFI_PhysicalOptimizerRule for the physical PhysicalOptimizerRule trait. This commit imports those instead. Changes: * Remove crates/core/src/optimizer_rules.rs, python/datafusion/optimizer.py, python/tests/test_optimizer.py, and the SessionContext.add_optimizer_rule / add_analyzer_rule methods. remove_optimizer_rule is unchanged (pre-existing). * New crates/core/src/physical_optimizer.rs reads a __datafusion_physical_optimizer_rule__ capsule and converts it via Arc<dyn PhysicalOptimizerRule>::from(&FFI_PhysicalOptimizerRule). * SessionContext gains a physical_optimizer_rules constructor argument. Upstream offers no API to add physical rules to a live context, so they are appended to the builder at construction time only. * The datafusion-ffi-example crate gains MyPhysicalOptimizerRule, a counter-backed rule used by _test_physical_optimizer_rule.py to prove the rule fires over FFI during physical planning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the `list[Any]` hint on the SessionContext `physical_optimizer_rules` argument with a `PhysicalOptimizerRuleExportable` Protocol, matching the existing `TableProviderExportable` / `*Exportable` pattern used for other FFI-capsule objects. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…string Point the `physical_optimizer_rules` argument docs at the new `PhysicalOptimizerRuleExportable` Protocol instead of describing the duck type inline. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The PyCapsule / FFI_PhysicalOptimizerRule mechanics describe the Protocol, not the SessionContext constructor. Move that detail onto PhysicalOptimizerRuleExportable and leave the constructor argument docs focused on behavior. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Remove the explanatory comment about FFI bridge availability; the same information already lives on PhysicalOptimizerRuleExportable. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Sibling FFI-import modules (udf, udaf, catalog, table) carry no module-level docs, and the rst-style markup did not match Rust conventions. The function doc comment already states intent. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Replace the hand-written crates/core/src/physical_optimizer.rs with a `from_pycapsule!` invocation in the util crate, matching `physical_codec_from_pycapsule` and the other FFI capsule importers. The macro already handles the hasattr/getattr/cast/validate/pointer_checked sequence and the infallible `Arc::from(&FFI)` conversion, so the dedicated module is no longer needed. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the sentence about logical-rule FFI availability; it is background, not type-hint information, and keeps the Protocol docstring in line with the other *Exportable hints. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
11 tasks
Drop the `physical_optimizer_rules` constructor argument on `SessionContext` and replace it with `add_physical_optimizer_rule`, matching the existing `register_*` shape on the same class. The new method rebuilds the session state via `SessionStateBuilder::new_from_existing` so previously registered tables, UDFs, and catalogs are preserved. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Coverage subsumed by test_ffi_physical_optimizer_rule_runs_during_planning, which exercises the same capsule export via add_physical_optimizer_rule. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Which issue does this PR close?
Related to #1533.
Closes #1238
Rationale for this change
The upstream DataFusion project supports custom physical optimizer rules. These have not been available to
datafusion-python. Now that we have FFI versions of these rules, expose them.What changes are included in this PR?
Are there any user-facing changes?
New
SessionContext.add_physical_optimizer_rule(rule)method. Additive, non-breaking.