feat(sql): chain dataframe-mode SQL blocks via a query registry#104
feat(sql): chain dataframe-mode SQL blocks via a query registry#104robertlacok wants to merge 1 commit into
Conversation
Add register_sql_query plus a module-level registry so a SQL block whose result is a plain DataFrame can still be referenced as a table by a later SQL block (expanded into a CTE) - the same chaining DeepnoteQueryPreview already enables for query-preview mode, but without changing the result type (which shows a preview banner and limits the row count). find_query_preview_references now consults the registry, trusting an entry only while the variable is still bound in __main__ so a deleted or rebound variable doesn't produce a stale CTE. Only single SELECT statements are stored; anything else clears the entry. SAL-90 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
📝 WalkthroughWalkthroughThis PR adds dataframe-mode SQL query chaining by introducing a registry that maps result variable names to their originating SQL sources. The Sequence DiagramsequenceDiagram
participant User
participant register_sql_query
participant _sql_query_registry
participant find_query_preview_references
participant __main__
User->>register_sql_query: register_sql_query(var_name, query)
register_sql_query->>_sql_query_registry: store/clear entry for var_name
User->>find_query_preview_references: find_query_preview_references(sql)
find_query_preview_references->>__main__: check if ref_name bound
find_query_preview_references->>_sql_query_registry: lookup ref_name source query
_sql_query_registry-->>find_query_preview_references: registered query
find_query_preview_references->>find_query_preview_references: recursively resolve source refs
find_query_preview_references-->>User: resolved reference map with CTEs
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~20 minutes 🚥 Pre-merge checks | ✅ 4 | ❌ 2❌ Failed checks (2 warnings)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. Comment |
|
📦 Python package built successfully!
|
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #104 +/- ##
==========================================
+ Coverage 74.09% 74.10% +0.01%
==========================================
Files 95 95
Lines 5678 5693 +15
Branches 843 848 +5
==========================================
+ Hits 4207 4219 +12
- Misses 1195 1196 +1
- Partials 276 278 +2
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@deepnote_toolkit/sql/sql_query_chaining.py`:
- Around line 170-184: The code uses _sql_query_registry keyed only by
variable_name so rebinding the same name leaves a stale expansion; change the
registry check in find_query_preview_references to verify the registered entry
is still the same object by comparing a freshness signal (e.g., store and
compare the original object's id or a weakref when the registry is written)
before trusting registered_source, and if the current object's id/weakref no
longer matches, drop the registry entry instead of using it; update the
registry-writing logic wherever entries are created to include the freshness
signal (and/or clear entries on reassignment) and add a regression test that
binds df_1 to an object producing SQL, then rebinds df_1 to a different object
and asserts the old SQL is not expanded.
- Around line 14-30: Annotate the registry and function with explicit types: add
"from typing import Optional, Dict" and change "_sql_query_registry: dict = {}"
to "_sql_query_registry: Dict[str, str] = {}". Update the function signature for
register_sql_query to "def register_sql_query(variable_name: Optional[str],
query: Optional[str]) -> None:" (keeping the implementation and the
is_single_select_query call unchanged) so static checks enforce the
Optional[str] inputs and a None return type.
In `@tests/unit/test_sql_query_chaining.py`:
- Around line 730-742: Add explicit type annotations to the new test helper and
lifecycle methods: annotate setUp and tearDown with -> None, and annotate _bind
as def _bind(self, name: str, value: typing.Any = "bound") -> None (or use Any
imported from typing) to satisfy ANN202; keep references to
sql_query_chaining._sql_query_registry and __main__ unchanged.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: ASSERTIVE
Plan: Pro
Run ID: 0e17d3d2-abc0-4020-829c-63de436836d8
📒 Files selected for processing (3)
deepnote_toolkit/__init__.pydeepnote_toolkit/sql/sql_query_chaining.pytests/unit/test_sql_query_chaining.py
| _sql_query_registry: dict = {} | ||
|
|
||
|
|
||
| def register_sql_query(variable_name, query): | ||
| """Register a SQL block's source query so downstream blocks can chain on it. | ||
|
|
||
| Only single SELECT statements can be expanded into a CTE, so anything else | ||
| (or an empty query) clears any previous entry for the variable rather than | ||
| storing it. | ||
| """ | ||
| if variable_name is None: | ||
| return | ||
|
|
||
| if query is not None and is_single_select_query(query): | ||
| _sql_query_registry[variable_name] = query | ||
| else: | ||
| _sql_query_registry.pop(variable_name, None) |
There was a problem hiding this comment.
🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win
Add a concrete type contract to the new registry API.
_sql_query_registry: dict and register_sql_query(variable_name, query) currently hide the actual Optional[str] -> None contract. Please annotate the registry and this public function explicitly so callers and static checks enforce the same rules as the implementation.
As per coding guidelines, "Always use Optional[T] for parameters that can be None (not T = None)" and "Use explicit type hints for function parameters and return values".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepnote_toolkit/sql/sql_query_chaining.py` around lines 14 - 30, Annotate
the registry and function with explicit types: add "from typing import Optional,
Dict" and change "_sql_query_registry: dict = {}" to "_sql_query_registry:
Dict[str, str] = {}". Update the function signature for register_sql_query to
"def register_sql_query(variable_name: Optional[str], query: Optional[str]) ->
None:" (keeping the implementation and the is_single_select_query call
unchanged) so static checks enforce the Optional[str] inputs and a None return
type.
| # 1) Dataframe-mode chaining. deepnote-internal registers a block's | ||
| # source query keyed by its result variable name. We only trust the | ||
| # entry while the variable is still bound in __main__, so a deleted or | ||
| # rebound variable doesn't produce a stale CTE expansion. | ||
| if variable_name in _sql_query_registry and hasattr(__main__, variable_name): | ||
| registered_source = _sql_query_registry[variable_name] | ||
| if registered_source: | ||
| query_preview_references[variable_name] = registered_source | ||
| # Recursively resolve the registered query's own references | ||
| find_query_preview_references( | ||
| registered_source, | ||
| query_preview_references, | ||
| processed_queries, | ||
| ) | ||
| continue |
There was a problem hiding this comment.
Rebinding the same name still expands stale SQL.
This branch only checks that the name still exists in __main__. If df_1 is rebound to a different object, the old registry entry is still used, so unchaining can execute SQL that no longer matches the notebook state. The registry needs a freshness signal beyond the variable name, or rebinding must clear the entry. Please add a rebinding regression test with the fix.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@deepnote_toolkit/sql/sql_query_chaining.py` around lines 170 - 184, The code
uses _sql_query_registry keyed only by variable_name so rebinding the same name
leaves a stale expansion; change the registry check in
find_query_preview_references to verify the registered entry is still the same
object by comparing a freshness signal (e.g., store and compare the original
object's id or a weakref when the registry is written) before trusting
registered_source, and if the current object's id/weakref no longer matches,
drop the registry entry instead of using it; update the registry-writing logic
wherever entries are created to include the freshness signal (and/or clear
entries on reassignment) and add a regression test that binds df_1 to an object
producing SQL, then rebinds df_1 to a different object and asserts the old SQL
is not expanded.
| def setUp(self): | ||
| sql_query_chaining._sql_query_registry.clear() | ||
| self._original_vars = vars(__main__).copy() | ||
|
|
||
| def tearDown(self): | ||
| sql_query_chaining._sql_query_registry.clear() | ||
| for key in list(vars(__main__).keys()): | ||
| if key not in self._original_vars: | ||
| delattr(__main__, key) | ||
|
|
||
| def _bind(self, name, value="bound"): | ||
| """Bind a variable in __main__ (simulates the block having executed).""" | ||
| setattr(__main__, name, value) |
There was a problem hiding this comment.
Type-annotate the new test helpers.
The new helper/lifecycle methods should carry explicit -> None annotations, and _bind should type its parameters as well. That keeps this class aligned with the repo rules and fixes Ruff ANN202 on Line 740.
As per coding guidelines, "Use type hints consistently" and "Use explicit type hints for function parameters and return values".
🧰 Tools
🪛 Ruff (0.15.14)
[warning] 740-740: Missing return type annotation for private function _bind
Add return type annotation: None
(ANN202)
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@tests/unit/test_sql_query_chaining.py` around lines 730 - 742, Add explicit
type annotations to the new test helper and lifecycle methods: annotate setUp
and tearDown with -> None, and annotate _bind as def _bind(self, name: str,
value: typing.Any = "bound") -> None (or use Any imported from typing) to
satisfy ANN202; keep references to sql_query_chaining._sql_query_registry and
__main__ unchanged.
|
🚀 Review App Deployment Started
|
What
Enables SQL chaining for dataframe-mode SQL blocks: a block whose result is a plain
DataFrame(e.g.df_1) can be referenced as a table in a later SQL block (SELECT * FROM df_1), which is expanded into a CTE at execution time.Today this only works for query-preview mode, because chaining relies on the referenced variable being a
DeepnoteQueryPreviewthat carries its source SQL. Dataframe-mode results are plain DataFrames with no SQL metadata, so they're invisible to chaining.How
register_sql_query(variable_name, query)+ a module-level_sql_query_registry. deepnote-internal calls this (via_dntk.register_sql_query) after a dataframe-mode SQL block runs, recording the block's source query keyed by its result variable name. Only singleSELECTstatements are stored (others clear the entry, since they can't be a CTE body).find_query_preview_referencesnow consults the registry in addition to the existingDeepnoteQueryPreviewpath. An entry is only trusted while the variable is still bound in__main__, so a deleted/rebound variable never produces a stale CTE.unchain_sql_queryis unchanged — it just consumes the references.DataFrame: no preview banner, no 100-row limit.This replaces the prior approach (a
deepnote-toolkit-chained-sql.patchfile committed inside deepnote-internal) with a real toolkit change.Paired change
devin/1779889801-sal-90-chained-sql-blocks) callsregister_sql_query, guarded withgetattrso older toolkits no-op.Testing
TestRegistryBasedChainingintests/unit/test_sql_query_chaining.py: single-level, multi-level (dependency-ordered CTEs), stale-entry-ignored, non-SELECT-not-stored, empty-registry-safe. All 40 tests in the file pass;black/isort/flake8clean.SAL-90: https://linear.app/deepnote/issue/SAL-90/chained-sql-blocks
🤖 Generated with Claude Code
Summary by CodeRabbit
New Features
Tests