feat(sql): chain dataframe-mode SQL blocks via a query registry by robertlacok · Pull Request #104 · deepnote/deepnote-toolkit

robertlacok · 2026-05-28T12:00:22Z

What

Enables SQL chaining for dataframe-mode SQL blocks: a block whose result is a plain DataFrame (e.g. df_1) can be referenced as a table in a later SQL block (SELECT * FROM df_1), which is expanded into a CTE at execution time.

Today this only works for query-preview mode, because chaining relies on the referenced variable being a DeepnoteQueryPreview that carries its source SQL. Dataframe-mode results are plain DataFrames with no SQL metadata, so they're invisible to chaining.

How

New register_sql_query(variable_name, query) + a module-level _sql_query_registry. deepnote-internal calls this (via _dntk.register_sql_query) after a dataframe-mode SQL block runs, recording the block's source query keyed by its result variable name. Only single SELECT statements are stored (others clear the entry, since they can't be a CTE body).
find_query_preview_references now consults the registry in addition to the existing DeepnoteQueryPreview path. An entry is only trusted while the variable is still bound in __main__, so a deleted/rebound variable never produces a stale CTE. unchain_sql_query is unchanged — it just consumes the references.
The result stays a plain DataFrame: no preview banner, no 100-row limit.

This replaces the prior approach (a deepnote-toolkit-chained-sql.patch file committed inside deepnote-internal) with a real toolkit change.

Paired change

deepnote-internal PR #20235 (devin/1779889801-sal-90-chained-sql-blocks) calls register_sql_query, guarded with getattr so older toolkits no-op.

Testing

New TestRegistryBasedChaining in tests/unit/test_sql_query_chaining.py: single-level, multi-level (dependency-ordered CTEs), stale-entry-ignored, non-SELECT-not-stored, empty-registry-safe. All 40 tests in the file pass; black/isort/flake8 clean.

SAL-90: https://linear.app/deepnote/issue/SAL-90/chained-sql-blocks

🤖 Generated with Claude Code

Summary by CodeRabbit

New Features
- Added SQL query registration capability to map result variables to source queries and support multi-level dependency resolution in SQL chaining operations.
Tests
- Comprehensive test suite added for registry-based SQL chaining functionality.

Add register_sql_query plus a module-level registry so a SQL block whose result is a plain DataFrame can still be referenced as a table by a later SQL block (expanded into a CTE) - the same chaining DeepnoteQueryPreview already enables for query-preview mode, but without changing the result type (which shows a preview banner and limits the row count). find_query_preview_references now consults the registry, trusting an entry only while the variable is still bound in __main__ so a deleted or rebound variable doesn't produce a stale CTE. Only single SELECT statements are stored; anything else clears the entry. SAL-90 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

linear · 2026-05-28T12:00:26Z

SAL-90

coderabbitai · 2026-05-28T12:00:36Z

📝 Walkthrough

Walkthrough

This PR adds dataframe-mode SQL query chaining by introducing a registry that maps result variable names to their originating SQL sources. The register_sql_query() function stores only single-SELECT, non-empty queries. The find_query_preview_references() function was updated to check the registry when resolving table references: if a variable name exists in both __main__ and the registry, it binds the reference to the registered query and recursively resolves that query's own dependencies. The new function is exported from the main package, and comprehensive test coverage validates storage rules, recursive expansion, and fallback behavior.

Sequence Diagram

sequenceDiagram
  participant User
  participant register_sql_query
  participant _sql_query_registry
  participant find_query_preview_references
  participant __main__

  User->>register_sql_query: register_sql_query(var_name, query)
  register_sql_query->>_sql_query_registry: store/clear entry for var_name
  
  User->>find_query_preview_references: find_query_preview_references(sql)
  find_query_preview_references->>__main__: check if ref_name bound
  find_query_preview_references->>_sql_query_registry: lookup ref_name source query
  _sql_query_registry-->>find_query_preview_references: registered query
  find_query_preview_references->>find_query_preview_references: recursively resolve source refs
  find_query_preview_references-->>User: resolved reference map with CTEs

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (2 warnings)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 26.67% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.
Updates Docs	⚠️ Warning	No documentation files (README, docs/*.md) were updated in this PR despite adding a new public API function `register_sql_query` that users or external systems will need to know about.	Update repository documentation and Deepnote landing page to document the new `register_sql_query` API and dataframe-mode SQL chaining feature.

✅ Passed checks (4 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately describes the main change: adding dataframe-mode SQL chaining via a query registry to the toolkit.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

github-actions · 2026-05-28T12:00:58Z

📦 Python package built successfully!

Version: 2.3.0.dev4+295b649
Wheel: deepnote_toolkit-2.3.0.dev4+295b649-py3-none-any.whl

Install:

pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/2.3.0.dev4%2B295b649/deepnote_toolkit-2.3.0.dev4%2B295b649-py3-none-any.whl"

codecov · 2026-05-28T12:04:19Z

Codecov Report

❌ Patch coverage is 82.35294% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.10%. Comparing base (98e2371) to head (8899f40).
✅ All tests successful. No failed tests found.

Files with missing lines	Patch %	Lines
deepnote_toolkit/sql/sql_query_chaining.py	82.35%	1 Missing and 2 partials ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #104      +/-   ##
==========================================
+ Coverage   74.09%   74.10%   +0.01%     
==========================================
  Files          95       95              
  Lines        5678     5693      +15     
  Branches      843      848       +5     
==========================================
+ Hits         4207     4219      +12     
- Misses       1195     1196       +1     
- Partials      276      278       +2

Flag	Coverage Δ
combined	`74.10% <82.35%> (+0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepnote_toolkit/sql/sql_query_chaining.py`:
- Around line 170-184: The code uses _sql_query_registry keyed only by
variable_name so rebinding the same name leaves a stale expansion; change the
registry check in find_query_preview_references to verify the registered entry
is still the same object by comparing a freshness signal (e.g., store and
compare the original object's id or a weakref when the registry is written)
before trusting registered_source, and if the current object's id/weakref no
longer matches, drop the registry entry instead of using it; update the
registry-writing logic wherever entries are created to include the freshness
signal (and/or clear entries on reassignment) and add a regression test that
binds df_1 to an object producing SQL, then rebinds df_1 to a different object
and asserts the old SQL is not expanded.
- Around line 14-30: Annotate the registry and function with explicit types: add
"from typing import Optional, Dict" and change "_sql_query_registry: dict = {}"
to "_sql_query_registry: Dict[str, str] = {}". Update the function signature for
register_sql_query to "def register_sql_query(variable_name: Optional[str],
query: Optional[str]) -> None:" (keeping the implementation and the
is_single_select_query call unchanged) so static checks enforce the
Optional[str] inputs and a None return type.

In `@tests/unit/test_sql_query_chaining.py`:
- Around line 730-742: Add explicit type annotations to the new test helper and
lifecycle methods: annotate setUp and tearDown with -> None, and annotate _bind
as def _bind(self, name: str, value: typing.Any = "bound") -> None (or use Any
imported from typing) to satisfy ANN202; keep references to
sql_query_chaining._sql_query_registry and __main__ unchanged.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0e17d3d2-abc0-4020-829c-63de436836d8

📥 Commits

Reviewing files that changed from the base of the PR and between 98e2371 and 8899f40.

📒 Files selected for processing (3)

deepnote_toolkit/__init__.py
deepnote_toolkit/sql/sql_query_chaining.py
tests/unit/test_sql_query_chaining.py

coderabbitai · 2026-05-28T12:05:43Z

+_sql_query_registry: dict = {}
+
+
+def register_sql_query(variable_name, query):
+    """Register a SQL block's source query so downstream blocks can chain on it.
+
+    Only single SELECT statements can be expanded into a CTE, so anything else
+    (or an empty query) clears any previous entry for the variable rather than
+    storing it.
+    """
+    if variable_name is None:
+        return
+
+    if query is not None and is_single_select_query(query):
+        _sql_query_registry[variable_name] = query
+    else:
+        _sql_query_registry.pop(variable_name, None)


🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add a concrete type contract to the new registry API.

_sql_query_registry: dict and register_sql_query(variable_name, query) currently hide the actual Optional[str] -> None contract. Please annotate the registry and this public function explicitly so callers and static checks enforce the same rules as the implementation.

As per coding guidelines, "Always use Optional[T] for parameters that can be None (not T = None)" and "Use explicit type hints for function parameters and return values".

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepnote_toolkit/sql/sql_query_chaining.py` around lines 14 - 30, Annotate the registry and function with explicit types: add "from typing import Optional, Dict" and change "_sql_query_registry: dict = {}" to "_sql_query_registry: Dict[str, str] = {}". Update the function signature for register_sql_query to "def register_sql_query(variable_name: Optional[str], query: Optional[str]) -> None:" (keeping the implementation and the is_single_select_query call unchanged) so static checks enforce the Optional[str] inputs and a None return type.

coderabbitai · 2026-05-28T12:05:43Z

+        # 1) Dataframe-mode chaining. deepnote-internal registers a block's
+        #    source query keyed by its result variable name. We only trust the
+        #    entry while the variable is still bound in __main__, so a deleted or
+        #    rebound variable doesn't produce a stale CTE expansion.
+        if variable_name in _sql_query_registry and hasattr(__main__, variable_name):
+            registered_source = _sql_query_registry[variable_name]
+            if registered_source:
+                query_preview_references[variable_name] = registered_source
+                # Recursively resolve the registered query's own references
+                find_query_preview_references(
+                    registered_source,
+                    query_preview_references,
+                    processed_queries,
+                )
+            continue


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Rebinding the same name still expands stale SQL.

This branch only checks that the name still exists in __main__. If df_1 is rebound to a different object, the old registry entry is still used, so unchaining can execute SQL that no longer matches the notebook state. The registry needs a freshness signal beyond the variable name, or rebinding must clear the entry. Please add a rebinding regression test with the fix.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@deepnote_toolkit/sql/sql_query_chaining.py` around lines 170 - 184, The code uses _sql_query_registry keyed only by variable_name so rebinding the same name leaves a stale expansion; change the registry check in find_query_preview_references to verify the registered entry is still the same object by comparing a freshness signal (e.g., store and compare the original object's id or a weakref when the registry is written) before trusting registered_source, and if the current object's id/weakref no longer matches, drop the registry entry instead of using it; update the registry-writing logic wherever entries are created to include the freshness signal (and/or clear entries on reassignment) and add a regression test that binds df_1 to an object producing SQL, then rebinds df_1 to a different object and asserts the old SQL is not expanded.

coderabbitai · 2026-05-28T12:05:43Z

+    def setUp(self):
+        sql_query_chaining._sql_query_registry.clear()
+        self._original_vars = vars(__main__).copy()
+
+    def tearDown(self):
+        sql_query_chaining._sql_query_registry.clear()
+        for key in list(vars(__main__).keys()):
+            if key not in self._original_vars:
+                delattr(__main__, key)
+
+    def _bind(self, name, value="bound"):
+        """Bind a variable in __main__ (simulates the block having executed)."""
+        setattr(__main__, name, value)


⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Type-annotate the new test helpers.

The new helper/lifecycle methods should carry explicit -> None annotations, and _bind should type its parameters as well. That keeps this class aligned with the repo rules and fixes Ruff ANN202 on Line 740.

As per coding guidelines, "Use type hints consistently" and "Use explicit type hints for function parameters and return values".

🧰 Tools

🪛 Ruff (0.15.14)

[warning] 740-740: Missing return type annotation for private function _bind

Add return type annotation: None

(ANN202)

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@tests/unit/test_sql_query_chaining.py` around lines 730 - 742, Add explicit type annotations to the new test helper and lifecycle methods: annotate setUp and tearDown with -> None, and annotate _bind as def _bind(self, name: str, value: typing.Any = "bound") -> None (or use Any imported from typing) to satisfy ANN202; keep references to sql_query_chaining._sql_query_registry and __main__ unchanged.

deepnote-bot · 2026-05-28T12:12:46Z

🚀 Review App Deployment Started

📝 Description	🌐 Link / Info
🌍 Review application	ra-104
🔑 Sign-in URL	Click to sign-in
📊 Application logs	View logs
🔄 Actions	Click to redeploy
🚀 ArgoCD deployment	View deployment
⏰ Last deployed	2026-05-28 12:12:45 (UTC)
📜 Deployed commit	`20c85fd7d7738fb004cf748cb98cb2a77d235c02`
🛠️ Toolkit version	`295b649`

coderabbitai Bot requested changes May 28, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(sql): chain dataframe-mode SQL blocks via a query registry#104

feat(sql): chain dataframe-mode SQL blocks via a query registry#104
robertlacok wants to merge 1 commit into
mainfrom
rl/sal-90-chained-sql-blocks

robertlacok commented May 28, 2026 •

edited by coderabbitai Bot

Loading

Uh oh!

linear Bot commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026 •

edited

Loading

Walkthrough

Sequence Diagram

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

codecov Bot commented May 28, 2026 •

edited

Loading

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

coderabbitai Bot May 28, 2026

Uh oh!

deepnote-bot commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

robertlacok commented May 28, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

What

How

Paired change

Testing

Summary by CodeRabbit

Uh oh!

linear Bot commented May 28, 2026

Uh oh!

coderabbitai Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Sequence Diagram

Estimated code review effort

❌ Failed checks (2 warnings)

Uh oh!

github-actions Bot commented May 28, 2026

Uh oh!

codecov Bot commented May 28, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot May 28, 2026

Choose a reason for hiding this comment

Uh oh!

deepnote-bot commented May 28, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

robertlacok commented May 28, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented May 28, 2026 •

edited

Loading

codecov Bot commented May 28, 2026 •

edited

Loading