Skip to content

feat(sql): chain dataframe-mode SQL blocks via a query registry#104

Draft
robertlacok wants to merge 1 commit into
mainfrom
rl/sal-90-chained-sql-blocks
Draft

feat(sql): chain dataframe-mode SQL blocks via a query registry#104
robertlacok wants to merge 1 commit into
mainfrom
rl/sal-90-chained-sql-blocks

Conversation

@robertlacok
Copy link
Copy Markdown
Contributor

@robertlacok robertlacok commented May 28, 2026

What

Enables SQL chaining for dataframe-mode SQL blocks: a block whose result is a plain DataFrame (e.g. df_1) can be referenced as a table in a later SQL block (SELECT * FROM df_1), which is expanded into a CTE at execution time.

Today this only works for query-preview mode, because chaining relies on the referenced variable being a DeepnoteQueryPreview that carries its source SQL. Dataframe-mode results are plain DataFrames with no SQL metadata, so they're invisible to chaining.

How

  • New register_sql_query(variable_name, query) + a module-level _sql_query_registry. deepnote-internal calls this (via _dntk.register_sql_query) after a dataframe-mode SQL block runs, recording the block's source query keyed by its result variable name. Only single SELECT statements are stored (others clear the entry, since they can't be a CTE body).
  • find_query_preview_references now consults the registry in addition to the existing DeepnoteQueryPreview path. An entry is only trusted while the variable is still bound in __main__, so a deleted/rebound variable never produces a stale CTE. unchain_sql_query is unchanged — it just consumes the references.
  • The result stays a plain DataFrame: no preview banner, no 100-row limit.

This replaces the prior approach (a deepnote-toolkit-chained-sql.patch file committed inside deepnote-internal) with a real toolkit change.

Paired change

  • deepnote-internal PR #20235 (devin/1779889801-sal-90-chained-sql-blocks) calls register_sql_query, guarded with getattr so older toolkits no-op.

Testing

  • New TestRegistryBasedChaining in tests/unit/test_sql_query_chaining.py: single-level, multi-level (dependency-ordered CTEs), stale-entry-ignored, non-SELECT-not-stored, empty-registry-safe. All 40 tests in the file pass; black/isort/flake8 clean.

SAL-90: https://linear.app/deepnote/issue/SAL-90/chained-sql-blocks

🤖 Generated with Claude Code

Summary by CodeRabbit

  • New Features

    • Added SQL query registration capability to map result variables to source queries and support multi-level dependency resolution in SQL chaining operations.
  • Tests

    • Comprehensive test suite added for registry-based SQL chaining functionality.

Review Change Stack

Add register_sql_query plus a module-level registry so a SQL block whose
result is a plain DataFrame can still be referenced as a table by a later
SQL block (expanded into a CTE) - the same chaining DeepnoteQueryPreview
already enables for query-preview mode, but without changing the result
type (which shows a preview banner and limits the row count).

find_query_preview_references now consults the registry, trusting an entry
only while the variable is still bound in __main__ so a deleted or rebound
variable doesn't produce a stale CTE. Only single SELECT statements are
stored; anything else clears the entry.

SAL-90

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@linear
Copy link
Copy Markdown

linear Bot commented May 28, 2026

SAL-90

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 28, 2026

📝 Walkthrough

Walkthrough

This PR adds dataframe-mode SQL query chaining by introducing a registry that maps result variable names to their originating SQL sources. The register_sql_query() function stores only single-SELECT, non-empty queries. The find_query_preview_references() function was updated to check the registry when resolving table references: if a variable name exists in both __main__ and the registry, it binds the reference to the registered query and recursively resolves that query's own dependencies. The new function is exported from the main package, and comprehensive test coverage validates storage rules, recursive expansion, and fallback behavior.

Sequence Diagram

sequenceDiagram
  participant User
  participant register_sql_query
  participant _sql_query_registry
  participant find_query_preview_references
  participant __main__

  User->>register_sql_query: register_sql_query(var_name, query)
  register_sql_query->>_sql_query_registry: store/clear entry for var_name
  
  User->>find_query_preview_references: find_query_preview_references(sql)
  find_query_preview_references->>__main__: check if ref_name bound
  find_query_preview_references->>_sql_query_registry: lookup ref_name source query
  _sql_query_registry-->>find_query_preview_references: registered query
  find_query_preview_references->>find_query_preview_references: recursively resolve source refs
  find_query_preview_references-->>User: resolved reference map with CTEs
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

🚥 Pre-merge checks | ✅ 4 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 26.67% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
Updates Docs ⚠️ Warning No documentation files (README, docs/*.md) were updated in this PR despite adding a new public API function register_sql_query that users or external systems will need to know about. Update repository documentation and Deepnote landing page to document the new register_sql_query API and dataframe-mode SQL chaining feature.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately describes the main change: adding dataframe-mode SQL chaining via a query registry to the toolkit.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.

@github-actions
Copy link
Copy Markdown

📦 Python package built successfully!

  • Version: 2.3.0.dev4+295b649
  • Wheel: deepnote_toolkit-2.3.0.dev4+295b649-py3-none-any.whl
  • Install:
    pip install "deepnote-toolkit @ https://deepnote-staging-runtime-artifactory.s3.amazonaws.com/deepnote-toolkit-packages/2.3.0.dev4%2B295b649/deepnote_toolkit-2.3.0.dev4%2B295b649-py3-none-any.whl"

@codecov
Copy link
Copy Markdown

codecov Bot commented May 28, 2026

Codecov Report

❌ Patch coverage is 82.35294% with 3 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.10%. Comparing base (98e2371) to head (8899f40).
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
deepnote_toolkit/sql/sql_query_chaining.py 82.35% 1 Missing and 2 partials ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #104      +/-   ##
==========================================
+ Coverage   74.09%   74.10%   +0.01%     
==========================================
  Files          95       95              
  Lines        5678     5693      +15     
  Branches      843      848       +5     
==========================================
+ Hits         4207     4219      +12     
- Misses       1195     1196       +1     
- Partials      276      278       +2     
Flag Coverage Δ
combined 74.10% <82.35%> (+0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@deepnote_toolkit/sql/sql_query_chaining.py`:
- Around line 170-184: The code uses _sql_query_registry keyed only by
variable_name so rebinding the same name leaves a stale expansion; change the
registry check in find_query_preview_references to verify the registered entry
is still the same object by comparing a freshness signal (e.g., store and
compare the original object's id or a weakref when the registry is written)
before trusting registered_source, and if the current object's id/weakref no
longer matches, drop the registry entry instead of using it; update the
registry-writing logic wherever entries are created to include the freshness
signal (and/or clear entries on reassignment) and add a regression test that
binds df_1 to an object producing SQL, then rebinds df_1 to a different object
and asserts the old SQL is not expanded.
- Around line 14-30: Annotate the registry and function with explicit types: add
"from typing import Optional, Dict" and change "_sql_query_registry: dict = {}"
to "_sql_query_registry: Dict[str, str] = {}". Update the function signature for
register_sql_query to "def register_sql_query(variable_name: Optional[str],
query: Optional[str]) -> None:" (keeping the implementation and the
is_single_select_query call unchanged) so static checks enforce the
Optional[str] inputs and a None return type.

In `@tests/unit/test_sql_query_chaining.py`:
- Around line 730-742: Add explicit type annotations to the new test helper and
lifecycle methods: annotate setUp and tearDown with -> None, and annotate _bind
as def _bind(self, name: str, value: typing.Any = "bound") -> None (or use Any
imported from typing) to satisfy ANN202; keep references to
sql_query_chaining._sql_query_registry and __main__ unchanged.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: ASSERTIVE

Plan: Pro

Run ID: 0e17d3d2-abc0-4020-829c-63de436836d8

📥 Commits

Reviewing files that changed from the base of the PR and between 98e2371 and 8899f40.

📒 Files selected for processing (3)
  • deepnote_toolkit/__init__.py
  • deepnote_toolkit/sql/sql_query_chaining.py
  • tests/unit/test_sql_query_chaining.py

Comment on lines +14 to +30
_sql_query_registry: dict = {}


def register_sql_query(variable_name, query):
"""Register a SQL block's source query so downstream blocks can chain on it.

Only single SELECT statements can be expanded into a CTE, so anything else
(or an empty query) clears any previous entry for the variable rather than
storing it.
"""
if variable_name is None:
return

if query is not None and is_single_select_query(query):
_sql_query_registry[variable_name] = query
else:
_sql_query_registry.pop(variable_name, None)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Add a concrete type contract to the new registry API.

_sql_query_registry: dict and register_sql_query(variable_name, query) currently hide the actual Optional[str] -> None contract. Please annotate the registry and this public function explicitly so callers and static checks enforce the same rules as the implementation.

As per coding guidelines, "Always use Optional[T] for parameters that can be None (not T = None)" and "Use explicit type hints for function parameters and return values".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepnote_toolkit/sql/sql_query_chaining.py` around lines 14 - 30, Annotate
the registry and function with explicit types: add "from typing import Optional,
Dict" and change "_sql_query_registry: dict = {}" to "_sql_query_registry:
Dict[str, str] = {}". Update the function signature for register_sql_query to
"def register_sql_query(variable_name: Optional[str], query: Optional[str]) ->
None:" (keeping the implementation and the is_single_select_query call
unchanged) so static checks enforce the Optional[str] inputs and a None return
type.

Comment on lines +170 to +184
# 1) Dataframe-mode chaining. deepnote-internal registers a block's
# source query keyed by its result variable name. We only trust the
# entry while the variable is still bound in __main__, so a deleted or
# rebound variable doesn't produce a stale CTE expansion.
if variable_name in _sql_query_registry and hasattr(__main__, variable_name):
registered_source = _sql_query_registry[variable_name]
if registered_source:
query_preview_references[variable_name] = registered_source
# Recursively resolve the registered query's own references
find_query_preview_references(
registered_source,
query_preview_references,
processed_queries,
)
continue
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Rebinding the same name still expands stale SQL.

This branch only checks that the name still exists in __main__. If df_1 is rebound to a different object, the old registry entry is still used, so unchaining can execute SQL that no longer matches the notebook state. The registry needs a freshness signal beyond the variable name, or rebinding must clear the entry. Please add a rebinding regression test with the fix.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@deepnote_toolkit/sql/sql_query_chaining.py` around lines 170 - 184, The code
uses _sql_query_registry keyed only by variable_name so rebinding the same name
leaves a stale expansion; change the registry check in
find_query_preview_references to verify the registered entry is still the same
object by comparing a freshness signal (e.g., store and compare the original
object's id or a weakref when the registry is written) before trusting
registered_source, and if the current object's id/weakref no longer matches,
drop the registry entry instead of using it; update the registry-writing logic
wherever entries are created to include the freshness signal (and/or clear
entries on reassignment) and add a regression test that binds df_1 to an object
producing SQL, then rebinds df_1 to a different object and asserts the old SQL
is not expanded.

Comment on lines +730 to +742
def setUp(self):
sql_query_chaining._sql_query_registry.clear()
self._original_vars = vars(__main__).copy()

def tearDown(self):
sql_query_chaining._sql_query_registry.clear()
for key in list(vars(__main__).keys()):
if key not in self._original_vars:
delattr(__main__, key)

def _bind(self, name, value="bound"):
"""Bind a variable in __main__ (simulates the block having executed)."""
setattr(__main__, name, value)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Type-annotate the new test helpers.

The new helper/lifecycle methods should carry explicit -> None annotations, and _bind should type its parameters as well. That keeps this class aligned with the repo rules and fixes Ruff ANN202 on Line 740.

As per coding guidelines, "Use type hints consistently" and "Use explicit type hints for function parameters and return values".

🧰 Tools
🪛 Ruff (0.15.14)

[warning] 740-740: Missing return type annotation for private function _bind

Add return type annotation: None

(ANN202)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/unit/test_sql_query_chaining.py` around lines 730 - 742, Add explicit
type annotations to the new test helper and lifecycle methods: annotate setUp
and tearDown with -> None, and annotate _bind as def _bind(self, name: str,
value: typing.Any = "bound") -> None (or use Any imported from typing) to
satisfy ANN202; keep references to sql_query_chaining._sql_query_registry and
__main__ unchanged.

@deepnote-bot
Copy link
Copy Markdown

🚀 Review App Deployment Started

📝 Description 🌐 Link / Info
🌍 Review application ra-104
🔑 Sign-in URL Click to sign-in
📊 Application logs View logs
🔄 Actions Click to redeploy
🚀 ArgoCD deployment View deployment
Last deployed 2026-05-28 12:12:45 (UTC)
📜 Deployed commit 20c85fd7d7738fb004cf748cb98cb2a77d235c02
🛠️ Toolkit version 295b649

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants