[FSTORE-1938] Support chaining of Transformation Functions using a DAG by manu-sj · Pull Request #580 · logicalclocks/logicalclocks.github.io

manu-sj · 2026-05-18T20:29:43Z

No description provided.

Copilot

Pull request overview

Adds documentation for chaining Transformation Functions into a dependency graph (DAG) in the Hopsworks Feature Store docs, including how execution order is resolved, how to visualize the DAG, and how parallel execution behaves for independent branches.

Changes:

Documented chaining semantics for Transformation Functions (ODT + MDT), including cycle/duplicate-output rejection behavior.
Added guidance on visualizing the transformation execution DAG from UI and SDK.
Added performance/parallelism tuning details via n_processes, including defaults and serving-time pool pre-spawn.
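
For readers of this overview, the ordering and rejection rules summarized above can be sketched roughly as follows. This is an illustrative sketch only, not the Hopsworks implementation: `resolve_execution_order` and the `(input_columns, output_columns)` layout are hypothetical names chosen for the example, and the ordering uses Python's standard `graphlib` module.

```python
from graphlib import CycleError, TopologicalSorter


def resolve_execution_order(functions):
    """Return function names so that every producer runs before its consumers.

    `functions` maps a (hypothetical) function name to a pair
    (input_columns, output_columns).
    """
    # Map each output column to the function producing it, rejecting
    # two functions that write the same column.
    producer = {}
    for name, (_, outputs) in functions.items():
        for column in outputs:
            if column in producer:
                raise ValueError(
                    f"duplicate output column {column!r}: "
                    f"{producer[column]!r} and {name!r}"
                )
            producer[column] = name

    # A function depends on whichever function produces one of its inputs.
    # Inputs without a producer are raw features and add no edge.
    graph = {
        name: {producer[column] for column in inputs if column in producer}
        for name, (inputs, _) in functions.items()
    }

    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        raise ValueError(
            f"transformation functions form a cycle: {exc.args[1]}"
        ) from exc


order = resolve_execution_order(
    {
        "scale": (["amount_log"], ["amount_scaled"]),
        "log": (["amount"], ["amount_log"]),
    }
)
print(order)
```

Declaring `scale` before `log` does not matter here: the order comes from the column names, so `log` is scheduled first because it produces the column that `scale` consumes.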

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.

File	Description
docs/user_guides/fs/transformation_functions.md	Introduces chained transformation DAG concept, DAG visualization, and performance tuning/parallelism behavior.
docs/user_guides/fs/feature_view/model-dependent-transformations.md	Adds a section describing chaining model-dependent transformations and links to performance tuning guidance.
docs/user_guides/fs/feature_group/on_demand_transformations.md	Adds a section describing chaining on-demand transformations and the cross-DAG path into feature views/MDTs.

…xecution DAG

https://hopsworks.atlassian.net/browse/FSTORE-1938

Document chaining of transformation functions across the user guides: how the output of one function feeds another, how the execution DAG resolves the order, how cycles and duplicate output columns are rejected, and how the DAG is rendered from the UI and from the SDK with visualize_transformations().

A Transformation Functions Performance Tuning subsection in the transformation functions guide covers the node-parallel execution model: the n_processes argument and its defaults per input shape, pool pre-spawning through init_serving and init_batch_scoring, Arrow shared-memory staging, and the HSFS_TF_POOL_START_METHOD override.

The model-dependent transformations guide notes that statistics for chained functions are fit in dependency order on the data each function sees. The on-demand transformations guide covers chains whose intermediate output is dropped from the feature group.

No migration entry is included since the changes are backwards compatible.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@logicalclocks.com>
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…xecution DAG

https://hopsworks.atlassian.net/browse/FSTORE-1938

Restructure the performance tuning section so it reads in order: what the n_processes argument is, how parallelism maps to the DAG, when it pays off, online serving specifics, implementation notes. The previous version stated the sequential default three times across the first three paragraphs and placed the practical guidance after the implementation internals.

Content changes: a call-shape distinction in the guidance (batch and offline calls benefit from worker processes, single feature vectors rarely do because the per-call dispatch cost usually exceeds the work), and a note that pre-spawning the pool removes the startup cost but not the per-call dispatch cost. Both reflect the measured behavior of the online batch chaining benchmark in the loadtest repository.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@hopsworks.ai>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
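
As a rough illustration of how parallelism can map onto a DAG, the sketch below groups nodes into levels whose members do not depend on each other and runs each level either sequentially or on a small pool. It is not the Hopsworks implementation: `execution_levels`, `run_dag`, and `n_workers` are hypothetical names, and a thread pool stands in for the worker processes described in the commit message so that the example stays self-contained.

```python
from concurrent.futures import ThreadPoolExecutor


def execution_levels(dependencies):
    """Group nodes into levels; nodes within one level are independent.

    `dependencies` maps each node to the nodes it depends on. Every
    dependency must itself appear as a key.
    """
    remaining = {node: set(deps) for node, deps in dependencies.items()}
    levels, done = [], set()
    while remaining:
        ready = sorted(n for n, deps in remaining.items() if deps <= done)
        if not ready:
            raise ValueError("dependencies contain a cycle or an unknown node")
        levels.append(ready)
        done.update(ready)
        for node in ready:
            del remaining[node]
    return levels


def run_dag(dependencies, tasks, n_workers=None):
    """Run every task exactly once, level by level.

    Each task receives the results of earlier levels. With `n_workers`
    unset or 1, everything runs sequentially in the calling thread.
    """
    results = {}
    levels = execution_levels(dependencies)
    if not n_workers or n_workers <= 1:
        for level in levels:
            for node in level:
                results[node] = tasks[node](results)
        return results

    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        for level in levels:
            # Only nodes of the same level are in flight at the same time.
            futures = {n: pool.submit(tasks[n], results) for n in level}
            level_results = {n: f.result() for n, f in futures.items()}
            results.update(level_results)
    return results


deps = {"a": [], "b": [], "c": ["a", "b"]}
tasks = {
    "a": lambda r: 1,
    "b": lambda r: 2,
    "c": lambda r: r["a"] + r["b"],
}
sequential = run_dag(deps, tasks)
parallel = run_dag(deps, tasks, n_workers=2)
```

In this toy graph only `a` and `b` can overlap; `c` always waits for both. Every submission to the pool also carries a dispatch cost, which is consistent with the commit's point that a pre-spawned pool removes startup cost but not per-call dispatch cost.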

…xecution DAG

https://hopsworks.atlassian.net/browse/FSTORE-1938

Rework the chaining documentation for reading order on all three pages.

The hub page now flows what chaining is, example, uniform offline and online behavior, statistics over chains with a link to the model-dependent page, cross-type chaining, and invalid configurations last instead of interleaved.

The model-dependent page gives the statistics-over-chains behavior its own subsection instead of a single dangling sentence after the example, and states that statistics are fit on the train split, each transformation executes once, and the fitted values are persisted for serving.

The on-demand page leads with the example like the other pages, and the example now demonstrates the dropped-column claims it previously only stated: both the raw input and the intermediate are dropped, leaving one stored output.

Signed-off-by: Manu Sathyarajan Joseph <manu.joseph@hopsworks.ai>
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
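
The statistics-over-chains behavior described in this commit can be pictured with a two-step chain in plain Python. This is a sketch under stated assumptions, not Hopsworks code: `log1p_amount`, `fit_scaler`, and `scale` are hypothetical helpers, and the "persisted" statistics are simply a dictionary kept in memory.

```python
import math
import statistics


def log1p_amount(values):
    """First function in the chain; it needs no statistics."""
    return [math.log1p(v) for v in values]


def fit_scaler(values):
    """Fit the second function's statistics on the data it actually sees."""
    return {
        "mean": statistics.fmean(values),
        "stdev": statistics.pstdev(values),
    }


def scale(values, stats):
    """Second function in the chain; standardizes with fitted statistics."""
    return [(v - stats["mean"]) / stats["stdev"] for v in values]


# Training: run the chain once over the train split, in dependency order.
train_amount = [0.0, 9.0, 99.0]
train_logged = log1p_amount(train_amount)
fitted = fit_scaler(train_logged)  # fit on the log output, not the raw amount
train_scaled = scale(train_logged, fitted)

# Serving: reuse the stored statistics; nothing is refit.
serving_scaled = scale(log1p_amount([9.0]), fitted)
```

The point of the ordering is visible in `fitted`: its mean is that of the log-transformed column, which is what the scaler receives at serving time, rather than the mean of the raw `amount` values.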

manu-sj marked this pull request as draft May 21, 2026 13:06

manu-sj force-pushed the FSTORE-1938 branch 2 times, most recently from 5ed6dcb to b770050, May 28, 2026 07:50

manu-sj force-pushed the FSTORE-1938 branch 2 times, most recently from 6eacba8 to cbf2ed3, June 4, 2026 11:25

manu-sj marked this pull request as ready for review June 8, 2026 08:59

manu-sj force-pushed the FSTORE-1938 branch 2 times, most recently from ff87ced to 4db4444, June 10, 2026 08:19

manu-sj requested a review from Copilot June 10, 2026 08:20

Copilot started reviewing on behalf of manu-sj June 10, 2026 08:21

Copilot AI reviewed Jun 10, 2026

manu-sj force-pushed the FSTORE-1938 branch 2 times, most recently from efcea35 to 83a8a2a, June 10, 2026 13:02

bubriks approved these changes Jun 12, 2026

manu-sj and others added 4 commits June 14, 2026 18:16

Improving docs

28f0fa6

manu-sj force-pushed the FSTORE-1938 branch from 6c984a9 to 28f0fa6, June 14, 2026 16:29

manu-sj merged commit 210f38d into logicalclocks:main Jun 15, 2026
1 check passed

manu-sj merged 4 commits into logicalclocks:main from manu-sj:FSTORE-1938
