build(deps): bump dev tooling + wire gooddata-eval into tox by hkad98 · Pull Request #1676 · gooddata/gooddata-python-sdk

hkad98 · 2026-07-01T05:59:38Z

Bumps development/test tooling to current releases and, along the way, fixes a gap where gooddata-eval was never running in the test harness.

Commits

`build(deps)` — bump dev tooling

- ty 0.0.40 → 0.0.55
- ruff 0.15.15 → 0.15.20
- pre-commit 4.0.1 → 4.6.0
- tox 4.32.0 → 4.56.1, tox-uv 1.29.0 → 1.35.2
- vcrpy 8.0.0 → 8.2.1 (root + sdk/pandas/fdw test groups)
- moto 5.1.22 → 5.2.2 (re-lock within existing >=5.1.6 spec)

ty 0.0.55's improved inference flags the `cast` in `MetricValueFilter.description` as redundant, so the cast (and the now-unused `typing.cast` import) is removed. All other runtime dependencies are left capped as-is; `urllib3` stays pinned to match the OpenAPI generator.
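A minimal sketch of the pattern being removed, assuming a hypothetical `ValueFilter` class whose `values` attribute is already annotated (this is not the actual `MetricValueFilter` code). When the declared type already matches the cast target, `cast(...)` adds nothing, which is what a type checker reports as redundant.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ValueFilter:
    """Hypothetical stand-in for a filter with typed numeric bounds."""

    operator: str
    values: tuple[float, ...]

    def description(self) -> str:
        # Before: values = cast(tuple[float, ...], self.values)
        # The attribute is already declared as tuple[float, ...], so the cast
        # is redundant; use the attribute directly and drop the cast import.
        values = self.values
        joined = " and ".join(f"{v:g}" for v in values)
        return f"{self.operator} {joined}"
```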

`chore(eval)` — fix lint violations flagged by ruff

Ruff autofixes across gooddata-eval: remove unused imports, re-sort import blocks, and add the missing `# noqa: PLC0415` to the deliberately function-local `timezone as _tz` imports. No behavior change.
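The shape of those imports, sketched with a hypothetical helper `utc_timestamp` (the real call sites in gooddata-eval are not reproduced here). Ruff's `PLC0415` (`import-outside-toplevel`) flags every import inside a function body, so each deliberate one carries its own suppression, including the `_tz` line under the `_dt` line.

```python
def utc_timestamp() -> str:
    """Return the current UTC time as an ISO 8601 string (hypothetical helper)."""
    # Deliberately function-local imports; without the noqa comments ruff
    # would report PLC0415 (import-outside-toplevel) on both lines.
    from datetime import datetime as _dt  # noqa: PLC0415
    from datetime import timezone as _tz  # noqa: PLC0415

    return _dt.now(_tz.utc).isoformat()
```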

`test(eval)` — wire gooddata-eval into tox and fix stale unit tests

gooddata-eval was the only package without a `tox.ini`, so its tests never ran via `make test` or CI. Across recent refactors the suite silently drifted from the source, accumulating 6 deterministic failures.

- Add a `tox.ini` mirroring the other packages, with `extras = llm-judge` so the optional `openai` dependency is installed.
- Add `pytest-json-report` to the test group (the shared pytest command emits a JSON report).
- Update the 6 stale tests to match current, intentional source behavior: `vis_agentic` test-kind inference, provider-prefixed JSON report run keys, the required `ItemReport.question` field, and `run_agentic_visualization` deleting only conversations it created.

All 216 eval tests now pass via tox.
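Two of the behaviors the updated tests pin down, sketched under assumptions: `ItemReport` below is a hypothetical, trimmed-down dataclass (the real one has more fields), and `run_key`/`collect` are illustrative helpers, not gooddata-eval functions. Keying runs as `"Provider/model"` keeps two providers serving the same model name from overwriting each other, and a field without a default makes `question` mandatory at construction.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ItemReport:
    """Hypothetical, trimmed-down report row."""

    item_id: str
    question: str  # required: no default, so omitting it raises TypeError
    passed: bool = False


def run_key(provider: str, model: str) -> str:
    """Build a provider-prefixed JSON report key, e.g. "Provider/model"."""
    return f"{provider}/{model}"


def collect(runs: list[tuple[str, str, list[ItemReport]]]) -> dict[str, list[ItemReport]]:
    """Group item reports by run key, refusing silent overwrites."""
    report: dict[str, list[ItemReport]] = {}
    for provider, model, items in runs:
        key = run_key(provider, model)
        if key in report:
            raise ValueError(f"duplicate run key: {key}")
        report[key] = items
    return report
```

With model-only keys, the two runs in `collect([("OpenAI", "model-a", []), ("Azure", "model-a", [])])` would collide on `"model-a"`; with the prefix they stay distinct.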

Validation

`make lint`, `make format`, and `make type-check` are green across all packages; the sdk (488), pandas (343), fdw (43), pipelines (193), and eval (216) test suites pass.

Summary by CodeRabbit

New Features
- Added JSON test reports (one per tox environment) for gooddata-eval test runs.
Bug Fixes
- Updated CLI JSON output to use provider-prefixed model keys consistently.
- Improved conversation handling to preserve caller-supplied conversation IDs.
- Refined visualization-related item classification.
Chores
- Updated test and workspace development dependencies (including tox, linting, and vcrpy).
- Aligned import/formatting patterns across evaluation components and tests.

Bump development and test tooling to current releases:

- ty 0.0.40 -> 0.0.55
- ruff 0.15.15 -> 0.15.20
- pre-commit 4.0.1 -> 4.6.0
- tox 4.32.0 -> 4.56.1, tox-uv 1.29.0 -> 1.35.2
- vcrpy 8.0.0 -> 8.2.1 (root + sdk/pandas/fdw test groups)
- moto 5.1.22 -> 5.2.2 (re-lock within existing >=5.1.6 spec)

ty 0.0.55's improved inference flags the cast in MetricValueFilter.description as redundant, so drop it (and the now-unused typing.cast import). All other runtime deps left capped as-is; urllib3 stays pinned to the OpenAPI generator.

Validated: make lint, format, type-check green; sdk/pandas/fdw/pipelines tests pass.

jira: trivial
risk: low

Apply ruff autofixes surfaced by `make lint-fix` across gooddata-eval: remove unused imports and re-sort import blocks, and add the missing `# noqa: PLC0415` to the deliberately function-local `timezone as _tz` imports (matching the `_dt` line directly above them). No behavior change.

These violations went unnoticed because gooddata-eval is not yet wired into the test/lint harness (see follow-up commit).

jira: trivial
risk: nonprod

gooddata-eval was the only package without a tox.ini, so its tests never ran via `make test`/CI. As a result the suite silently drifted from the source as the package was refactored (notably "move all agentic evaluation logic into gooddata_eval SDK"), accumulating 6 deterministic failures.

Add a tox.ini mirroring the other packages, with `extras = llm-judge` so the optional `openai` dependency is installed (9 tests import it). Add pytest-json-report to the test group since the shared pytest command emits a JSON report.

Update the 6 stale tests to match current, intentional source behavior:

- langfuse test_kind inference now returns "vis_agentic" for visualization expected_output
- JSON report run keys are provider-prefixed ("Provider/model") to stay collision-free across providers
- ItemReport now requires a `question` field
- run_agentic_visualization only deletes conversations it created; a caller-supplied initial_conversation_id is left intact

All 216 eval tests pass via tox.

jira: trivial
risk: nonprod
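The ownership rule behind the last test update can be sketched as follows. Everything here is hypothetical: `FakeChatClient`, `create_conversation`, `delete_conversation`, and `run_visualization` are illustrative stand-ins, not the real `run_agentic_visualization` signature. The point is that the function records whether it created the conversation and only cleans up in that case.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FakeChatClient:
    """Hypothetical in-memory client used only for this sketch."""

    created: list[str] = field(default_factory=list)
    deleted: list[str] = field(default_factory=list)

    def create_conversation(self) -> str:
        conv_id = f"conv-{len(self.created) + 1}"
        self.created.append(conv_id)
        return conv_id

    def delete_conversation(self, conv_id: str) -> None:
        self.deleted.append(conv_id)


def run_visualization(client: FakeChatClient, initial_conversation_id: str | None = None) -> str:
    """Hypothetical stand-in illustrating the cleanup rule only."""
    owns_conversation = initial_conversation_id is None
    conv_id = (
        client.create_conversation() if owns_conversation else initial_conversation_id
    )
    try:
        # ... send the question and collect the visualization here ...
        return conv_id
    finally:
        # Only delete what this call created; a caller-supplied id is left intact.
        if owns_conversation:
            client.delete_conversation(conv_id)
```

Tying cleanup to a flag set before the work starts (rather than re-checking the id afterwards) keeps the decision correct even if the body raises.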

codecov · 2026-07-01T06:03:34Z

Codecov Report

❌ Patch coverage is 46.15385% with 14 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.74%. Comparing base (f9639cb) to head (10ea9f2).
⚠️ Report is 53 commits behind head on master.

Files with missing lines	Patch %	Lines
...eval/src/gooddata_eval/core/agentic/alert_skill.py	33.33%	2 Missing ⚠️
...val/src/gooddata_eval/core/agentic/conversation.py	0.00%	2 Missing ⚠️
...src/gooddata_eval/core/agentic/general_question.py	0.00%	2 Missing ⚠️
...a-eval/src/gooddata_eval/core/agentic/guardrail.py	0.00%	2 Missing ⚠️
...val/src/gooddata_eval/core/agentic/metric_skill.py	33.33%	2 Missing ⚠️
...eval/src/gooddata_eval/core/agentic/search_tool.py	0.00%	2 Missing ⚠️
...al/src/gooddata_eval/core/agentic/visualization.py	0.00%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1676      +/-   ##
==========================================
- Coverage   79.10%   77.74%   -1.36%     
==========================================
  Files         231      271      +40     
  Lines       15718    18570    +2852     
==========================================
+ Hits        12433    14437    +2004     
- Misses       3285     4133     +848
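The percentages in the diff are covered lines divided by tracked lines. A quick arithmetic check of the figures above; note that the 26-line patch size is inferred from "14 lines missing" at 46.15385%, not stated by Codecov.

```python
def coverage_pct(hits: int, lines: int, digits: int = 2) -> float:
    """Line coverage as a percentage of tracked lines, rounded like the report."""
    return round(100 * hits / lines, digits)


base = coverage_pct(12433, 15718)  # master (f9639cb)
head = coverage_pct(14437, 18570)  # PR head (10ea9f2)
print(base, head, round(head - base, 2))

# Patch coverage: 14 missing lines at 46.15385% implies 12 hits out of 26.
print(coverage_pct(12, 26, digits=5))
```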


coderabbitai · 2026-07-01T06:04:59Z

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: f6572f35-153e-47ea-9f10-54e6e893f6d1

📥 Commits

Reviewing files that changed from the base of the PR and between ecd1f66 and 10ea9f2.

⛔ Files ignored due to path filters (1)

uv.lock is excluded by !**/*.lock

📒 Files selected for processing (1)

pyproject.toml

🚧 Files skipped from review as they are similar to previous changes (1)

pyproject.toml

📝 Walkthrough


This PR adds tox.ini and pytest-json-report to gooddata-eval, splits combined datetime imports across agentic evaluation modules, refactors runner imports and a TypedDict definition, updates tests for narrowed imports and changed CLI/visualization expectations, bumps dependency versions, and removes an unused cast in filter.py.

Changes

gooddata-eval test tooling and CI reporting

Layer / File(s)	Summary
Tox configuration and JSON test reporting `packages/gooddata-eval/tox.ini`, `packages/gooddata-eval/pyproject.toml`	Adds a tox test matrix for Python 3.10-3.14 with wheel packaging, llm-judge extra, and pytest commands producing coverage XML and JSON reports; adds pytest-json-report dependency.

Agentic eval import cleanups and runner refactor

Layer / File(s)	Summary
Runner import and TypedDict refactor `.../cli/agentic_runner.py`, `.../cli/main.py`	Removes unused HttpxLangfuseClient import, converts _LfKw to a class-based TypedDict, and repositions an import in main.py.
Datetime import splits across agentic skill modules `.../core/agentic/alert_skill.py`, `.../conversation.py`, `.../general_question.py`, `.../guardrail.py`, `.../metric_skill.py`, `.../search_tool.py`, `.../visualization.py`	Splits combined datetime/timezone imports into separate statements and reorders minor imports.
Test import narrowing for agentic skills `.../tests/test_agentic_alert_skill.py`, `.../tests/test_agentic_conversation.py`, `.../tests/test_agentic_general_question.py`, `.../tests/test_agentic_guardrail.py`, `.../tests/test_agentic_search_tool.py`	Removes unused test imports (ruff autofix).
Visualization conversation-deletion test updates `.../tests/test_agentic_visualization.py`	Updates assertions so caller-supplied conversations are preserved while internally created ones are deleted.
CLI test updates for provider-prefixed keys and thread-safety `.../tests/test_cli.py`, `.../tests/test_langfuse_source.py`	Expects provider-prefixed model keys, streamlines thread-safety test imports and ItemReport construction, and updates expected test_kind to vis_agentic.

Dependency version bumps and filter.py cast removal

Layer / File(s)	Summary
Dependency version bumps `packages/gooddata-fdw/pyproject.toml`, `packages/gooddata-pandas/pyproject.toml`, `packages/gooddata-sdk/pyproject.toml`, `pyproject.toml`	Bumps vcrpy, pre-commit, ruff, ty, tox, and tox-uv versions.
Filter description cast removal `packages/gooddata-sdk/src/gooddata_sdk/compute/model/filter.py`	Removes cast import and replaces cast(...) with a direct values assignment.

Estimated code review effort: 2 (Simple) | ~12 minutes

Poem

A rabbit hopped through imports neat,
Split datetimes clean, no more repeat,
Tox hums along with JSON in tow,
vcrpy bumped, versions aglow,
Tests now dance to provider's beat! 🐇✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

Check name	Status	Explanation
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.
Title check	✅ Passed	The title accurately captures the main changes: development dependency bumps and adding gooddata-eval to the tox test harness.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.


`scripts/docs/python_ref_builder.py` imports `jinja2`, but `jinja2` was never declared in the workspace test group; it was only present transitively. Bumping the dev tooling pruned that transitive provider, breaking the docs-scripts-tests CI job (`uv sync --group test --locked`; `make test-docs-scripts`) with `ModuleNotFoundError: No module named 'jinja2'`. Add `jinja2~=3.1` (matching `scripts/script-requirements.txt`) to the test group alongside the other scripts/docs dependencies so the synced environment provides it.

jira: trivial
risk: nonprod

hkad98 added 3 commits June 30, 2026 16:24

hkad98 requested review from lupko and pcerny as code owners July 1, 2026 05:59

hkad98 enabled auto-merge July 1, 2026 07:54

zdenekmusil-gd approved these changes Jul 1, 2026

hkad98 wants to merge 4 commits into gooddata:master from hkad98:jkd/deps
