build(deps): bump dev tooling + wire gooddata-eval into tox #1676
hkad98 wants to merge 4 commits into
Bump development and test tooling to current releases:
- ty 0.0.40 -> 0.0.55
- ruff 0.15.15 -> 0.15.20
- pre-commit 4.0.1 -> 4.6.0
- tox 4.32.0 -> 4.56.1, tox-uv 1.29.0 -> 1.35.2
- vcrpy 8.0.0 -> 8.2.1 (root + sdk/pandas/fdw test groups)
- moto 5.1.22 -> 5.2.2 (re-lock within existing >=5.1.6 spec)
ty 0.0.55's improved inference flags the cast in MetricValueFilter.description as redundant, so drop it (and the now-unused typing.cast import).
All other runtime deps left capped as-is; urllib3 stays pinned to the OpenAPI generator.
Validated: make lint, format, type-check green; sdk/pandas/fdw/pipelines tests pass.
jira: trivial
risk: low
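A minimal sketch of why such a cast becomes redundant. `ExampleFilter` is a hypothetical stand-in, not the real `MetricValueFilter`, and its fields are assumptions made only for illustration.

```python
from dataclasses import dataclass
from typing import cast


@dataclass
class ExampleFilter:  # hypothetical stand-in, not the real MetricValueFilter
    operator: str
    value: float

    def description_with_cast(self) -> str:
        # The f-string is already inferred as `str`, so the cast adds nothing;
        # a checker that detects redundant casts may flag this line.
        return cast(str, f"{self.operator} {self.value}")

    def description(self) -> str:
        # Same runtime behavior without the cast. Once no call site uses
        # `cast`, the `typing.cast` import can be dropped as well.
        return f"{self.operator} {self.value}"
```

`typing.cast` returns its argument unchanged at runtime, so removing a redundant one should not change behavior.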
Apply ruff autofixes surfaced by `make lint-fix` across gooddata-eval: remove unused imports and re-sort import blocks, and add the missing `# noqa: PLC0415` to the deliberately function-local `timezone as _tz` imports (matching the `_dt` line directly above them). No behavior change.
These violations went unnoticed because gooddata-eval is not yet wired into the test/lint harness (see follow-up commit).
jira: trivial
risk: nonprod
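For illustration, a function-local import pair with the suppression applied to each line might look like the sketch below. The function name `utc_timestamp` is hypothetical; only the `_dt`/`_tz` aliases and the `PLC0415` code come from the commit message.

```python
def utc_timestamp() -> str:
    """Return the current UTC time as an ISO 8601 string."""
    # Deliberately function-local imports. Ruff's PLC0415
    # (import-outside-top-level) is suppressed on each line separately,
    # because a `noqa` comment only applies to the line it sits on.
    from datetime import datetime as _dt  # noqa: PLC0415
    from datetime import timezone as _tz  # noqa: PLC0415

    return _dt.now(_tz.utc).isoformat()
```

Because the suppression is per line, splitting a combined import into two lines means the second line needs its own `noqa` comment.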
gooddata-eval was the only package without a tox.ini, so its tests never
ran via `make test`/CI. As a result the suite silently drifted from the
source as the package was refactored (notably "move all agentic evaluation
logic into gooddata_eval SDK"), accumulating 6 deterministic failures.
Add a tox.ini mirroring the other packages, with `extras = llm-judge` so
the optional `openai` dependency is installed (9 tests import it). Add
pytest-json-report to the test group since the shared pytest command emits
a JSON report.
Update the 6 stale tests to match current, intentional source behavior:
- langfuse test_kind inference now returns "vis_agentic" for visualization
expected_output
- JSON report run keys are provider-prefixed ("Provider/model") to stay
collision-free across providers
- ItemReport now requires a `question` field
- run_agentic_visualization only deletes conversations it created; a
caller-supplied initial_conversation_id is left intact
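The ownership rule in the last item can be sketched as follows. This is not the actual `run_agentic_visualization` code: the `create`, `delete`, and `work` callables are hypothetical placeholders for whatever client calls the SDK makes.

```python
from typing import Callable, Optional


def run_with_conversation(
    create: Callable[[], str],
    delete: Callable[[str], None],
    work: Callable[[str], str],
    initial_conversation_id: Optional[str] = None,
) -> str:
    """Run `work` in a conversation, deleting it only if this call created it."""
    if initial_conversation_id is None:
        conversation_id = create()
        owned = True  # created here, so this call is responsible for cleanup
    else:
        conversation_id = initial_conversation_id
        owned = False  # caller-supplied, so it must be left intact
    try:
        return work(conversation_id)
    finally:
        if owned:
            delete(conversation_id)
```

Tracking ownership in a flag keeps the cleanup decision next to the creation decision, which is what a test for this behavior would assert on.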
All 216 eval tests pass via tox.
jira: trivial
risk: nonprod
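A small sketch of why the provider-prefixed run keys in this commit avoid collisions. Only the "Provider/model" format comes from the commit message; the helper name `run_key`, the provider names, and the scores are hypothetical.

```python
def run_key(provider: str, model: str) -> str:
    """Build a "Provider/model" key (format taken from the commit message)."""
    return f"{provider}/{model}"


# Hypothetical runs: two providers exposing a model with the same name.
runs = [
    ("ProviderA", "shared-model", 0.91),
    ("ProviderB", "shared-model", 0.87),
]

# Keyed by model name alone, the second run silently overwrites the first.
by_model = {model: score for _, model, score in runs}

# Keyed by "Provider/model", both runs survive.
by_run_key = {run_key(provider, model): score for provider, model, score in runs}
```

Here `by_model` ends up with one entry and `by_run_key` with two, which is the collision the prefixed keys are meant to prevent.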
Codecov Report
❌ Patch coverage is
@@ Coverage Diff @@
## master #1676 +/- ##
==========================================
- Coverage 79.10% 77.74% -1.36%
==========================================
Files 231 271 +40
Lines 15718 18570 +2852
==========================================
+ Hits 12433 14437 +2004
- Misses 3285 4133 +848
No actionable comments were generated in the recent review.
Walkthrough
This PR adds tox.ini and pytest-json-report to gooddata-eval, splits combined datetime imports across agentic evaluation modules, refactors runner imports and a TypedDict definition, updates tests for narrowed imports and changed CLI/visualization expectations, bumps dependency versions, and removes an unused cast in filter.py.
Changes
- gooddata-eval test tooling and CI reporting
- Agentic eval import cleanups and runner refactor
- Dependency version bumps and filter.py cast removal
Estimated code review effort: 2 (Simple) | ~12 minutes
Pre-merge checks: 5 passed
scripts/docs/python_ref_builder.py imports jinja2, but jinja2 was never declared in the workspace test group -- it was only present transitively. Bumping the dev tooling pruned that transitive provider, breaking the docs-scripts-tests CI job (uv sync --group test --locked; make test-docs-scripts) with "ModuleNotFoundError: No module named 'jinja2'".
Add jinja2~=3.1 (matching scripts/script-requirements.txt) to the test group alongside the other scripts/docs dependencies so the synced env provides it.
jira: trivial
risk: nonprod
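One way to catch this kind of gap early is to check that every module a script imports can be found in the synced environment. The sketch below is only an illustration: `missing_modules` is a hypothetical helper, not part of this repository, and it only handles top-level module names.

```python
from importlib.util import find_spec


def missing_modules(names: list[str]) -> list[str]:
    """Return the top-level modules that cannot be found in this environment."""
    # find_spec returns None when a top-level module cannot be located.
    return [name for name in names if find_spec(name) is None]
```

Calling `missing_modules(["jinja2"])` in an environment where the package is installed returns an empty list. A check like this only shows whether a module is present, not whether it is declared directly, so it complements an explicit dependency declaration and does not replace it.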
Bumps development/test tooling to current releases and, along the way, fixes a gap where `gooddata-eval` was never running in the test harness.
Commits
build(deps): bump dev tooling
- `ty` 0.0.40 → 0.0.55
- `ruff` 0.15.15 → 0.15.20
- `pre-commit` 4.0.1 → 4.6.0
- `tox` 4.32.0 → 4.56.1, `tox-uv` 1.29.0 → 1.35.2
- `vcrpy` 8.0.0 → 8.2.1 (root + sdk/pandas/fdw test groups)
- `moto` 5.1.22 → 5.2.2 (re-lock within existing `>=5.1.6` spec)
`ty` 0.0.55's improved inference flags the `cast` in `MetricValueFilter.description` as redundant, so it (and the now-unused `typing.cast` import) is removed. Other runtime deps left capped as-is; `urllib3` stays pinned to the OpenAPI generator.
chore(eval): fix lint violations flagged by ruff
Ruff autofixes across `gooddata-eval`: remove unused imports, re-sort import blocks, and add the missing `# noqa: PLC0415` to the deliberately function-local `timezone as _tz` imports. No behavior change.
test(eval): wire gooddata-eval into tox and fix stale unit tests
`gooddata-eval` was the only package without a `tox.ini`, so its tests never ran via `make test`/CI. The suite had silently drifted from the source across recent refactors, accumulating 6 deterministic failures.
- Add a `tox.ini` mirroring the other packages, with `extras = llm-judge` so the optional `openai` dependency is installed.
- Add `pytest-json-report` to the test group (shared pytest command emits a JSON report).
- Update the 6 stale tests to match current behavior: `vis_agentic` test-kind inference, provider-prefixed JSON report run keys, required `ItemReport.question`, and `run_agentic_visualization` only deleting conversations it created.
All 216 eval tests now pass via tox.
Validation
`make lint`, `make format`, `make type-check` green across all packages; sdk (488), pandas (343), fdw (43), pipelines (193), eval (216) test suites pass.