Feat/delivery dashboard by ritmun · Pull Request #3261 · openshift/osde2e

ritmun · 2026-06-18T16:30:09Z

DRAFT: final PRs will be split up

Summary by CodeRabbit

New Features
- Added a new dashboard CLI command that launches an interactive web server for monitoring cluster reserves, usage, test results, and operator pipelines.
- Dashboard includes REST API endpoints for programmatic access to cluster and test data.
- Added real-time data ingestion via SQS queue processing with historical backfill capabilities.
- Web interface displays operator status, cluster infrastructure metrics, and test failure analysis across multiple environments.
Documentation
- Added comprehensive dashboard documentation with usage examples, API reference, and troubleshooting guidance.
Chores
- Updated dependencies and added SQLite database support.
- Added Docker configuration and deployment scripts for OpenShift cluster deployment.

- Add dashboard command (pkg/dashboard, cmd/osde2e/dashboard) - SQLite store populated via SQS S3 event notifications - Backfill support for historical S3 data - Pipelines view with stage/int status, version, AI analysis - Per-operator history page with failure details - dashboard.Dockerfile for cluster builds using public images - Deployment script at scripts/dashboard/deploy.sh 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Mirror osde2e.Dockerfile layer caching: COPY go.* + go mod download as separate layer so deps are cached between builds - Update labels to reflect delivery-dashboard purpose 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Rename Usage → Clusters tab, remove Dashboard/Reserves tabs - Clusters page shows all osde2e clusters per env (int/stage/prod) with cluster ID, state, availability, version, flavor, ad hoc image, created/expires datetimes in local tz - Collapsible env sections - Pipelines: rename Operator column to Component, reorder Int/Stage - Update subtitle text on Pipelines page - Tab title: Delivery Dashboard - Pipelines - Build binary to out/osde2e (consistent with Makefile OUT_DIR) - Redirect / and /dashboard to /dashboard/usage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Rename Operator/Component → Deliverable across all UI and Go code - Rename Clusters tab → Infra - Pipelines tab now first in nav, redirects / → /dashboard/deliverables - URL routes: /dashboard/operators → /dashboard/deliverables - API: /api/v1/operators → /api/v1/deliverables - Internal: operatorCollector → deliverableCollector - handleOperatorsPage → handleDeliverablesPage - Pipeline detail: chronological per-version int→stage visual view - VersionPipeline model groups runs by version - Store.GetHistory populates Versions field 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Add Analysis tab (/dashboard/analysis) to nav - Group failed runs by first sentence of LLM root cause (fallback: first line of failure message) - FailureGroup model with FailureMatch, RootCause, Recommendations, Entries - GetFailureGroups in store.go — groups sorted by match count desc - analysis.html: truncated group header with hover tooltip, AI analysis context, deliverable links to pipeline detail, env chips, timestamps - Rename: group.TestName → group.FailureMatch - Badge shows "N matching" instead of "N failures" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- Replace PVC with emptyDir for SQLite DB — avoids ReadWriteOnce multi-attach conflict during rolling updates - Add RollingUpdate strategy (maxSurge=1, maxUnavailable=0) so new pod is healthy before old pod terminates - --backfill on startup repopulates DB from S3 in ~5s - Remove PVC creation from deploy.sh 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- .claude/commands/dashboard-dev.md: thin wrapper that loads the skill - .claude/skills/dashboard-dev/SKILL.md: full onboarding guide covering fork setup, local dev, OpenShift deploy, common tasks, architecture 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

openshift-merge-bot · 2026-06-18T16:30:12Z

There are test jobs defined for this repository which are not configured to run automatically. Comment /test ? to see a list of all defined jobs. Review these jobs and use /test <job> to manually trigger jobs most likely to be impacted by the proposed changes.Comment /pipeline required to trigger all required & necessary jobs.

openshift-ci · 2026-06-18T16:30:14Z

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

coderabbitai · 2026-06-18T16:30:21Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 7c4f7dac-ef51-4d8e-9a69-e27397b4665e

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

openshift-ci · 2026-06-18T16:30:56Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ritmun

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [ritmun]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

- Remove root Dockerfile (superseded by dashboard.Dockerfile) - Remove pkg/dashboard/*.md scaffolding files (BUILD_STATUS, COMPLETE, IMPLEMENTATION_SUMMARY, PLAN, TEMPLATE_FIX) - Remove dashboard.html and reserves.html templates (pages removed from nav) - Remove handleDashboard and handleReservesPage handlers + routes (/dashboard and /dashboard/reserves) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Dead code removed: - tests.html template (no handler renders it) - models.TestSummary (never used) - models.TestResult.SuccessRate() (never called) - store.OperatorNames() (never called) - collectors.TestResultsCollector.GetTestResultByJobID() (never called) Duplication eliminated via new collectors/helpers.go: - suiteStatus(suite) replaces 3x identical passed/failed/error logic - parseTimestamp(ts) replaces 2x identical time.Parse fallback chains - presignURL(client, bucket, key) extracted (available for future use) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- deploy.sh: replace in-cluster BuildConfig with local docker build/push to quay.io/rmundhe_oc/delivery-dashboard; use kustomize from adjacent hp-delivery-apps repo instead of inline oc apply heredocs - SKILL.md: fix secret name ocm-token → ocm-credentials (OCM_CLIENT_ID/SECRET); update deploy steps to reflect new flow - dashboard.Dockerfile: no functional change (quay expiry moved to deploy-time flag) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

- deploy.sh: simplify to 56 lines — fixed build context, secret pre-flight, straight kustomize apply (no SQS patching), DASHBOARD_QUAY_IMAGE required env - configs/local/dashboard-build/: fixed podman build context with Dockerfile (moved from dashboard.local.Dockerfile); binary gitignored there - Remove scripts/dashboard/run-local.sh → replaced by make dashboard - Makefile: add dashboard target (build + run locally with --backfill) - AGENTS.md: add pkg/dashboard + .claude/skills to architecture, tip for /dashboard-dev skill - SKILL.md: full rewrite — current deploy flow, make dashboard, ocm data in architecture 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

Files renamed: collectors/operators.go → collectors/deliverables.go templates/operators.html → templates/deliverables.html Symbols renamed throughout: OperatorStatusCollector → DeliverableCollector NewOperatorStatusCollector → NewDeliverableCollector CollectOperatorStatus → CollectDeliverables OperatorStatus → DeliverableStatus OperatorName → Name operator_name (JSON/DB) → name The dashboard delivers more than just operators — services, pipelines, etc. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>

coderabbitai

Actionable comments posted: 1

Note

Due to the large number of review comments, Critical severity comments were prioritized as inline comments.

🟠 Major comments (26)

pkg/dashboard/collectors/sqs.go-122-128 (1)
122-128: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Partial record failures cause premature message deletion.

If one S3 record fails in processKey, the error is logged but processMessage returns nil, triggering message deletion. Subsequent records in the same SQS message are lost.

Consider tracking failures and returning an error when any record fails:
+	var errs []error
 	for _, rec := range event.Records {
 		if err := c.processKey(rec.S3.Bucket.Name, rec.S3.Object.Key); err != nil {
 			log.Printf("SQS consumer: skip %s: %v", rec.S3.Object.Key, err)
+			errs = append(errs, err)
 		}
 	}
+	if len(errs) > 0 {
+		return fmt.Errorf("%d record(s) failed", len(errs))
+	}
 	return nil
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/sqs.go` around lines 122 - 128, The processMessage
function returns nil even when processKey fails for one or more records in the
loop, causing the SQS message to be marked as successfully processed and deleted
despite incomplete processing. Modify the loop that iterates through
event.Records to track whether any record processing fails, then return an error
from the function if any processKey call returns an error, rather than just
logging and continuing with nil return. This ensures failed records cause the
message to be retried instead of being lost.
pkg/dashboard/server/templates/analysis.html-179-179 (1)
179-179: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add rel="noopener noreferrer" to the external logs link.

This prevents reverse-tabnabbing when opening third-party URLs in a new tab.

As per coding guidelines: “Web security (prodsec-skills)”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/server/templates/analysis.html` at line 179, The external logs
link in the analysis.html template is missing the rel attribute which is
required for security when opening third-party URLs in a new tab. Add
rel="noopener noreferrer" to the anchor tag with href="{{.LogURL}}" and
target="_blank" to prevent reverse-tabnabbing attacks and protect user security
when the link is clicked.
Source: Coding guidelines
pkg/dashboard/server/server.go-350-360 (1)
350-360: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Health degradation should reflect configured dependencies, not just collector presence.

The endpoint reports "degraded" whenever OCM/S3 collectors are nil, even when those integrations are intentionally disabled by config. That produces false-negative health for valid deployments.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/server/server.go` around lines 350 - 360, The handleHealth
function in the Server type incorrectly marks the health status as degraded
whenever OCM or S3 collectors are nil, without considering whether those
integrations are intentionally disabled through configuration. Instead of
directly checking if s.reserveCollector and s.testResultCollector are non-nil,
modify the logic to check if these integrations are configured to be enabled and
only mark them as not connected if they are configured but unavailable. This
way, deployments that intentionally disable OCM or S3 will correctly report "ok"
status instead of false-negative "degraded" status.
pkg/dashboard/server/templates.go-6-47 (1)
6-47: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Replace standard logger calls with structured klog/logr logging.

Template load/render paths currently use log.Printf; this should align with the repository logging standard for consistent structured output.

As per coding guidelines: “Use logr/klog for logging, not fmt.Println” and “Prefer structured logging with logr/klog”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/server/templates.go` around lines 6 - 47, The renderTemplate
function is using the standard log.Printf calls for error logging instead of
structured logging with klog/logr, which is inconsistent with the repository's
logging standards. Replace both log.Printf calls (one after template parsing
failure and one after template execution failure) with structured logging calls
using logr/klog to maintain consistency with the codebase's logging conventions.
Source: Coding guidelines
pkg/dashboard/server/templates/analysis.html-160-161 (1)
160-161: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Escape deliverable names before placing them in path segments.

Using raw {{.OperatorName}} in /dashboard/deliverables/... can break navigation for names containing reserved URL characters. Add a path-escape template helper and use it here.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/server/templates/analysis.html` around lines 160 - 161, The
href attribute in the deliverable link is using raw {{.OperatorName}} directly
in the URL path, which can break navigation if the name contains reserved URL
characters like slashes, question marks, or ampersands. Create or use a
path-escape template helper function and apply it to {{.OperatorName}} within
the href attribute in the anchor tag at the specified location to properly
encode the operator name for use as a URL path segment.
pkg/dashboard/server/templates/operators.html-279-367 (1)
279-367: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Move <dialog> nodes outside <tbody> to keep valid table structure.

<tbody> may only contain <tr> children. Rendering dialogs inside the table body produces invalid markup and can trigger browser DOM corrections that break event/data coupling.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/server/templates/operators.html` around lines 279 - 367, The
dialog elements with IDs d-{{$i}}-stage and d-{{$i}}-int are currently nested
inside a tbody element, which creates invalid HTML markup since tbody should
only contain tr children. Move both dialog blocks outside of the tbody element
to maintain valid table structure and prevent browser DOM corrections that could
break event handling and data relationships.
pkg/dashboard/server/templates/pipeline-detail.html-269-270 (1)
269-270: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Harden external links opened in new tabs.

Add rel="noopener noreferrer" to each target="_blank" anchor in the dialogs.

As per coding guidelines: “Web security (prodsec-skills)”.

Also applies to: 311-312
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/server/templates/pipeline-detail.html` around lines 269 - 270,
Add the security attribute rel="noopener noreferrer" to each anchor tag that
opens in a new tab using target="_blank". In the anchor tags for $intRun.LogURL
and $intRun.JUnitURL, insert rel="noopener noreferrer" between the href and
target attributes. Apply the same fix to the similar anchor tags at lines
311-312 that also use target="_blank". This prevents potential security
vulnerabilities when external links are opened in new tabs.
Source: Coding guidelines
pkg/dashboard/server/server.go-3-17 (1)
3-17: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick win

Use klog/logr instead of log.Printf across this server package.

This file consistently uses the standard logger (log.Printf), which diverges from project logging requirements and weakens structured observability.

As per coding guidelines: “Use logr/klog for logging, not fmt.Println” and “Prefer structured logging with logr/klog”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/server/server.go` around lines 3 - 17, The import section
includes the standard library "log" package, but the project requires structured
logging via klog/logr per coding guidelines. Replace the "log" import with the
appropriate klog/logr imports (such as "sigs.k8s.io/controller-runtime/pkg/log"
or the project's preferred logr implementation), and then update all log.Printf
calls throughout the server package to use the structured logger instead,
following the project's logging patterns for consistency and improved
observability.
Source: Coding guidelines
pkg/dashboard/server/templates/operators.html-318-319 (1)
318-319: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add rel="noopener noreferrer" to all target="_blank" links.

Opening untrusted URLs in a new tab without rel exposes reverse-tabnabbing.
Suggested fix
-{{if .LogURL}}<a href="{{.LogURL}}" target="_blank">Logs</a>{{end}}
-{{if .JUnitURL}}<a href="{{.JUnitURL}}" target="_blank">JUnit XML</a>{{end}}
+{{if .LogURL}}<a href="{{.LogURL}}" target="_blank" rel="noopener noreferrer">Logs</a>{{end}}
+{{if .JUnitURL}}<a href="{{.JUnitURL}}" target="_blank" rel="noopener noreferrer">JUnit XML</a>{{end}}
As per coding guidelines: “Web security (prodsec-skills)”.

Also applies to: 363-364
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/server/templates/operators.html` around lines 318 - 319, The
links in operators.html that open in a new tab using target="_blank"
(specifically the LogURL and JUnitURL links at lines 318-319 and the same
pattern at lines 363-364) are missing the security attribute rel="noopener
noreferrer" which prevents reverse-tabnabbing attacks. Add rel="noopener
noreferrer" to both the LogURL and JUnitURL anchor tags to include the complete
security attribute alongside the target="_blank" attribute.
Source: Coding guidelines
pkg/dashboard/server/server.go-103-116 (1)
103-116: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add HTTP server timeouts to harden availability under slow-client conditions.

http.Server is created without ReadHeaderTimeout/ReadTimeout/WriteTimeout/IdleTimeout. This leaves the handler exposed to connection starvation patterns.
Suggested fix
 func (s *Server) Start(addr string, ctx context.Context) error {
-	srv := &http.Server{Addr: addr, Handler: s.mux}
+	srv := &http.Server{
+		Addr:              addr,
+		Handler:           s.mux,
+		ReadHeaderTimeout: 10 * time.Second,
+		ReadTimeout:       30 * time.Second,
+		WriteTimeout:      30 * time.Second,
+		IdleTimeout:       60 * time.Second,
+	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/server/server.go` around lines 103 - 116, In the Start method
of the Server struct, the http.Server initialization is missing timeout
configurations that protect against slow-client connection starvation. Add
ReadHeaderTimeout, ReadTimeout, WriteTimeout, and IdleTimeout fields to the
http.Server struct creation (the line where srv := &http.Server{Addr: addr,
Handler: s.mux} is defined) with appropriate timeout values to harden
availability under slow-client conditions.
pkg/dashboard/server/server.go-428-433 (1)
428-433: ⚠️ Potential issue | 🟠 Major

Set Content-Type header before calling WriteHeader in error responses.

In sendAPIError, WriteHeader is called before sendJSON sets the Content-Type header. In Go HTTP semantics, once WriteHeader is called, any subsequent Header().Set() calls are ignored. Error responses will be sent without the correct Content-Type: application/json header, potentially breaking clients expecting JSON.
Suggested fix
 func (s *Server) sendAPIError(w http.ResponseWriter, message string, statusCode int) {
-	w.WriteHeader(statusCode)
-	s.sendJSON(w, models.APIResponse{
+	w.Header().Set("Content-Type", "application/json")
+	w.WriteHeader(statusCode)
+	_ = json.NewEncoder(w).Encode(models.APIResponse{
 		Success: false,
 		Error:   message,
 	})
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/server/server.go` around lines 428 - 433, In the sendAPIError
method of the Server struct, the WriteHeader call is being made before the
Content-Type header is set by sendJSON, which means Go will ignore any
subsequent header modifications. To fix this, explicitly set the Content-Type
header to application/json on the response writer before calling
w.WriteHeader(statusCode), ensuring the header is properly included in the error
response sent to clients.
pkg/dashboard/config/config.go-10-35 (1)
10-35: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy lift

Consolidate dashboard config keys into the shared config contract.

Defining a separate key registry here diverges from the repository’s configuration contract and risks precedence/documentation drift. Please move/alias these keys through pkg/common/config/config.go and reference them from this package.

Based on learnings, "Config changes? Edit config.go only". As per coding guidelines, "Primary file: pkg/common/config/config.go - All configuration options (START HERE)".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/config/config.go` around lines 10 - 35, The dashboard
configuration constants (Port, Environment, RefreshInterval,
ExpirationWarningThreshold, MaxTestResults, LookbackDays, SQSQueueURL, DBPath)
are currently defined in pkg/dashboard/config/config.go but should be
centralized in the shared configuration contract. Move these constant
definitions from the dashboard config file to pkg/common/config/config.go where
all configuration options are managed, then import and reference them from
pkg/dashboard/config/config.go to avoid duplication and maintain a single source
of truth for configuration keys across the repository.
Sources: Coding guidelines, Learnings
pkg/dashboard/collectors/operators.go-124-126 (1)
124-126: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Version is dropped from status grouping, causing record overwrite.

CollectOperatorStatus groups/indexes by operator name (+env) but not version. Different versions can overwrite each other, returning incorrect operator status.
Suggested fix
- type groupKey struct{ name, env string }
+ type groupKey struct{ name, version, env string }

- name, _, env := parseComponentPath(component)
- gk := groupKey{name, env}
+ name, version, env := parseComponentPath(component)
+ gk := groupKey{name: name, version: version, env: env}

- indexKey := r.name
+ indexKey := fmt.Sprintf("%s|%s", r.name, r.version)
Also applies to: 152-153, 237-243
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/operators.go` around lines 124 - 126, The groupKey
struct used in CollectOperatorStatus only includes name and env fields, but
omits version, which causes different operator versions to overwrite each other
in the newestByGroup map. Add a version field to the groupKey struct definition
and update all places where groupKey instances are created (including the
locations at lines 152-153 and 237-243) to include the version information when
grouping operators, ensuring each unique operator-version-env combination is
treated as a separate entry.
pkg/dashboard/collectors/operators.go-207-208 (1)
207-208: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Environment fallback reads the wrong S3 key.

fetchMetaFromLog expects the full component path segment, but callers pass operator name. This builds non-existent log paths and keeps env as unknown.
Suggested fix
-func (c *OperatorStatusCollector) fetchMetaFromLog(name, date, jobID string) (env, version string) {
-	logKey := fmt.Sprintf("test-results/%s/%s/%s/test_output.log", name, date, jobID)
+func (c *OperatorStatusCollector) fetchMetaFromLog(component, date, jobID string) (env, version string) {
+	logKey := fmt.Sprintf("test-results/%s/%s/%s/test_output.log", component, date, jobID)
- if detected := c.fetchEnvFromLog(gk.name, parts[2], parts[3]); detected != "" {
+ if detected := c.fetchEnvFromLog(cand.component, parts[2], parts[3]); detected != "" {
- if detected := c.fetchEnvFromLog(operatorName, rk.dateStr, rk.jobID); detected != "" {
+ if detected := c.fetchEnvFromLog(rk.component, rk.dateStr, rk.jobID); detected != "" {
Also applies to: 292-294, 469-470
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/operators.go` around lines 207 - 208, The
fetchEnvFromLog method expects the full component path segment as its first
parameter, but the current calls (in the block around fetchEnvFromLog at line
207-208, and also in the calls mentioned at lines 292-294 and 469-470) are
passing only the operator name (gk.name) instead. Update all three
fetchEnvFromLog call sites to pass the complete component path segment rather
than just gk.name to ensure the correct log paths are constructed and the
environment value is properly detected instead of remaining as unknown.
pkg/dashboard/collectors/helpers.go-21-30 (1)
21-30: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Do not fallback to time.Now() for parse failures.

Using current time on parse failure misorders historical runs as newest. Return zero time and let callers apply an explicit fallback source.
Suggested fix
 func parseTimestamp(ts string) time.Time {
 	if t, err := time.Parse("2006-01-02T15:04:05", ts); err == nil {
 		return t
 	}
 	if t, err := time.Parse(time.RFC3339, ts); err == nil {
 		return t
 	}
-	return time.Now()
+	return time.Time{}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/helpers.go` around lines 21 - 30, The parseTimestamp
function currently returns time.Now() when both timestamp parsing attempts fail,
which causes historical runs to be misordered by appearing as the newest.
Replace the final return statement (return time.Now()) with return time.Time{}
to return the zero time value instead, allowing callers to apply their own
explicit fallback logic for handling missing or invalid timestamps.
pkg/dashboard/models/types.go-21-23 (1)
21-23: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Guard zero/expired timestamps in IsExpiringSoon.

Current logic marks zero (and already expired) timestamps as “expiring soon”, which inflates expiry metrics.
Suggested fix
 func (c *ClusterReserve) IsExpiringSoon(threshold time.Duration) bool {
-	return time.Until(c.ExpiresAt) < threshold
+	if c.ExpiresAt.IsZero() {
+		return false
+	}
+	remaining := time.Until(c.ExpiresAt)
+	return remaining >= 0 && remaining < threshold
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/models/types.go` around lines 21 - 23, The IsExpiringSoon
method on the ClusterReserve type currently returns true for zero timestamps and
already-expired timestamps because negative durations from time.Until() are less
than any positive threshold. Add guards to check if c.ExpiresAt is the zero
value or is in the past (already expired) before evaluating whether it is
expiring soon based on the threshold. Return false if the timestamp is zero or
already expired.
pkg/dashboard/config/config.go-82-88 (1)
82-88: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Normalize environment aliases in OCMEnvironments.

integration is documented but not mapped to int. Returning raw input can produce invalid provider env names and silent skips.
Suggested fix
+import "strings"
+
 func (c *Config) OCMEnvironments() []string {
-	switch c.Environment {
+	env := strings.ToLower(strings.TrimSpace(c.Environment))
+	switch env {
 	case "all", "":
 		return []string{"stage", "int", "prod"}
+	case "integration":
+		return []string{"int"}
 	default:
-		return []string{c.Environment}
+		return []string{env}
 	}
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/config/config.go` around lines 82 - 88, The OCMEnvironments
method in the Config type does not normalize environment aliases before
returning them, causing documented aliases like "integration" to be passed
through as-is instead of being mapped to their canonical forms like "int". This
results in invalid provider environment names being returned silently. Add
environment alias normalization logic to the OCMEnvironments method that maps
aliases such as "integration" to "int" before the switch statement processes the
environment value, ensuring that all documented aliases are properly converted
to their expected canonical forms.
pkg/dashboard/collectors/reserves.go-45-48 (1)
45-48: ⚠️ Potential issue | 🟠 Major

Add pagination to OCM cluster list queries to fetch all results.

Both methods set Size() but omit page iteration, returning only the first page. In OCM environments with reserves/clusters exceeding the page size, data will be incomplete. Use the pagination pattern from pkg/common/providers/ocmprovider/cluster.go (loop with .Page(page) and increment page counter). Also applies to pkg/dashboard/collectors/usage.go.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/reserves.go` around lines 45 - 48, The cluster list
query in the GetConnection().ClustersMgmt().V1().Clusters().List() call sets a
Size but lacks pagination logic, causing it to return only the first page of
results. Implement pagination by wrapping the Send() call in a loop that
iterates through pages using .Page(page) with an incrementing page counter,
following the pagination pattern from
pkg/common/providers/ocmprovider/cluster.go. Apply this same fix to the
equivalent code in pkg/dashboard/collectors/usage.go to ensure all clusters and
usage data are fetched across all pages.
pkg/dashboard/collectors/operators.go-441-446 (1)
441-446: ⚠️ Potential issue | 🟠 Major

Eliminate redundant per-run S3 list in fan-out loop.

The initial ListObjectsV2Pages scan (lines 32–57) already identifies all JUnit XML keys for the operator, but the code discards the key and stores only the (component, dateStr, jobID) tuple. The fan-out loop then performs a separate ListObjectsV2 for each candidate run to rediscover the same key.

Extend runKey to include the JUnit key discovered during the initial scan, eliminating the redundant listing on lines 443–446.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/operators.go` around lines 441 - 446, The runKey
structure currently only stores component, dateStr, and jobID, but the initial
ListObjectsV2Pages scan already discovers the JUnit XML keys that are needed
later. Extend the runKey struct to include a field for the JUnit XML key,
populate this field during the initial ListObjectsV2Pages scan (lines 32-57)
when each candidate run is identified, and then use the pre-discovered key from
the runKey in the fan-out loop instead of performing the redundant ListObjectsV2
call on lines 443-446.
pkg/dashboard/collectors/s3tests.go-169-170 (1)
169-170: ⚠️ Potential issue | 🟠 Major

Stop pagination early to reach maxResults causes omission of newer tests.

The function stops ListObjectsV2Pages once resultsByJob reaches maxResults entries (line 170). However, S3 lists objects in lexicographical order by key. Since keys include dates (test-results/<component>/<date>/<job-id>/junit*.xml), lexicographical ordering matches chronological order, placing older jobs in earlier pages and newer jobs in later pages. Stopping early prevents collection of newer jobs that appear lexicographically after the maxResults threshold. The subsequent sort (line 182-185) only reorders the incomplete subset already collected.

Fix: Collect all results (or a large buffer) across all pages before sorting and limiting by timestamp, ensuring newer jobs are included in the "recent tests" result set.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/s3tests.go` around lines 169 - 170, The pagination
logic in the ListObjectsV2Pages iteration stops early when resultsByJob reaches
maxResults, but since S3 objects are ordered lexicographically by key (with
dates embedded in the path), this prevents collection of newer jobs that appear
later in the lexicographical order. Instead of returning early when
len(resultsByJob) < maxResults at line 170, remove this early termination and
allow the loop to iterate through all pages to collect all results. Then apply
the sorting (lines 182-185) and result limiting logic after all pages have been
processed, ensuring that newer jobs are properly included in the final filtered
result set.
pkg/dashboard/collectors/usage.go-92-95 (1)
92-95: ⚠️ Potential issue | 🟠 Major

Add pagination loop to fetch all clusters.

collectUsageForEnv uses .Size(1000) but omits .Page() and pagination logic. The function fetches only the first page, undercounting clusters in larger environments. Implement pagination consistent with existing patterns in pkg/common/providers/ocmprovider/cluster.go (lines 39–75, 700–730):
Example pattern
page := 1
for {
    resp, err := provider.GetConnection().ClustersMgmt().V1().Clusters().List().
        Search(query).
        Size(1000).
        Page(page).
        Send()
    if err != nil {
        return nil, fmt.Errorf("failed to query clusters: %w", err)
    }

    resp.Items().Each(func(cluster *v1.Cluster) bool {
        // process cluster
        return true
    })

    if resp.Size() < 1000 {
        break
    }
    page++
}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/usage.go` around lines 92 - 95, The
collectUsageForEnv function fetches clusters with a fixed size of 1000 but lacks
pagination logic, causing it to retrieve only the first page of results.
Implement a pagination loop around the existing
provider.GetConnection().ClustersMgmt().V1().Clusters().List() call that starts
at page 1, adds .Page(page) to each request, increments the page counter after
processing each response, and terminates when the response size is less than
1000 (indicating the final page). Process all clusters within the loop using
resp.Items().Each() or similar iteration, following the pagination pattern
already established in pkg/common/providers/ocmprovider/cluster.go.
dashboard.Dockerfile-10-10 (1)
10-10: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid copying the entire build context into the image build stage.

COPY . . can unintentionally include sensitive or irrelevant files in build context/layers. Restrict copied paths and/or enforce a strict .dockerignore.

As per coding guidelines: **/{Dockerfile,Containerfile}*: "COPY specific files, not entire context".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dashboard.Dockerfile` at line 10, The COPY . . command on line 10 is copying
the entire build context into the Docker image, which can include sensitive or
irrelevant files unnecessarily. Replace this broad copy instruction with
specific file and directory paths that are actually required for the build (such
as package.json, source code directories, configuration files, etc.). If needed,
also create or update a .dockerignore file to explicitly exclude sensitive files
and directories that should not be included in the build context.
Source: Coding guidelines
configs/local/dashboard-build/Dockerfile-1-5 (1)
1-5: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Set a non-root runtime user in this image.

The container currently runs as root by default.
Suggested fix
 FROM registry.access.redhat.com/ubi9/ubi-minimal:latest
 WORKDIR /
 COPY osde2e /osde2e
+USER 1001
 ENV PATH="${PATH}:/"
 ENTRYPOINT ["/osde2e"]
As per coding guidelines: **/{Dockerfile,Containerfile}*: "USER non-root; never run as root".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@configs/local/dashboard-build/Dockerfile` around lines 1 - 5, The Dockerfile
in configs/local/dashboard-build/Dockerfile runs the container as root by
default, which violates security guidelines. Add a USER instruction before the
ENTRYPOINT directive to specify a non-root user for runtime execution. This
ensures the container adheres to the coding guideline that requires USER
non-root and prevents running containers with root privileges.
Sources: Coding guidelines, Linters/SAST tools
dashboard.Dockerfile-14-20 (1)
14-20: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Run the final image as a non-root user.

The runtime stage executes as root by default, which weakens container isolation.
Suggested fix
 FROM registry.access.redhat.com/ubi9/ubi-minimal:latest
 WORKDIR /
 
 COPY --from=builder /opt/app-root/src/github.com/openshift/osde2e/out/osde2e .
+USER 1001
 
 ENV PATH="${PATH}:/"
 ENTRYPOINT ["/osde2e"]
As per coding guidelines: **/{Dockerfile,Containerfile}*: "USER non-root; never run as root".
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@dashboard.Dockerfile` around lines 14 - 20, The final runtime stage in the
Dockerfile lacks a USER directive, causing the container to execute as root,
which violates security guidelines. Add a USER directive after the COPY command
and before the ENTRYPOINT in the final stage to specify a non-root user account
for running the osde2e application. This ensures the container runs with
restricted permissions rather than as root.
Sources: Coding guidelines, Linters/SAST tools
scripts/dashboard/verify-build.sh-72-76 (1)
72-76: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Formatting gate uses gofmt instead of required gofumpt.

The verification step should enforce gofumpt to match repo policy; otherwise style checks can pass while violating project standards.
Suggested fix
-# Check for syntax errors (gofmt)
-echo "5. Checking Go syntax..."
+# Check formatting (gofumpt)
+echo "5. Checking Go formatting..."
 DASHBOARD_FILES=$(find pkg/dashboard cmd/osde2e/dashboard -name "*.go" 2>/dev/null)
 if [ -n "$DASHBOARD_FILES" ]; then
-    gofmt -l $DASHBOARD_FILES > /tmp/dashboard-fmt-check.txt
+    gofumpt -l $DASHBOARD_FILES > /tmp/dashboard-fmt-check.txt
As per coding guidelines, “Use gofumpt for code formatting, not gofmt.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/dashboard/verify-build.sh` around lines 72 - 76, The formatting
verification step in the Go syntax check is using gofmt instead of the required
gofumpt tool. Replace the gofmt command in the check with gofumpt to enforce the
project's code formatting standards. Specifically, change the gofmt -l
invocation to use gofumpt -l against the DASHBOARD_FILES variable to ensure the
verification gate correctly validates formatting according to project policy.
Source: Coding guidelines
scripts/dashboard/verify-build.sh-54-60 (1)
54-60: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Template verification list is out of sync with the dashboard templates.

The script checks dashboard.html, reserves.html, and tests.html, but the current dashboard layer uses operators.html, pipeline-detail.html, and analysis.html. This will fail verification on a valid tree.
Suggested fix
 TEMPLATES=(
     "pkg/dashboard/server/templates/base.html"
-    "pkg/dashboard/server/templates/dashboard.html"
-    "pkg/dashboard/server/templates/reserves.html"
+    "pkg/dashboard/server/templates/operators.html"
+    "pkg/dashboard/server/templates/pipeline-detail.html"
     "pkg/dashboard/server/templates/usage.html"
-    "pkg/dashboard/server/templates/tests.html"
+    "pkg/dashboard/server/templates/analysis.html"
 )
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/dashboard/verify-build.sh` around lines 54 - 60, The TEMPLATES array
in the verify-build.sh script contains outdated template file names that don't
match the current dashboard implementation. Update the TEMPLATES array to
replace the old template names (dashboard.html, reserves.html, tests.html) with
the actual template files currently used by the dashboard (operators.html,
pipeline-detail.html, analysis.html), ensuring the verification script checks
for the correct template files in pkg/dashboard/server/templates/.

🟡 Minor comments (12)

pkg/dashboard/README.md-19-32 (1)

19-32: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the dashboard tree and status checklist.

pkg/dashboard/server/templates/ is the actual location, but the tree shows pkg/dashboard/templates/. The same section still marks templates as TODO even though this PR adds them, so the documentation is stale.

Also applies to: 212-233
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/README.md` around lines 19 - 32, Update the directory tree in
the README.md to reflect the correct location of templates, moving the templates
entry from under pkg/dashboard/ directly to under pkg/dashboard/server/, and
remove the TODO marker from the templates section since templates are now
included in this PR rather than being a planned future addition.

pkg/dashboard/README.md-61-63 (1)

61-63: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use prod, not production, in the examples.

The dashboard config accepts stage, int, prod, or all; production is not a valid value in either the CLI or API example.

Also applies to: 120-124

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/README.md` around lines 61 - 63, The README.md file contains
examples using `production` as an environment value, but the dashboard config
only accepts `stage`, `int`, `prod`, or `all`. Replace all instances of
`production` with `prod` in the environment examples throughout the document,
including the filter by environment section starting at line 61 and any other
occurrences mentioned in the range 120-124. Ensure all CLI and API examples
consistently use the valid `prod` value instead of the invalid `production`
value.

pkg/dashboard/README.md-151-155 (1)

151-155: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Document the live test-results endpoint.

/api/v1/tests is not registered by the server; the dashboard exposes /api/v1/deliverables for this data.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/README.md` around lines 151 - 155, The Test Results section in
the README.md file documents an incorrect endpoint. The `/api/v1/tests` endpoint
is not registered by the server; the dashboard actually exposes
`/api/v1/deliverables` for test results data. Update the Test Results section to
replace the documented endpoint from `/api/v1/tests` to `/api/v1/deliverables`
and adjust the related endpoint path `/api/v1/tests/{job-id}` to
`/api/v1/deliverables/{job-id}` to accurately reflect the actual API endpoints
provided by the dashboard.

pkg/dashboard/store/store.go-216-225 (1)

216-225: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Silent JSON unmarshal failures may mask data corruption.

Lines 217 and 296 discard unmarshal errors. Corrupted failed_tests JSON will silently yield an empty slice, potentially hiding real failures.

Consider logging at debug level:

 		var failedTests []models.FailedTestCase
-		_ = json.Unmarshal([]byte(ftJSON), &failedTests)
+		if err := json.Unmarshal([]byte(ftJSON), &failedTests); err != nil {
+			log.Printf("Warning: failed_tests unmarshal for %s/%s: %v", name, env, err)
+		}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/store/store.go` around lines 216 - 225, The json.Unmarshal call
for failedTests is silently discarding errors by assigning to the blank
identifier, which can hide data corruption issues. Instead of ignoring the
error, capture it and log it at debug level to aid in diagnostics when the JSON
parsing fails. Apply the same approach to the failedTests unmarshal as is
already being done for the llm unmarshal error handling.

pkg/dashboard/collectors/sqs.go-100-103 (1)

100-103: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

DeleteMessage error silently discarded.

Failed deletions leave messages on the queue, causing reprocessing. Log the error at minimum.

-		_, _ = c.sqsClient.DeleteMessage(&sqs.DeleteMessageInput{
+		if _, err := c.sqsClient.DeleteMessage(&sqs.DeleteMessageInput{
 			QueueUrl:      aws.String(c.queueURL),
 			ReceiptHandle: msg.ReceiptHandle,
-		})
+		}); err != nil {
+			log.Printf("SQS consumer: delete error: %v", err)
+		}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/sqs.go` around lines 100 - 103, The DeleteMessage
call on sqsClient in the SQS collector is discarding the error return value
using blank identifiers, which masks failures that cause messages to remain on
the queue. Capture the error return value from the DeleteMessage method call
instead of discarding it with underscores, and log any error that occurs using
an appropriate logging mechanism to ensure visibility into deletion failures.

pkg/dashboard/collectors/sqs.go-245-246 (1)

245-246: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Response body not closed via defer.

If io.ReadAll panics, the HTTP connection leaks. Use defer immediately after error check.

 		if err != nil {
 			continue
 		}
+		defer out.Body.Close()
 		data, err := io.ReadAll(out.Body)
-		out.Body.Close()

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/sqs.go` around lines 245 - 246, The response body
out.Body is closed directly after io.ReadAll without using defer, which means if
io.ReadAll panics, the connection will leak. Move the out.Body.Close() call to a
deferred statement immediately after the error check for obtaining the response
object, rather than calling it directly after the io.ReadAll call. This ensures
the body is closed even if io.ReadAll panics.

pkg/dashboard/server/server.go-397-399 (1)

397-399: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Use cfg.MaxTestResults instead of a hardcoded 20.

CollectRecentTests(20) bypasses runtime configuration and makes MaxTestResults ineffective.

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/server/server.go` around lines 397 - 399, In the
testResultCollector block where CollectRecentTests is called, replace the
hardcoded integer value 20 with cfg.MaxTestResults to ensure the function
respects the runtime configuration. This allows the maximum number of test
results to be determined by the configuration settings instead of being
hardcoded in the function call.

cmd/osde2e/dashboard/cmd.go-51-53 (1)

51-53: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Normalize the environment value before persisting it.

Line 52 advertises integration, while dashboard config contracts use int. This can route an unsupported environment string downstream.

Suggested fix

  if cmd.PersistentFlags().Changed("environment") {
-		viper.Set(config.Environment, args.environment)
-		viper.Set(ocmprovider.Env, args.environment)
+		env := args.environment
+		if env == "integration" {
+			env = "int"
+		}
+		viper.Set(config.Environment, env)
+		viper.Set(ocmprovider.Env, env)
  }

Also applies to: 96-99

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/osde2e/dashboard/cmd.go` around lines 51 - 53, The environment flag at
the StringVarP definition allows users to specify "integration" as documented in
the help text, but the dashboard configuration expects "int" as the normalized
value. Add a normalization function that converts the user-provided environment
value to match the expected dashboard config format, converting "integration" to
"int" before the args.environment value is persisted or used downstream. Apply
this same normalization logic to all other occurrences of environment flag
handling as mentioned in the code review comment.

scripts/dashboard/deploy.sh-19-20 (1)

19-20: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Validate NAMESPACE immediately after parsing.

If parsing fails or format changes, deployment fails later with less clear diagnostics.

Suggested fix

 NAMESPACE=$(grep "^namespace:" "${OVERLAY_DIR}/kustomization.yaml" | awk '{print $2}')
+if [[ -z "${NAMESPACE}" ]]; then
+  echo "Error: namespace not found in ${OVERLAY_DIR}/kustomization.yaml"
+  exit 1
+fi
 APP="delivery-dashboard"

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/dashboard/deploy.sh` around lines 19 - 20, The NAMESPACE variable is
parsed from the kustomization.yaml file but is not validated immediately after
assignment, which can lead to unclear errors later if parsing fails or the file
format changes. Add a validation check immediately after the NAMESPACE variable
assignment (the grep and awk command line) to ensure the parsed value is not
empty. If validation fails, exit the script with a clear error message
indicating that the namespace could not be parsed from the kustomization.yaml
file, which will provide better diagnostics during deployment troubleshooting.

Makefile-40-41 (1)

40-41: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Do not enable --backfill by default in the dashboard target.

This target will trigger expensive historical scans every run. Make backfill opt-in.

Suggested fix

 dashboard: build
-	"$(OUT_DIR)/osde2e" dashboard --db="$(DIR)dashboard.db" --backfill --port=8080
+	"$(OUT_DIR)/osde2e" dashboard --db="$(DIR)dashboard.db" --port=8080 $(DASHBOARD_BACKFILL)

# Optional:
# make dashboard DASHBOARD_BACKFILL=--backfill

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@Makefile` around lines 40 - 41, Remove the hardcoded --backfill flag from the
dashboard target command. Instead, make it optional by introducing a make
variable (such as DASHBOARD_BACKFILL) that can be set when invoking the target.
Modify the command in the dashboard target to conditionally append this variable
so that backfill is only enabled when explicitly passed by the user, avoiding
expensive historical scans on every default invocation.

scripts/dashboard/verify-build.sh-76-76 (1)

76-76: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Quote file list expansion in formatter invocation.

gofumpt/gofmt -l $DASHBOARD_FILES is vulnerable to word-splitting/globbing. Use a null-delimited find + xargs -0 pipeline for safe execution.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/dashboard/verify-build.sh` at line 76, The gofmt invocation with the
unquoted `$DASHBOARD_FILES` variable is vulnerable to word-splitting and
globbing attacks. Replace the current `gofmt -l $DASHBOARD_FILES >
/tmp/dashboard-fmt-check.txt` command with a safe null-delimited pipeline using
find and xargs, such as piping the output of a find command with -print0 to
xargs -0 gofmt -l, to safely handle file names with spaces or special characters
and write the results to the same output file.
Source: Linters/SAST tools

scripts/dashboard/verify-build.sh-4-4 (1)

4-4: ⚠️ Potential issue | 🟡 Minor

Use gofumpt instead of gofmt and add pipefail for pipeline safety.

Line 76 uses gofmt but osde2e requires gofumpt per coding guidelines. Additionally, lines 90 and 100 mask go build failures via pipelines without set -o pipefail—the tee command succeeds regardless of go build status, allowing errors to slip through.
Suggested fixes
-set -e
+set -euo pipefail
-    gofmt -l $DASHBOARD_FILES > /tmp/dashboard-fmt-check.txt
+    gofumpt -l $DASHBOARD_FILES > /tmp/dashboard-fmt-check.txt
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/dashboard/verify-build.sh` at line 4, Add the bash option `set -o
pipefail` near the beginning of the script alongside the existing `set -e`
command to ensure pipeline failures are properly caught. Replace the `gofmt`
command on line 76 with `gofumpt` to comply with osde2e coding guidelines. This
combination ensures that any failures in the `go build` pipelines at lines 90
and 100 will not be masked by the `tee` command and will properly cause the
script to exit with an error status.

🧹 Nitpick comments (7)

pkg/dashboard/collectors/sqs.go (1)
283-283: ⚖️ Poor tradeoff

Backfill lacks context for cancellation.

Large S3 scans can take minutes. Without a context.Context, there's no way to abort on shutdown.
-func (c *SQSConsumer) Backfill() error {
+func (c *SQSConsumer) Backfill(ctx context.Context) error {
Caller in cmd/osde2e/dashboard/cmd.go already has ctx available.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/sqs.go` at line 283, The Backfill method in the
SQSConsumer struct does not accept a context.Context parameter, which prevents
graceful cancellation of long-running S3 scans during shutdown. Add
context.Context as the first parameter to the Backfill() method signature, then
ensure this context is passed down to all underlying operations that support
cancellation (such as S3 API calls and any loops that perform the scanning).
Update the caller in cmd/osde2e/dashboard/cmd.go to pass the available ctx when
invoking Backfill().
pkg/dashboard/store/store.go (1)
478-484: ⚡ Quick win

Bubble sort is O(n²); consider sort.Slice.

For large failure group sets, this becomes a bottleneck. Standard library sort is O(n log n).
+import "sort"
+
-	for i := 0; i < len(result)-1; i++ {
-		for j := i + 1; j < len(result); j++ {
-			if len(result[j].Entries) > len(result[i].Entries) {
-				result[i], result[j] = result[j], result[i]
-			}
-		}
-	}
+	sort.Slice(result, func(i, j int) bool {
+		return len(result[i].Entries) > len(result[j].Entries)
+	})
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/store/store.go` around lines 478 - 484, Replace the nested for
loop bubble sort implementation with sort.Slice from the standard library.
Remove the two nested loops that iterate through the result slice and perform
the swap operation based on the Entries length comparison. Instead, use
sort.Slice on the result slice with a comparison function that returns true when
result[i].Entries should be ordered before result[j].Entries based on their
lengths, allowing the sort package to efficiently handle the ordering in O(n log
n) time instead of the current O(n²) bubble sort approach.
pkg/dashboard/collectors/operators.go (1)
6-6: ⚡ Quick win

Replace log.Printf with structured klog/logr.

Please align collector logs with project logging requirements and include structured fields (operator, env, key, error).

As per coding guidelines, "Use logr/klog for logging, not fmt.Println" and "Prefer structured logging with logr/klog".

Also applies to: 173-173, 196-197, 281-281, 463-464
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/operators.go` at line 6, Replace the "log" package
import with the appropriate klog/logr imports at the top of the file. Then
locate and replace all log.Printf calls throughout the collectors/operators.go
file (including those referenced at lines 173, 196-197, 281, and 463-464) with
structured logging calls using logr/klog that include the required structured
fields such as operator, env, key, and error for better observability and
alignment with project logging standards.
Source: Coding guidelines
pkg/dashboard/collectors/reserves.go (1)
5-6: ⚡ Quick win

Replace log.Printf with structured klog/logr.

This collector uses unstructured standard logging despite project policy. Please switch to structured klog/logr logging with environment and query context fields.

As per coding guidelines, "Use logr/klog for logging, not fmt.Println" and "Prefer structured logging with logr/klog".

Also applies to: 25-26, 51-54, 63-63, 105-108
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/reserves.go` around lines 5 - 6, Remove the
unstructured "log" import from the imports section and replace it with the
appropriate klog/logr imports for structured logging. Then replace all
log.Printf calls throughout the file (at the locations mentioned: 25-26, 51-54,
63, and 105-108) with structured logging calls using klog or logr, ensuring each
logging statement includes relevant context fields such as environment and query
parameters to support better debugging and monitoring as per project guidelines.
Source: Coding guidelines
pkg/dashboard/collectors/s3tests.go (1)
7-7: ⚡ Quick win

Switch to structured klog/logr logging.

The collector currently uses log.Printf; please align with structured project logging.

As per coding guidelines, "Use logr/klog for logging, not fmt.Println" and "Prefer structured logging with logr/klog".

Also applies to: 161-162, 192-192, 283-284
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/s3tests.go` at line 7, Replace the standard "log"
package import with the appropriate klog/logr structured logging import at the
top of the file. Then find and replace all instances of log.Printf calls (which
also occur at lines 161-162, 192, and 283-284 as noted) with structured logging
calls using the klog/logr API, ensuring that logging messages follow the
structured logging pattern with appropriate key-value pairs instead of formatted
string output.
Source: Coding guidelines
pkg/dashboard/collectors/usage.go (1)
5-5: ⚡ Quick win

Use structured klog/logr instead of log.Printf.

This collector should follow project logging conventions for consistency and operational parsing.

As per coding guidelines, "Use logr/klog for logging, not fmt.Println" and "Prefer structured logging with logr/klog".

Also applies to: 34-35, 75-78, 84-84
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/dashboard/collectors/usage.go` at line 5, Replace the standard library
`log` import at the top of the usage.go file with the appropriate `klog` or
`logr` imports that the project uses for structured logging. Then replace all
`log.Printf` calls throughout the file (at the locations mentioned: lines 34-35,
75-78, and 84) with corresponding structured logging calls using the project's
structured logging library, ensuring each log statement uses the proper
structured logging methods with key-value pairs or message formatting
appropriate for the chosen logging framework.
Source: Coding guidelines
cmd/osde2e/dashboard/cmd.go (1)
78-79: ⚡ Quick win

Use klog/logr structured logging instead of the standard log package.

This file currently emits unstructured logs via log.Printf/log.Println. Please align with project logging standards.

As per coding guidelines: **/*.go: "Use logr/klog for logging, not fmt.Println" and "Prefer structured logging with logr/klog".

Also applies to: 85-86, 115-132, 137-138, 149-150, 158-164, 170-170, 179-180, 185-190
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@cmd/osde2e/dashboard/cmd.go` around lines 78 - 79, Replace all unstructured
logging calls using the standard log package (log.Println and log.Printf)
throughout the file with structured logging using klog/logr. Import the
appropriate klog/logr logger and use its Info, Error, or Warn methods with
structured key-value pairs instead of formatted strings. This includes the
log.Println call at line 78 and all other instances mentioned in the comment
(lines 85-86, 115-132, 137-138, 149-150, 158-164, 170, 179-180, 185-190) to
ensure consistent structured logging aligned with project standards.
Source: Coding guidelines

- Add /dashboard/junit handler — fetches JUnit XML from S3 via AWS creds and renders as HTML (kitproj/junit2html style) using existing go-junit dependency - Add /dashboard/s3 proxy handler — streams S3 objects server-side, replacing presigned URLs that expired after 7 days - Replace all presigned URL generation with plain proxy URLs; migrate existing DB rows on open - Rename /dashboard/deliverables → /dashboard/pipelines across routes, templates, and deploy script - Fix superfluous WriteHeader: render templates into buffer before writing response - Fix .Operators → .Deliverables stale reference in deliverables.html (empty table bug) - Fix .OperatorName → .Name stale references in analysis.html and pipeline-detail.html - Fix DB migration: rename operator_name → name column for existing databases - Remove InitAWSViper from dashboard cmd — it was clearing AWS env vars set in shell - Use standard AWS SDK credential chain (env vars → ~/.aws) for DeliverableCollector - Truncate DB before backfill so deleted S3 objects don't persist as stale rows - Remove fail-list panels from popups — show AI summary + links only - Rename "Version" → "Tag" and "Job ID" → "Job suffix" in popup meta fields - Normalise failure grouping key (lowercase, strip quotes) to merge duplicate groups - Add imagePullPolicy: Always and rollout restart to deploy script - Add add() template func for junit-report.html totals aggregation Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

- Version tag links to github.com/openshift/<repo>/commit/<sha> - Add githubRepo template func to strip -e2e-master/-e2e job suffixes - Revert cookie secret check from deploy script (oauth-proxy removed) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>

ritmun and others added 7 commits June 18, 2026 08:25

openshift-ci Bot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jun 18, 2026

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2026

ritmun and others added 2 commits June 18, 2026 11:46

ritmun force-pushed the feat/delivery-dashboard branch from dfa0b1d to 7ce601c Compare June 18, 2026 19:06

ritmun and others added 3 commits June 19, 2026 12:09

coderabbitai Bot reviewed Jun 19, 2026

View reviewed changes

Comment thread pkg/dashboard/server/templates/deliverables.html

ritmun and others added 2 commits June 20, 2026 16:37

ritmun force-pushed the feat/delivery-dashboard branch from be32a7e to e1d8ec5 Compare June 22, 2026 14:45

ritmun closed this Jun 22, 2026

ritmun reopened this Jun 22, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Feat/delivery dashboard#3261

Feat/delivery dashboard#3261
ritmun wants to merge 14 commits into
openshift:mainfrom
ritmun:feat/delivery-dashboard

ritmun commented Jun 18, 2026 •

edited

Loading

Uh oh!

openshift-merge-bot Bot commented Jun 18, 2026

Uh oh!

openshift-ci Bot commented Jun 18, 2026

Uh oh!

coderabbitai Bot commented Jun 18, 2026 •

edited

Loading

Review skipped

Uh oh!

openshift-ci Bot commented Jun 18, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

ritmun commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary by CodeRabbit

Uh oh!

openshift-merge-bot Bot commented Jun 18, 2026

Uh oh!

openshift-ci Bot commented Jun 18, 2026

Uh oh!

coderabbitai Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

openshift-ci Bot commented Jun 18, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

ritmun commented Jun 18, 2026 •

edited

Loading

coderabbitai Bot commented Jun 18, 2026 •

edited

Loading