Feat/delivery dashboard#3261
Conversation
- Add dashboard command (pkg/dashboard, cmd/osde2e/dashboard) - SQLite store populated via SQS S3 event notifications - Backfill support for historical S3 data - Pipelines view with stage/int status, version, AI analysis - Per-operator history page with failure details - dashboard.Dockerfile for cluster builds using public images - Deployment script at scripts/dashboard/deploy.sh 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Mirror osde2e.Dockerfile layer caching: COPY go.* + go mod download as separate layer so deps are cached between builds - Update labels to reflect delivery-dashboard purpose 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Rename Usage → Clusters tab, remove Dashboard/Reserves tabs - Clusters page shows all osde2e clusters per env (int/stage/prod) with cluster ID, state, availability, version, flavor, ad hoc image, created/expires datetimes in local tz - Collapsible env sections - Pipelines: rename Operator column to Component, reorder Int/Stage - Update subtitle text on Pipelines page - Tab title: Delivery Dashboard - Pipelines - Build binary to out/osde2e (consistent with Makefile OUT_DIR) - Redirect / and /dashboard to /dashboard/usage 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Rename Operator/Component → Deliverable across all UI and Go code - Rename Clusters tab → Infra - Pipelines tab now first in nav, redirects / → /dashboard/deliverables - URL routes: /dashboard/operators → /dashboard/deliverables - API: /api/v1/operators → /api/v1/deliverables - Internal: operatorCollector → deliverableCollector - handleOperatorsPage → handleDeliverablesPage - Pipeline detail: chronological per-version int→stage visual view - VersionPipeline model groups runs by version - Store.GetHistory populates Versions field 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Add Analysis tab (/dashboard/analysis) to nav - Group failed runs by first sentence of LLM root cause (fallback: first line of failure message) - FailureGroup model with FailureMatch, RootCause, Recommendations, Entries - GetFailureGroups in store.go — groups sorted by match count desc - analysis.html: truncated group header with hover tooltip, AI analysis context, deliverable links to pipeline detail, env chips, timestamps - Rename: group.TestName → group.FailureMatch - Badge shows "N matching" instead of "N failures" 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- Replace PVC with emptyDir for SQLite DB — avoids ReadWriteOnce multi-attach conflict during rolling updates - Add RollingUpdate strategy (maxSurge=1, maxUnavailable=0) so new pod is healthy before old pod terminates - --backfill on startup repopulates DB from S3 in ~5s - Remove PVC creation from deploy.sh 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- .claude/commands/dashboard-dev.md: thin wrapper that loads the skill - .claude/skills/dashboard-dev/SKILL.md: full onboarding guide covering fork setup, local dev, OpenShift deploy, common tasks, architecture 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
|
There are test jobs defined for this repository which are not configured to run automatically. Comment |
|
Skipping CI for Draft Pull Request. |
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ritmun The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
- Remove root Dockerfile (superseded by dashboard.Dockerfile) - Remove pkg/dashboard/*.md scaffolding files (BUILD_STATUS, COMPLETE, IMPLEMENTATION_SUMMARY, PLAN, TEMPLATE_FIX) - Remove dashboard.html and reserves.html templates (pages removed from nav) - Remove handleDashboard and handleReservesPage handlers + routes (/dashboard and /dashboard/reserves) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Dead code removed: - tests.html template (no handler renders it) - models.TestSummary (never used) - models.TestResult.SuccessRate() (never called) - store.OperatorNames() (never called) - collectors.TestResultsCollector.GetTestResultByJobID() (never called) Duplication eliminated via new collectors/helpers.go: - suiteStatus(suite) replaces 3x identical passed/failed/error logic - parseTimestamp(ts) replaces 2x identical time.Parse fallback chains - presignURL(client, bucket, key) extracted (available for future use) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
dfa0b1d to
7ce601c
Compare
- deploy.sh: replace in-cluster BuildConfig with local docker build/push to quay.io/rmundhe_oc/delivery-dashboard; use kustomize from adjacent hp-delivery-apps repo instead of inline oc apply heredocs - SKILL.md: fix secret name ocm-token → ocm-credentials (OCM_CLIENT_ID/SECRET); update deploy steps to reflect new flow - dashboard.Dockerfile: no functional change (quay expiry moved to deploy-time flag) 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
- deploy.sh: simplify to 56 lines — fixed build context, secret pre-flight, straight kustomize apply (no SQS patching), DASHBOARD_QUAY_IMAGE required env - configs/local/dashboard-build/: fixed podman build context with Dockerfile (moved from dashboard.local.Dockerfile); binary gitignored there - Remove scripts/dashboard/run-local.sh → replaced by make dashboard - Makefile: add dashboard target (build + run locally with --backfill) - AGENTS.md: add pkg/dashboard + .claude/skills to architecture, tip for /dashboard-dev skill - SKILL.md: full rewrite — current deploy flow, make dashboard, ocm data in architecture 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
Files renamed: collectors/operators.go → collectors/deliverables.go templates/operators.html → templates/deliverables.html Symbols renamed throughout: OperatorStatusCollector → DeliverableCollector NewOperatorStatusCollector → NewDeliverableCollector CollectOperatorStatus → CollectDeliverables OperatorStatus → DeliverableStatus OperatorName → Name operator_name (JSON/DB) → name The dashboard delivers more than just operators — services, pipelines, etc. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <noreply@anthropic.com>
There was a problem hiding this comment.
Actionable comments posted: 1
Note
Due to the large number of review comments, Critical severity comments were prioritized as inline comments.
🟠 Major comments (26)
pkg/dashboard/collectors/sqs.go-122-128 (1)
122-128:⚠️ Potential issue | 🟠 Major | ⚡ Quick winPartial record failures cause premature message deletion.
If one S3 record fails in
processKey, the error is logged butprocessMessagereturnsnil, triggering message deletion. Subsequent records in the same SQS message are lost.Consider tracking failures and returning an error when any record fails:
+ var errs []error for _, rec := range event.Records { if err := c.processKey(rec.S3.Bucket.Name, rec.S3.Object.Key); err != nil { log.Printf("SQS consumer: skip %s: %v", rec.S3.Object.Key, err) + errs = append(errs, err) } } + if len(errs) > 0 { + return fmt.Errorf("%d record(s) failed", len(errs)) + } return nil🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/sqs.go` around lines 122 - 128, The processMessage function returns nil even when processKey fails for one or more records in the loop, causing the SQS message to be marked as successfully processed and deleted despite incomplete processing. Modify the loop that iterates through event.Records to track whether any record processing fails, then return an error from the function if any processKey call returns an error, rather than just logging and continuing with nil return. This ensures failed records cause the message to be retried instead of being lost.pkg/dashboard/server/templates/analysis.html-179-179 (1)
179-179:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd
rel="noopener noreferrer"to the external logs link.This prevents reverse-tabnabbing when opening third-party URLs in a new tab.
As per coding guidelines: “Web security (prodsec-skills)”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/server/templates/analysis.html` at line 179, The external logs link in the analysis.html template is missing the rel attribute which is required for security when opening third-party URLs in a new tab. Add rel="noopener noreferrer" to the anchor tag with href="{{.LogURL}}" and target="_blank" to prevent reverse-tabnabbing attacks and protect user security when the link is clicked.Source: Coding guidelines
pkg/dashboard/server/server.go-350-360 (1)
350-360:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHealth degradation should reflect configured dependencies, not just collector presence.
The endpoint reports
"degraded"whenever OCM/S3 collectors are nil, even when those integrations are intentionally disabled by config. That produces false-negative health for valid deployments.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/server/server.go` around lines 350 - 360, The handleHealth function in the Server type incorrectly marks the health status as degraded whenever OCM or S3 collectors are nil, without considering whether those integrations are intentionally disabled through configuration. Instead of directly checking if s.reserveCollector and s.testResultCollector are non-nil, modify the logic to check if these integrations are configured to be enabled and only mark them as not connected if they are configured but unavailable. This way, deployments that intentionally disable OCM or S3 will correctly report "ok" status instead of false-negative "degraded" status.pkg/dashboard/server/templates.go-6-47 (1)
6-47: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winReplace standard logger calls with structured
klog/logrlogging.Template load/render paths currently use
log.Printf; this should align with the repository logging standard for consistent structured output.As per coding guidelines: “Use logr/klog for logging, not fmt.Println” and “Prefer structured logging with logr/klog”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/server/templates.go` around lines 6 - 47, The renderTemplate function is using the standard log.Printf calls for error logging instead of structured logging with klog/logr, which is inconsistent with the repository's logging standards. Replace both log.Printf calls (one after template parsing failure and one after template execution failure) with structured logging calls using logr/klog to maintain consistency with the codebase's logging conventions.Source: Coding guidelines
pkg/dashboard/server/templates/analysis.html-160-161 (1)
160-161:⚠️ Potential issue | 🟠 Major | ⚡ Quick winEscape deliverable names before placing them in path segments.
Using raw
{{.OperatorName}}in/dashboard/deliverables/...can break navigation for names containing reserved URL characters. Add a path-escape template helper and use it here.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/server/templates/analysis.html` around lines 160 - 161, The href attribute in the deliverable link is using raw {{.OperatorName}} directly in the URL path, which can break navigation if the name contains reserved URL characters like slashes, question marks, or ampersands. Create or use a path-escape template helper function and apply it to {{.OperatorName}} within the href attribute in the anchor tag at the specified location to properly encode the operator name for use as a URL path segment.pkg/dashboard/server/templates/operators.html-279-367 (1)
279-367:⚠️ Potential issue | 🟠 Major | ⚡ Quick winMove
<dialog>nodes outside<tbody>to keep valid table structure.
<tbody>may only contain<tr>children. Rendering dialogs inside the table body produces invalid markup and can trigger browser DOM corrections that break event/data coupling.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/server/templates/operators.html` around lines 279 - 367, The dialog elements with IDs d-{{$i}}-stage and d-{{$i}}-int are currently nested inside a tbody element, which creates invalid HTML markup since tbody should only contain tr children. Move both dialog blocks outside of the tbody element to maintain valid table structure and prevent browser DOM corrections that could break event handling and data relationships.pkg/dashboard/server/templates/pipeline-detail.html-269-270 (1)
269-270:⚠️ Potential issue | 🟠 Major | ⚡ Quick winHarden external links opened in new tabs.
Add
rel="noopener noreferrer"to eachtarget="_blank"anchor in the dialogs.As per coding guidelines: “Web security (prodsec-skills)”.
Also applies to: 311-312
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/server/templates/pipeline-detail.html` around lines 269 - 270, Add the security attribute rel="noopener noreferrer" to each anchor tag that opens in a new tab using target="_blank". In the anchor tags for $intRun.LogURL and $intRun.JUnitURL, insert rel="noopener noreferrer" between the href and target attributes. Apply the same fix to the similar anchor tags at lines 311-312 that also use target="_blank". This prevents potential security vulnerabilities when external links are opened in new tabs.Source: Coding guidelines
pkg/dashboard/server/server.go-3-17 (1)
3-17: 🛠️ Refactor suggestion | 🟠 Major | ⚡ Quick winUse
klog/logrinstead oflog.Printfacross this server package.This file consistently uses the standard logger (
log.Printf), which diverges from project logging requirements and weakens structured observability.As per coding guidelines: “Use logr/klog for logging, not fmt.Println” and “Prefer structured logging with logr/klog”.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/server/server.go` around lines 3 - 17, The import section includes the standard library "log" package, but the project requires structured logging via klog/logr per coding guidelines. Replace the "log" import with the appropriate klog/logr imports (such as "sigs.k8s.io/controller-runtime/pkg/log" or the project's preferred logr implementation), and then update all log.Printf calls throughout the server package to use the structured logger instead, following the project's logging patterns for consistency and improved observability.Source: Coding guidelines
pkg/dashboard/server/templates/operators.html-318-319 (1)
318-319:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd
rel="noopener noreferrer"to alltarget="_blank"links.Opening untrusted URLs in a new tab without
relexposes reverse-tabnabbing.Suggested fix
-{{if .LogURL}}<a href="{{.LogURL}}" target="_blank">Logs</a>{{end}} -{{if .JUnitURL}}<a href="{{.JUnitURL}}" target="_blank">JUnit XML</a>{{end}} +{{if .LogURL}}<a href="{{.LogURL}}" target="_blank" rel="noopener noreferrer">Logs</a>{{end}} +{{if .JUnitURL}}<a href="{{.JUnitURL}}" target="_blank" rel="noopener noreferrer">JUnit XML</a>{{end}}As per coding guidelines: “Web security (prodsec-skills)”.
Also applies to: 363-364
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/server/templates/operators.html` around lines 318 - 319, The links in operators.html that open in a new tab using target="_blank" (specifically the LogURL and JUnitURL links at lines 318-319 and the same pattern at lines 363-364) are missing the security attribute rel="noopener noreferrer" which prevents reverse-tabnabbing attacks. Add rel="noopener noreferrer" to both the LogURL and JUnitURL anchor tags to include the complete security attribute alongside the target="_blank" attribute.Source: Coding guidelines
pkg/dashboard/server/server.go-103-116 (1)
103-116:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAdd HTTP server timeouts to harden availability under slow-client conditions.
http.Serveris created withoutReadHeaderTimeout/ReadTimeout/WriteTimeout/IdleTimeout. This leaves the handler exposed to connection starvation patterns.Suggested fix
func (s *Server) Start(addr string, ctx context.Context) error { - srv := &http.Server{Addr: addr, Handler: s.mux} + srv := &http.Server{ + Addr: addr, + Handler: s.mux, + ReadHeaderTimeout: 10 * time.Second, + ReadTimeout: 30 * time.Second, + WriteTimeout: 30 * time.Second, + IdleTimeout: 60 * time.Second, + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/server/server.go` around lines 103 - 116, In the Start method of the Server struct, the http.Server initialization is missing timeout configurations that protect against slow-client connection starvation. Add ReadHeaderTimeout, ReadTimeout, WriteTimeout, and IdleTimeout fields to the http.Server struct creation (the line where srv := &http.Server{Addr: addr, Handler: s.mux} is defined) with appropriate timeout values to harden availability under slow-client conditions.pkg/dashboard/server/server.go-428-433 (1)
428-433:⚠️ Potential issue | 🟠 MajorSet
Content-Typeheader before callingWriteHeaderin error responses.In
sendAPIError,WriteHeaderis called beforesendJSONsets theContent-Typeheader. In Go HTTP semantics, onceWriteHeaderis called, any subsequentHeader().Set()calls are ignored. Error responses will be sent without the correctContent-Type: application/jsonheader, potentially breaking clients expecting JSON.Suggested fix
func (s *Server) sendAPIError(w http.ResponseWriter, message string, statusCode int) { - w.WriteHeader(statusCode) - s.sendJSON(w, models.APIResponse{ + w.Header().Set("Content-Type", "application/json") + w.WriteHeader(statusCode) + _ = json.NewEncoder(w).Encode(models.APIResponse{ Success: false, Error: message, }) }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/server/server.go` around lines 428 - 433, In the sendAPIError method of the Server struct, the WriteHeader call is being made before the Content-Type header is set by sendJSON, which means Go will ignore any subsequent header modifications. To fix this, explicitly set the Content-Type header to application/json on the response writer before calling w.WriteHeader(statusCode), ensuring the header is properly included in the error response sent to clients.pkg/dashboard/config/config.go-10-35 (1)
10-35: 🛠️ Refactor suggestion | 🟠 Major | 🏗️ Heavy liftConsolidate dashboard config keys into the shared config contract.
Defining a separate key registry here diverges from the repository’s configuration contract and risks precedence/documentation drift. Please move/alias these keys through
pkg/common/config/config.goand reference them from this package.Based on learnings, "Config changes? Edit config.go only". As per coding guidelines, "Primary file:
pkg/common/config/config.go- All configuration options (START HERE)".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/config/config.go` around lines 10 - 35, The dashboard configuration constants (Port, Environment, RefreshInterval, ExpirationWarningThreshold, MaxTestResults, LookbackDays, SQSQueueURL, DBPath) are currently defined in pkg/dashboard/config/config.go but should be centralized in the shared configuration contract. Move these constant definitions from the dashboard config file to pkg/common/config/config.go where all configuration options are managed, then import and reference them from pkg/dashboard/config/config.go to avoid duplication and maintain a single source of truth for configuration keys across the repository.Sources: Coding guidelines, Learnings
pkg/dashboard/collectors/operators.go-124-126 (1)
124-126:⚠️ Potential issue | 🟠 Major | ⚡ Quick winVersion is dropped from status grouping, causing record overwrite.
CollectOperatorStatusgroups/indexes by operator name (+env) but not version. Different versions can overwrite each other, returning incorrect operator status.Suggested fix
- type groupKey struct{ name, env string } + type groupKey struct{ name, version, env string } - name, _, env := parseComponentPath(component) - gk := groupKey{name, env} + name, version, env := parseComponentPath(component) + gk := groupKey{name: name, version: version, env: env} - indexKey := r.name + indexKey := fmt.Sprintf("%s|%s", r.name, r.version)Also applies to: 152-153, 237-243
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/operators.go` around lines 124 - 126, The groupKey struct used in CollectOperatorStatus only includes name and env fields, but omits version, which causes different operator versions to overwrite each other in the newestByGroup map. Add a version field to the groupKey struct definition and update all places where groupKey instances are created (including the locations at lines 152-153 and 237-243) to include the version information when grouping operators, ensuring each unique operator-version-env combination is treated as a separate entry.pkg/dashboard/collectors/operators.go-207-208 (1)
207-208:⚠️ Potential issue | 🟠 Major | ⚡ Quick winEnvironment fallback reads the wrong S3 key.
fetchMetaFromLogexpects the full component path segment, but callers pass operator name. This builds non-existent log paths and keeps env asunknown.Suggested fix
-func (c *OperatorStatusCollector) fetchMetaFromLog(name, date, jobID string) (env, version string) { - logKey := fmt.Sprintf("test-results/%s/%s/%s/test_output.log", name, date, jobID) +func (c *OperatorStatusCollector) fetchMetaFromLog(component, date, jobID string) (env, version string) { + logKey := fmt.Sprintf("test-results/%s/%s/%s/test_output.log", component, date, jobID)- if detected := c.fetchEnvFromLog(gk.name, parts[2], parts[3]); detected != "" { + if detected := c.fetchEnvFromLog(cand.component, parts[2], parts[3]); detected != "" {- if detected := c.fetchEnvFromLog(operatorName, rk.dateStr, rk.jobID); detected != "" { + if detected := c.fetchEnvFromLog(rk.component, rk.dateStr, rk.jobID); detected != "" {Also applies to: 292-294, 469-470
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/operators.go` around lines 207 - 208, The fetchEnvFromLog method expects the full component path segment as its first parameter, but the current calls (in the block around fetchEnvFromLog at line 207-208, and also in the calls mentioned at lines 292-294 and 469-470) are passing only the operator name (gk.name) instead. Update all three fetchEnvFromLog call sites to pass the complete component path segment rather than just gk.name to ensure the correct log paths are constructed and the environment value is properly detected instead of remaining as unknown.pkg/dashboard/collectors/helpers.go-21-30 (1)
21-30:⚠️ Potential issue | 🟠 Major | ⚡ Quick winDo not fallback to
time.Now()for parse failures.Using current time on parse failure misorders historical runs as newest. Return zero time and let callers apply an explicit fallback source.
Suggested fix
func parseTimestamp(ts string) time.Time { if t, err := time.Parse("2006-01-02T15:04:05", ts); err == nil { return t } if t, err := time.Parse(time.RFC3339, ts); err == nil { return t } - return time.Now() + return time.Time{} }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/helpers.go` around lines 21 - 30, The parseTimestamp function currently returns time.Now() when both timestamp parsing attempts fail, which causes historical runs to be misordered by appearing as the newest. Replace the final return statement (return time.Now()) with return time.Time{} to return the zero time value instead, allowing callers to apply their own explicit fallback logic for handling missing or invalid timestamps.pkg/dashboard/models/types.go-21-23 (1)
21-23:⚠️ Potential issue | 🟠 Major | ⚡ Quick winGuard zero/expired timestamps in
IsExpiringSoon.Current logic marks zero (and already expired) timestamps as “expiring soon”, which inflates expiry metrics.
Suggested fix
func (c *ClusterReserve) IsExpiringSoon(threshold time.Duration) bool { - return time.Until(c.ExpiresAt) < threshold + if c.ExpiresAt.IsZero() { + return false + } + remaining := time.Until(c.ExpiresAt) + return remaining >= 0 && remaining < threshold }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/models/types.go` around lines 21 - 23, The IsExpiringSoon method on the ClusterReserve type currently returns true for zero timestamps and already-expired timestamps because negative durations from time.Until() are less than any positive threshold. Add guards to check if c.ExpiresAt is the zero value or is in the past (already expired) before evaluating whether it is expiring soon based on the threshold. Return false if the timestamp is zero or already expired.pkg/dashboard/config/config.go-82-88 (1)
82-88:⚠️ Potential issue | 🟠 Major | ⚡ Quick winNormalize environment aliases in
OCMEnvironments.
integrationis documented but not mapped toint. Returning raw input can produce invalid provider env names and silent skips.Suggested fix
+import "strings" + func (c *Config) OCMEnvironments() []string { - switch c.Environment { + env := strings.ToLower(strings.TrimSpace(c.Environment)) + switch env { case "all", "": return []string{"stage", "int", "prod"} + case "integration": + return []string{"int"} default: - return []string{c.Environment} + return []string{env} } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/config/config.go` around lines 82 - 88, The OCMEnvironments method in the Config type does not normalize environment aliases before returning them, causing documented aliases like "integration" to be passed through as-is instead of being mapped to their canonical forms like "int". This results in invalid provider environment names being returned silently. Add environment alias normalization logic to the OCMEnvironments method that maps aliases such as "integration" to "int" before the switch statement processes the environment value, ensuring that all documented aliases are properly converted to their expected canonical forms.pkg/dashboard/collectors/reserves.go-45-48 (1)
45-48:⚠️ Potential issue | 🟠 MajorAdd pagination to OCM cluster list queries to fetch all results.
Both methods set
Size()but omit page iteration, returning only the first page. In OCM environments with reserves/clusters exceeding the page size, data will be incomplete. Use the pagination pattern frompkg/common/providers/ocmprovider/cluster.go(loop with.Page(page)and increment page counter). Also applies topkg/dashboard/collectors/usage.go.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/reserves.go` around lines 45 - 48, The cluster list query in the GetConnection().ClustersMgmt().V1().Clusters().List() call sets a Size but lacks pagination logic, causing it to return only the first page of results. Implement pagination by wrapping the Send() call in a loop that iterates through pages using .Page(page) with an incrementing page counter, following the pagination pattern from pkg/common/providers/ocmprovider/cluster.go. Apply this same fix to the equivalent code in pkg/dashboard/collectors/usage.go to ensure all clusters and usage data are fetched across all pages.pkg/dashboard/collectors/operators.go-441-446 (1)
441-446:⚠️ Potential issue | 🟠 MajorEliminate redundant per-run S3 list in fan-out loop.
The initial
ListObjectsV2Pagesscan (lines 32–57) already identifies all JUnit XML keys for the operator, but the code discards the key and stores only the(component, dateStr, jobID)tuple. The fan-out loop then performs a separateListObjectsV2for each candidate run to rediscover the same key.Extend
runKeyto include the JUnit key discovered during the initial scan, eliminating the redundant listing on lines 443–446.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/operators.go` around lines 441 - 446, The runKey structure currently only stores component, dateStr, and jobID, but the initial ListObjectsV2Pages scan already discovers the JUnit XML keys that are needed later. Extend the runKey struct to include a field for the JUnit XML key, populate this field during the initial ListObjectsV2Pages scan (lines 32-57) when each candidate run is identified, and then use the pre-discovered key from the runKey in the fan-out loop instead of performing the redundant ListObjectsV2 call on lines 443-446.pkg/dashboard/collectors/s3tests.go-169-170 (1)
169-170:⚠️ Potential issue | 🟠 MajorStop pagination early to reach
maxResultscauses omission of newer tests.The function stops
ListObjectsV2PagesonceresultsByJobreachesmaxResultsentries (line 170). However, S3 lists objects in lexicographical order by key. Since keys include dates (test-results/<component>/<date>/<job-id>/junit*.xml), lexicographical ordering matches chronological order, placing older jobs in earlier pages and newer jobs in later pages. Stopping early prevents collection of newer jobs that appear lexicographically after themaxResultsthreshold. The subsequent sort (line 182-185) only reorders the incomplete subset already collected.Fix: Collect all results (or a large buffer) across all pages before sorting and limiting by timestamp, ensuring newer jobs are included in the "recent tests" result set.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/s3tests.go` around lines 169 - 170, The pagination logic in the ListObjectsV2Pages iteration stops early when resultsByJob reaches maxResults, but since S3 objects are ordered lexicographically by key (with dates embedded in the path), this prevents collection of newer jobs that appear later in the lexicographical order. Instead of returning early when len(resultsByJob) < maxResults at line 170, remove this early termination and allow the loop to iterate through all pages to collect all results. Then apply the sorting (lines 182-185) and result limiting logic after all pages have been processed, ensuring that newer jobs are properly included in the final filtered result set.pkg/dashboard/collectors/usage.go-92-95 (1)
92-95:⚠️ Potential issue | 🟠 MajorAdd pagination loop to fetch all clusters.
collectUsageForEnvuses.Size(1000)but omits.Page()and pagination logic. The function fetches only the first page, undercounting clusters in larger environments. Implement pagination consistent with existing patterns inpkg/common/providers/ocmprovider/cluster.go(lines 39–75, 700–730):Example pattern
page := 1 for { resp, err := provider.GetConnection().ClustersMgmt().V1().Clusters().List(). Search(query). Size(1000). Page(page). Send() if err != nil { return nil, fmt.Errorf("failed to query clusters: %w", err) } resp.Items().Each(func(cluster *v1.Cluster) bool { // process cluster return true }) if resp.Size() < 1000 { break } page++ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/usage.go` around lines 92 - 95, The collectUsageForEnv function fetches clusters with a fixed size of 1000 but lacks pagination logic, causing it to retrieve only the first page of results. Implement a pagination loop around the existing provider.GetConnection().ClustersMgmt().V1().Clusters().List() call that starts at page 1, adds .Page(page) to each request, increments the page counter after processing each response, and terminates when the response size is less than 1000 (indicating the final page). Process all clusters within the loop using resp.Items().Each() or similar iteration, following the pagination pattern already established in pkg/common/providers/ocmprovider/cluster.go.dashboard.Dockerfile-10-10 (1)
10-10:⚠️ Potential issue | 🟠 Major | ⚡ Quick winAvoid copying the entire build context into the image build stage.
COPY . .can unintentionally include sensitive or irrelevant files in build context/layers. Restrict copied paths and/or enforce a strict.dockerignore.As per coding guidelines:
**/{Dockerfile,Containerfile}*: "COPY specific files, not entire context".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dashboard.Dockerfile` at line 10, The COPY . . command on line 10 is copying the entire build context into the Docker image, which can include sensitive or irrelevant files unnecessarily. Replace this broad copy instruction with specific file and directory paths that are actually required for the build (such as package.json, source code directories, configuration files, etc.). If needed, also create or update a .dockerignore file to explicitly exclude sensitive files and directories that should not be included in the build context.Source: Coding guidelines
configs/local/dashboard-build/Dockerfile-1-5 (1)
1-5:⚠️ Potential issue | 🟠 Major | ⚡ Quick winSet a non-root runtime user in this image.
The container currently runs as root by default.
Suggested fix
FROM registry.access.redhat.com/ubi9/ubi-minimal:latest WORKDIR / COPY osde2e /osde2e +USER 1001 ENV PATH="${PATH}:/" ENTRYPOINT ["/osde2e"]As per coding guidelines:
**/{Dockerfile,Containerfile}*: "USER non-root; never run as root".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@configs/local/dashboard-build/Dockerfile` around lines 1 - 5, The Dockerfile in configs/local/dashboard-build/Dockerfile runs the container as root by default, which violates security guidelines. Add a USER instruction before the ENTRYPOINT directive to specify a non-root user for runtime execution. This ensures the container adheres to the coding guideline that requires USER non-root and prevents running containers with root privileges.Sources: Coding guidelines, Linters/SAST tools
dashboard.Dockerfile-14-20 (1)
14-20:⚠️ Potential issue | 🟠 Major | ⚡ Quick winRun the final image as a non-root user.
The runtime stage executes as root by default, which weakens container isolation.
Suggested fix
FROM registry.access.redhat.com/ubi9/ubi-minimal:latest WORKDIR / COPY --from=builder /opt/app-root/src/github.com/openshift/osde2e/out/osde2e . +USER 1001 ENV PATH="${PATH}:/" ENTRYPOINT ["/osde2e"]As per coding guidelines:
**/{Dockerfile,Containerfile}*: "USER non-root; never run as root".🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@dashboard.Dockerfile` around lines 14 - 20, The final runtime stage in the Dockerfile lacks a USER directive, causing the container to execute as root, which violates security guidelines. Add a USER directive after the COPY command and before the ENTRYPOINT in the final stage to specify a non-root user account for running the osde2e application. This ensures the container runs with restricted permissions rather than as root.Sources: Coding guidelines, Linters/SAST tools
scripts/dashboard/verify-build.sh-72-76 (1)
72-76:⚠️ Potential issue | 🟠 Major | ⚡ Quick winFormatting gate uses
gofmtinstead of requiredgofumpt.The verification step should enforce
gofumptto match repo policy; otherwise style checks can pass while violating project standards.Suggested fix
-# Check for syntax errors (gofmt) -echo "5. Checking Go syntax..." +# Check formatting (gofumpt) +echo "5. Checking Go formatting..." DASHBOARD_FILES=$(find pkg/dashboard cmd/osde2e/dashboard -name "*.go" 2>/dev/null) if [ -n "$DASHBOARD_FILES" ]; then - gofmt -l $DASHBOARD_FILES > /tmp/dashboard-fmt-check.txt + gofumpt -l $DASHBOARD_FILES > /tmp/dashboard-fmt-check.txtAs per coding guidelines, “Use gofumpt for code formatting, not gofmt.”
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/dashboard/verify-build.sh` around lines 72 - 76, The formatting verification step in the Go syntax check is using gofmt instead of the required gofumpt tool. Replace the gofmt command in the check with gofumpt to enforce the project's code formatting standards. Specifically, change the gofmt -l invocation to use gofumpt -l against the DASHBOARD_FILES variable to ensure the verification gate correctly validates formatting according to project policy.Source: Coding guidelines
scripts/dashboard/verify-build.sh-54-60 (1)
54-60:⚠️ Potential issue | 🟠 Major | ⚡ Quick winTemplate verification list is out of sync with the dashboard templates.
The script checks
dashboard.html,reserves.html, andtests.html, but the current dashboard layer usesoperators.html,pipeline-detail.html, andanalysis.html. This will fail verification on a valid tree.Suggested fix
TEMPLATES=( "pkg/dashboard/server/templates/base.html" - "pkg/dashboard/server/templates/dashboard.html" - "pkg/dashboard/server/templates/reserves.html" + "pkg/dashboard/server/templates/operators.html" + "pkg/dashboard/server/templates/pipeline-detail.html" "pkg/dashboard/server/templates/usage.html" - "pkg/dashboard/server/templates/tests.html" + "pkg/dashboard/server/templates/analysis.html" )🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/dashboard/verify-build.sh` around lines 54 - 60, The TEMPLATES array in the verify-build.sh script contains outdated template file names that don't match the current dashboard implementation. Update the TEMPLATES array to replace the old template names (dashboard.html, reserves.html, tests.html) with the actual template files currently used by the dashboard (operators.html, pipeline-detail.html, analysis.html), ensuring the verification script checks for the correct template files in pkg/dashboard/server/templates/.
🟡 Minor comments (12)
pkg/dashboard/README.md-19-32 (1)
19-32:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winFix the dashboard tree and status checklist.
pkg/dashboard/server/templates/is the actual location, but the tree showspkg/dashboard/templates/. The same section still marks templates as TODO even though this PR adds them, so the documentation is stale.Also applies to: 212-233
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/README.md` around lines 19 - 32, Update the directory tree in the README.md to reflect the correct location of templates, moving the templates entry from under pkg/dashboard/ directly to under pkg/dashboard/server/, and remove the TODO marker from the templates section since templates are now included in this PR rather than being a planned future addition.pkg/dashboard/README.md-61-63 (1)
61-63:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse
prod, notproduction, in the examples.The dashboard config accepts
stage,int,prod, orall;productionis not a valid value in either the CLI or API example.Also applies to: 120-124
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/README.md` around lines 61 - 63, The README.md file contains examples using `production` as an environment value, but the dashboard config only accepts `stage`, `int`, `prod`, or `all`. Replace all instances of `production` with `prod` in the environment examples throughout the document, including the filter by environment section starting at line 61 and any other occurrences mentioned in the range 120-124. Ensure all CLI and API examples consistently use the valid `prod` value instead of the invalid `production` value.pkg/dashboard/README.md-151-155 (1)
151-155:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDocument the live test-results endpoint.
/api/v1/testsis not registered by the server; the dashboard exposes/api/v1/deliverablesfor this data.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/README.md` around lines 151 - 155, The Test Results section in the README.md file documents an incorrect endpoint. The `/api/v1/tests` endpoint is not registered by the server; the dashboard actually exposes `/api/v1/deliverables` for test results data. Update the Test Results section to replace the documented endpoint from `/api/v1/tests` to `/api/v1/deliverables` and adjust the related endpoint path `/api/v1/tests/{job-id}` to `/api/v1/deliverables/{job-id}` to accurately reflect the actual API endpoints provided by the dashboard.pkg/dashboard/store/store.go-216-225 (1)
216-225:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winSilent JSON unmarshal failures may mask data corruption.
Lines 217 and 296 discard unmarshal errors. Corrupted
failed_testsJSON will silently yield an empty slice, potentially hiding real failures.Consider logging at debug level:
var failedTests []models.FailedTestCase - _ = json.Unmarshal([]byte(ftJSON), &failedTests) + if err := json.Unmarshal([]byte(ftJSON), &failedTests); err != nil { + log.Printf("Warning: failed_tests unmarshal for %s/%s: %v", name, env, err) + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/store/store.go` around lines 216 - 225, The json.Unmarshal call for failedTests is silently discarding errors by assigning to the blank identifier, which can hide data corruption issues. Instead of ignoring the error, capture it and log it at debug level to aid in diagnostics when the JSON parsing fails. Apply the same approach to the failedTests unmarshal as is already being done for the llm unmarshal error handling.pkg/dashboard/collectors/sqs.go-100-103 (1)
100-103:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDeleteMessage error silently discarded.
Failed deletions leave messages on the queue, causing reprocessing. Log the error at minimum.
- _, _ = c.sqsClient.DeleteMessage(&sqs.DeleteMessageInput{ + if _, err := c.sqsClient.DeleteMessage(&sqs.DeleteMessageInput{ QueueUrl: aws.String(c.queueURL), ReceiptHandle: msg.ReceiptHandle, - }) + }); err != nil { + log.Printf("SQS consumer: delete error: %v", err) + }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/sqs.go` around lines 100 - 103, The DeleteMessage call on sqsClient in the SQS collector is discarding the error return value using blank identifiers, which masks failures that cause messages to remain on the queue. Capture the error return value from the DeleteMessage method call instead of discarding it with underscores, and log any error that occurs using an appropriate logging mechanism to ensure visibility into deletion failures.pkg/dashboard/collectors/sqs.go-245-246 (1)
245-246:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winResponse body not closed via defer.
If
io.ReadAllpanics, the HTTP connection leaks. Usedeferimmediately after error check.if err != nil { continue } + defer out.Body.Close() data, err := io.ReadAll(out.Body) - out.Body.Close()🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/sqs.go` around lines 245 - 246, The response body out.Body is closed directly after io.ReadAll without using defer, which means if io.ReadAll panics, the connection will leak. Move the out.Body.Close() call to a deferred statement immediately after the error check for obtaining the response object, rather than calling it directly after the io.ReadAll call. This ensures the body is closed even if io.ReadAll panics.pkg/dashboard/server/server.go-397-399 (1)
397-399:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winUse
cfg.MaxTestResultsinstead of a hardcoded20.
CollectRecentTests(20)bypasses runtime configuration and makesMaxTestResultsineffective.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/server/server.go` around lines 397 - 399, In the testResultCollector block where CollectRecentTests is called, replace the hardcoded integer value 20 with cfg.MaxTestResults to ensure the function respects the runtime configuration. This allows the maximum number of test results to be determined by the configuration settings instead of being hardcoded in the function call.cmd/osde2e/dashboard/cmd.go-51-53 (1)
51-53:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winNormalize the
environmentvalue before persisting it.Line 52 advertises
integration, while dashboard config contracts useint. This can route an unsupported environment string downstream.Suggested fix
if cmd.PersistentFlags().Changed("environment") { - viper.Set(config.Environment, args.environment) - viper.Set(ocmprovider.Env, args.environment) + env := args.environment + if env == "integration" { + env = "int" + } + viper.Set(config.Environment, env) + viper.Set(ocmprovider.Env, env) }Also applies to: 96-99
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/osde2e/dashboard/cmd.go` around lines 51 - 53, The environment flag at the StringVarP definition allows users to specify "integration" as documented in the help text, but the dashboard configuration expects "int" as the normalized value. Add a normalization function that converts the user-provided environment value to match the expected dashboard config format, converting "integration" to "int" before the args.environment value is persisted or used downstream. Apply this same normalization logic to all other occurrences of environment flag handling as mentioned in the code review comment.scripts/dashboard/deploy.sh-19-20 (1)
19-20:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winValidate
NAMESPACEimmediately after parsing.If parsing fails or format changes, deployment fails later with less clear diagnostics.
Suggested fix
NAMESPACE=$(grep "^namespace:" "${OVERLAY_DIR}/kustomization.yaml" | awk '{print $2}') +if [[ -z "${NAMESPACE}" ]]; then + echo "Error: namespace not found in ${OVERLAY_DIR}/kustomization.yaml" + exit 1 +fi APP="delivery-dashboard"🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/dashboard/deploy.sh` around lines 19 - 20, The NAMESPACE variable is parsed from the kustomization.yaml file but is not validated immediately after assignment, which can lead to unclear errors later if parsing fails or the file format changes. Add a validation check immediately after the NAMESPACE variable assignment (the grep and awk command line) to ensure the parsed value is not empty. If validation fails, exit the script with a clear error message indicating that the namespace could not be parsed from the kustomization.yaml file, which will provide better diagnostics during deployment troubleshooting.Makefile-40-41 (1)
40-41:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winDo not enable
--backfillby default in thedashboardtarget.This target will trigger expensive historical scans every run. Make backfill opt-in.
Suggested fix
dashboard: build - "$(OUT_DIR)/osde2e" dashboard --db="$(DIR)dashboard.db" --backfill --port=8080 + "$(OUT_DIR)/osde2e" dashboard --db="$(DIR)dashboard.db" --port=8080 $(DASHBOARD_BACKFILL)# Optional: # make dashboard DASHBOARD_BACKFILL=--backfill🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@Makefile` around lines 40 - 41, Remove the hardcoded --backfill flag from the dashboard target command. Instead, make it optional by introducing a make variable (such as DASHBOARD_BACKFILL) that can be set when invoking the target. Modify the command in the dashboard target to conditionally append this variable so that backfill is only enabled when explicitly passed by the user, avoiding expensive historical scans on every default invocation.scripts/dashboard/verify-build.sh-76-76 (1)
76-76:⚠️ Potential issue | 🟡 Minor | ⚡ Quick winQuote file list expansion in formatter invocation.
gofumpt/gofmt -l $DASHBOARD_FILESis vulnerable to word-splitting/globbing. Use a null-delimitedfind+xargs -0pipeline for safe execution.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/dashboard/verify-build.sh` at line 76, The gofmt invocation with the unquoted `$DASHBOARD_FILES` variable is vulnerable to word-splitting and globbing attacks. Replace the current `gofmt -l $DASHBOARD_FILES > /tmp/dashboard-fmt-check.txt` command with a safe null-delimited pipeline using find and xargs, such as piping the output of a find command with -print0 to xargs -0 gofmt -l, to safely handle file names with spaces or special characters and write the results to the same output file.Source: Linters/SAST tools
scripts/dashboard/verify-build.sh-4-4 (1)
4-4:⚠️ Potential issue | 🟡 MinorUse
gofumptinstead ofgofmtand addpipefailfor pipeline safety.Line 76 uses
gofmtbut osde2e requiresgofumptper coding guidelines. Additionally, lines 90 and 100 maskgo buildfailures via pipelines withoutset -o pipefail—theteecommand succeeds regardless ofgo buildstatus, allowing errors to slip through.Suggested fixes
-set -e +set -euo pipefail- gofmt -l $DASHBOARD_FILES > /tmp/dashboard-fmt-check.txt + gofumpt -l $DASHBOARD_FILES > /tmp/dashboard-fmt-check.txt🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@scripts/dashboard/verify-build.sh` at line 4, Add the bash option `set -o pipefail` near the beginning of the script alongside the existing `set -e` command to ensure pipeline failures are properly caught. Replace the `gofmt` command on line 76 with `gofumpt` to comply with osde2e coding guidelines. This combination ensures that any failures in the `go build` pipelines at lines 90 and 100 will not be masked by the `tee` command and will properly cause the script to exit with an error status.
🧹 Nitpick comments (7)
pkg/dashboard/collectors/sqs.go (1)
283-283: ⚖️ Poor tradeoffBackfill lacks context for cancellation.
Large S3 scans can take minutes. Without a
context.Context, there's no way to abort on shutdown.-func (c *SQSConsumer) Backfill() error { +func (c *SQSConsumer) Backfill(ctx context.Context) error {Caller in
cmd/osde2e/dashboard/cmd.goalready hasctxavailable.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/sqs.go` at line 283, The Backfill method in the SQSConsumer struct does not accept a context.Context parameter, which prevents graceful cancellation of long-running S3 scans during shutdown. Add context.Context as the first parameter to the Backfill() method signature, then ensure this context is passed down to all underlying operations that support cancellation (such as S3 API calls and any loops that perform the scanning). Update the caller in cmd/osde2e/dashboard/cmd.go to pass the available ctx when invoking Backfill().pkg/dashboard/store/store.go (1)
478-484: ⚡ Quick winBubble sort is O(n²); consider
sort.Slice.For large failure group sets, this becomes a bottleneck. Standard library sort is O(n log n).
+import "sort" + - for i := 0; i < len(result)-1; i++ { - for j := i + 1; j < len(result); j++ { - if len(result[j].Entries) > len(result[i].Entries) { - result[i], result[j] = result[j], result[i] - } - } - } + sort.Slice(result, func(i, j int) bool { + return len(result[i].Entries) > len(result[j].Entries) + })🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/store/store.go` around lines 478 - 484, Replace the nested for loop bubble sort implementation with sort.Slice from the standard library. Remove the two nested loops that iterate through the result slice and perform the swap operation based on the Entries length comparison. Instead, use sort.Slice on the result slice with a comparison function that returns true when result[i].Entries should be ordered before result[j].Entries based on their lengths, allowing the sort package to efficiently handle the ordering in O(n log n) time instead of the current O(n²) bubble sort approach.pkg/dashboard/collectors/operators.go (1)
6-6: ⚡ Quick winReplace
log.Printfwith structuredklog/logr.Please align collector logs with project logging requirements and include structured fields (
operator,env,key,error).As per coding guidelines, "Use logr/klog for logging, not fmt.Println" and "Prefer structured logging with logr/klog".
Also applies to: 173-173, 196-197, 281-281, 463-464
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/operators.go` at line 6, Replace the "log" package import with the appropriate klog/logr imports at the top of the file. Then locate and replace all log.Printf calls throughout the collectors/operators.go file (including those referenced at lines 173, 196-197, 281, and 463-464) with structured logging calls using logr/klog that include the required structured fields such as operator, env, key, and error for better observability and alignment with project logging standards.Source: Coding guidelines
pkg/dashboard/collectors/reserves.go (1)
5-6: ⚡ Quick winReplace
log.Printfwith structuredklog/logr.This collector uses unstructured standard logging despite project policy. Please switch to structured
klog/logrlogging with environment and query context fields.As per coding guidelines, "Use logr/klog for logging, not fmt.Println" and "Prefer structured logging with logr/klog".
Also applies to: 25-26, 51-54, 63-63, 105-108
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/reserves.go` around lines 5 - 6, Remove the unstructured "log" import from the imports section and replace it with the appropriate klog/logr imports for structured logging. Then replace all log.Printf calls throughout the file (at the locations mentioned: 25-26, 51-54, 63, and 105-108) with structured logging calls using klog or logr, ensuring each logging statement includes relevant context fields such as environment and query parameters to support better debugging and monitoring as per project guidelines.Source: Coding guidelines
pkg/dashboard/collectors/s3tests.go (1)
7-7: ⚡ Quick winSwitch to structured
klog/logrlogging.The collector currently uses
log.Printf; please align with structured project logging.As per coding guidelines, "Use logr/klog for logging, not fmt.Println" and "Prefer structured logging with logr/klog".
Also applies to: 161-162, 192-192, 283-284
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/s3tests.go` at line 7, Replace the standard "log" package import with the appropriate klog/logr structured logging import at the top of the file. Then find and replace all instances of log.Printf calls (which also occur at lines 161-162, 192, and 283-284 as noted) with structured logging calls using the klog/logr API, ensuring that logging messages follow the structured logging pattern with appropriate key-value pairs instead of formatted string output.Source: Coding guidelines
pkg/dashboard/collectors/usage.go (1)
5-5: ⚡ Quick winUse structured
klog/logrinstead oflog.Printf.This collector should follow project logging conventions for consistency and operational parsing.
As per coding guidelines, "Use logr/klog for logging, not fmt.Println" and "Prefer structured logging with logr/klog".
Also applies to: 34-35, 75-78, 84-84
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/dashboard/collectors/usage.go` at line 5, Replace the standard library `log` import at the top of the usage.go file with the appropriate `klog` or `logr` imports that the project uses for structured logging. Then replace all `log.Printf` calls throughout the file (at the locations mentioned: lines 34-35, 75-78, and 84) with corresponding structured logging calls using the project's structured logging library, ensuring each log statement uses the proper structured logging methods with key-value pairs or message formatting appropriate for the chosen logging framework.Source: Coding guidelines
cmd/osde2e/dashboard/cmd.go (1)
78-79: ⚡ Quick winUse
klog/logrstructured logging instead of the standardlogpackage.This file currently emits unstructured logs via
log.Printf/log.Println. Please align with project logging standards.As per coding guidelines:
**/*.go: "Use logr/klog for logging, not fmt.Println" and "Prefer structured logging with logr/klog".Also applies to: 85-86, 115-132, 137-138, 149-150, 158-164, 170-170, 179-180, 185-190
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@cmd/osde2e/dashboard/cmd.go` around lines 78 - 79, Replace all unstructured logging calls using the standard log package (log.Println and log.Printf) throughout the file with structured logging using klog/logr. Import the appropriate klog/logr logger and use its Info, Error, or Warn methods with structured key-value pairs instead of formatted strings. This includes the log.Println call at line 78 and all other instances mentioned in the comment (lines 85-86, 115-132, 137-138, 149-150, 158-164, 170, 179-180, 185-190) to ensure consistent structured logging aligned with project standards.Source: Coding guidelines
- Add /dashboard/junit handler — fetches JUnit XML from S3 via AWS creds and renders as HTML (kitproj/junit2html style) using existing go-junit dependency - Add /dashboard/s3 proxy handler — streams S3 objects server-side, replacing presigned URLs that expired after 7 days - Replace all presigned URL generation with plain proxy URLs; migrate existing DB rows on open - Rename /dashboard/deliverables → /dashboard/pipelines across routes, templates, and deploy script - Fix superfluous WriteHeader: render templates into buffer before writing response - Fix .Operators → .Deliverables stale reference in deliverables.html (empty table bug) - Fix .OperatorName → .Name stale references in analysis.html and pipeline-detail.html - Fix DB migration: rename operator_name → name column for existing databases - Remove InitAWSViper from dashboard cmd — it was clearing AWS env vars set in shell - Use standard AWS SDK credential chain (env vars → ~/.aws) for DeliverableCollector - Truncate DB before backfill so deleted S3 objects don't persist as stale rows - Remove fail-list panels from popups — show AI summary + links only - Rename "Version" → "Tag" and "Job ID" → "Job suffix" in popup meta fields - Normalise failure grouping key (lowercase, strip quotes) to merge duplicate groups - Add imagePullPolicy: Always and rollout restart to deploy script - Add add() template func for junit-report.html totals aggregation Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
- Version tag links to github.com/openshift/<repo>/commit/<sha> - Add githubRepo template func to strip -e2e-master/-e2e job suffixes - Revert cookie secret check from deploy script (oauth-proxy removed) Co-Authored-By: Claude Sonnet 4.6 (1M context) <noreply@anthropic.com>
be32a7e to
e1d8ec5
Compare
DRAFT: final PRs will be split up
Summary by CodeRabbit
New Features
dashboardCLI command that launches an interactive web server for monitoring cluster reserves, usage, test results, and operator pipelines.Documentation
Chores