[SDCICD-1858] Sanitize artifacts before S3 upload by YiqinZhang · Pull Request #3262 · openshift/osde2e

YiqinZhang · 2026-06-18T17:36:05Z

Summary

S3 upload path now redacts secrets (AWS keys, tokens, passwords, private keys, etc.) from artifact files before uploading to osde2e-logs bucket
Enable sanitization for LLM analysis aggregator prompt path (was wired but never activated)
Fail-open design: sanitization failures do not block uploads

Test plan

Unit tests for prepareUploadBody (secret redaction, fail-open, nil sanitizer)
go vet clean
All existing tests pass unchanged

JIRA: https://issues.redhat.com/browse/SDCICD-1858

Co-authored-by: Cursor

Summary by CodeRabbit

Release Notes

New Features
- S3 uploads now automatically detect and redact sensitive values in text-based files before uploading, enhancing data security and privacy protection.
Tests
- Added test coverage for the new file sanitization functionality during upload operations.

openshift-merge-bot · 2026-06-18T17:36:07Z

There are test jobs defined for this repository which are not configured to run automatically. Comment /test ? to see a list of all defined jobs. Review these jobs and use /test <job> to manually trigger jobs most likely to be impacted by the proposed changes.Comment /pipeline required to trigger all required & necessary jobs.

coderabbitai · 2026-06-18T17:36:14Z

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 29fb7576-9b15-4844-89b4-74e48a3684ad

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

🔍 Trigger review

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

_{Comment @coderabbitai help to get the list of available commands.}

openshift-ci · 2026-06-18T17:40:32Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: YiqinZhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details

Needs approval from an approver in each of these files:

~~OWNERS~~ [YiqinZhang]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

coderabbitai

Actionable comments posted: 1

🧹 Nitpick comments (1)

pkg/common/aws/s3.go (1)

333-340: ⚡ Quick win

Sanitization failures are silently swallowed.

When SanitizeText returns an error (e.g., oversized content), the failure is not logged. This complicates debugging when sanitization is unexpectedly skipped.

Proposed fix

 	if u.sanitizer != nil {
-		if result, err := u.sanitizer.SanitizeText(string(content), relPath); err == nil {
+		result, sanitizeErr := u.sanitizer.SanitizeText(string(content), relPath)
+		if sanitizeErr != nil {
+			log.Printf("Warning: sanitization failed for %s, uploading raw: %v", relPath, sanitizeErr)
+		} else {
 			if result.MatchesFound > 0 {
 				log.Printf("Sanitized %s: redacted %d secret(s)", relPath, result.MatchesFound)
 			}
 			content = []byte(result.Content)
-		}
+		}
 	}

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/common/aws/s3.go` around lines 333 - 340, The SanitizeText method call in
the sanitization block only logs when sanitization succeeds (err == nil) but
silently ignores any errors returned by the sanitizer. To fix this, add error
logging for the failure case by either using an else clause or checking the
error separately after the SanitizeText call on u.sanitizer, so that errors like
oversized content are properly logged for debugging instead of being swallowed.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/common/aws/s3.go`:
- Line 178: The error returned by sanitizer.New is being silently discarded
using the blank identifier, causing initialization failures to go unlogged and
masking configuration issues. Modify the sanitizer.New assignment to capture the
error into a named variable instead of discarding it with underscore, then add a
log statement (likely using logger or similar logging mechanism available in
this package) to record the error if sanitizer initialization fails, ensuring
that configuration or initialization problems are visible in logs while still
allowing prepareUploadBody to handle the nil sanitizer gracefully.

---

Nitpick comments:
In `@pkg/common/aws/s3.go`:
- Around line 333-340: The SanitizeText method call in the sanitization block
only logs when sanitization succeeds (err == nil) but silently ignores any
errors returned by the sanitizer. To fix this, add error logging for the failure
case by either using an else clause or checking the error separately after the
SanitizeText call on u.sanitizer, so that errors like oversized content are
properly logged for debugging instead of being swallowed.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 2af983cb-06bf-48b9-8454-32302f1ce643

📥 Commits

Reviewing files that changed from the base of the PR and between c7ae4e9 and c053ce4.

📒 Files selected for processing (4)

pkg/common/aws/s3.go
pkg/common/aws/s3_test.go
pkg/e2e/adhoctestimages/adhoctestimages.go
pkg/e2e/e2e.go

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/e2e/e2e.go`:
- Around line 311-313: The SanitizerConfig field is initialized with an empty
struct `&sanitizer.Config{}` which prevents the application of default values in
sanitizer.New() that only apply when config is nil. Change this initialization
from `&sanitizer.Config{}` to `nil` so that the defaults are properly applied,
including audit logging enablement and standard size limits. Alternatively, if
audit logging is required, explicitly set `EnableAudit: true` within the config
struct. This ensures both redaction patterns for secrets and audit visibility
are properly configured.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a2bd5a95-2042-4ecd-b5ff-44245e45eb3c

📥 Commits

Reviewing files that changed from the base of the PR and between c7ae4e9 and 56bcad6.

📒 Files selected for processing (4)

pkg/common/aws/s3.go
pkg/common/aws/s3_test.go
pkg/e2e/adhoctestimages/adhoctestimages.go
pkg/e2e/e2e.go

coderabbitai

Actionable comments posted: 3

🧹 Nitpick comments (1)

pkg/common/aws/s3.go (1)
178-181: 🧹 Nitpick | 🔵 Trivial

Replace log.Printf calls with structured logging in S3Uploader.

The log.Printf statements at lines 178-181, 288-302, and 336-342 use unstructured logging, making errors difficult to query and monitor. Refactor NewS3Uploader to accept a logr.Logger (or context.Context to extract logger) and replace all log.Printf calls with structured logging:
log.Printf("Warning: failed to initialize sanitizer, uploads will not be redacted: %v", err)
→ Replace with logger.WithValues("file", "s3.go").Info("sanitizer initialization failed", "error", err) or similar.

This aligns with the pattern used throughout pkg/common (executor.go, junit.go, rosa.go) where loggers are passed to constructors.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/common/aws/s3.go` around lines 178 - 181, The NewS3Uploader function uses
unstructured log.Printf calls which make errors difficult to monitor and query.
Refactor the NewS3Uploader constructor to accept a logr.Logger parameter, then
replace all three occurrences of log.Printf statements (the sanitizer
initialization warning and the other error logging locations throughout the
function) with structured logging calls using logger.Info or logger.Error with
appropriate key-value pairs to convey operation context and error details.
Source: Coding guidelines

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/common/aws/s3_test.go`:
- Around line 173-202: The test fixture in the secretContent variable contains
hard-coded secret-shaped literals (the GitHub token starting with ghp_) that
trigger security scanners. Instead of using literal secret values in the
secretContent string initialization, construct the AWS key and GitHub token
values programmatically at runtime by concatenating strings or generating them
from parts. This allows the sanitizer to still be tested with appropriately
formatted secrets without committing scanner-triggering literals, while keeping
the prepareUploadBody method and subsequent validation logic unchanged.
- Around line 220-239: The test is not properly validating the fail-open path
when sanitization fails. First, check the error returned from sanitizer.New
(instead of ignoring it with _) and assert it is nil to confirm the sanitizer
was created successfully. Then, improve the final assertion after
io.ReadAll(body) to verify that the returned raw fallback content exactly
matches the original content passed in, rather than just checking that it is
non-empty. This ensures the test actually proves the sanitizer-error path and
that raw content is used as the fallback without modification.

In `@pkg/common/aws/s3.go`:
- Around line 328-347: The prepareUploadBody method loads entire files into
memory with os.ReadFile before checking file size, which can cause memory
exhaustion on large artifacts. Refactor by first using os.Stat on the filePath
to check the file size, and define a bounded sanitization limit. If the file
exceeds this limit, stream the raw file content directly without loading it into
memory. Only read files smaller than the limit into memory for sanitization
processing. This ensures large artifacts are handled efficiently without
exhausting the uploader process memory.

---

Nitpick comments:
In `@pkg/common/aws/s3.go`:
- Around line 178-181: The NewS3Uploader function uses unstructured log.Printf
calls which make errors difficult to monitor and query. Refactor the
NewS3Uploader constructor to accept a logr.Logger parameter, then replace all
three occurrences of log.Printf statements (the sanitizer initialization warning
and the other error logging locations throughout the function) with structured
logging calls using logger.Info or logger.Error with appropriate key-value pairs
to convey operation context and error details.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 0cf699c8-9a40-41da-8f5f-0b18294be323

📥 Commits

Reviewing files that changed from the base of the PR and between c7ae4e9 and d582335.

📒 Files selected for processing (4)

pkg/common/aws/s3.go
pkg/common/aws/s3_test.go
pkg/e2e/adhoctestimages/adhoctestimages.go
pkg/e2e/e2e.go

coderabbitai · 2026-06-22T19:33:37Z

+// prepareUploadBody reads a file and redacts secrets before upload.
+// Falls back to raw content if sanitizer is nil or sanitization fails.
+func (u *S3Uploader) prepareUploadBody(filePath, relPath string) (*bytes.Reader, int64, error) {
+	content, err := os.ReadFile(filePath)
+	if err != nil {
+		return nil, 0, err
+	}
+
+	if u.sanitizer != nil {
+		if result, err := u.sanitizer.SanitizeText(string(content), relPath); err != nil {
+			log.Printf("Warning: sanitization failed for %s, uploading raw content: %v", relPath, err)
+		} else {
+			if result.MatchesFound > 0 {
+				log.Printf("Sanitized %s: redacted %d secret(s)", relPath, result.MatchesFound)
+			}
+			content = []byte(result.Content)
+		}
+	}
+
+	return bytes.NewReader(content), int64(len(content)), nil


⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid loading every artifact fully into memory.

os.ReadFile runs before any sanitizer limit can fail open, so one large report artifact can exhaust the uploader process. Stat first and stream raw content above a bounded sanitization limit; only read small artifacts into memory.

Suggested direction

+const maxSanitizableArtifactBytes int64 = 50 << 20 + -func (u *S3Uploader) prepareUploadBody(filePath, relPath string) (*bytes.Reader, int64, error) { +func (u *S3Uploader) prepareUploadBody(filePath, relPath string) (io.ReadCloser, int64, error) { + info, err := os.Stat(filePath) + if err != nil { + return nil, 0, err + } + + if u.sanitizer == nil || info.Size() > maxSanitizableArtifactBytes { + body, err := os.Open(filePath) + if err != nil { + return nil, 0, err + } + return body, info.Size(), nil + } + content, err := os.ReadFile(filePath) if err != nil { return nil, 0, err } - if u.sanitizer != nil { - if result, err := u.sanitizer.SanitizeText(string(content), relPath); err != nil { - log.Printf("Warning: sanitization failed for %s, uploading raw content: %v", relPath, err) - } else { - if result.MatchesFound > 0 { - log.Printf("Sanitized %s: redacted %d secret(s)", relPath, result.MatchesFound) - } - content = []byte(result.Content) + if result, err := u.sanitizer.SanitizeText(string(content), relPath); err != nil { + log.Printf("Warning: sanitization failed for %s, uploading raw content: %v", relPath, err) + } else { + if result.MatchesFound > 0 { + log.Printf("Sanitized %s: redacted %d secret(s)", relPath, result.MatchesFound) } + content = []byte(result.Content) } - return bytes.NewReader(content), int64(len(content)), nil + return io.NopCloser(bytes.NewReader(content)), int64(len(content)), nil }

- body, size, prepErr := u.prepareUploadBody(filePath, relPath) + body, size, prepErr := u.prepareUploadBody(filePath, relPath) if prepErr != nil { log.Printf("Warning: failed to prepare %s: %v", filePath, prepErr) return nil } - _, uploadErr := u.uploader.Upload(&s3manager.UploadInput{ + _, uploadErr := u.uploader.Upload(&s3manager.UploadInput{ Bucket: aws.String(u.bucket), Key: aws.String(s3Key), Body: body, ContentType: aws.String(contentType), }) + closeErr := body.Close() if uploadErr != nil { log.Printf("Warning: failed to upload %s: %v", filePath, uploadErr) return nil } + if closeErr != nil { + log.Printf("Warning: failed to close %s: %v", filePath, closeErr) + return nil + }

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/common/aws/s3.go` around lines 328 - 347, The prepareUploadBody method loads entire files into memory with os.ReadFile before checking file size, which can cause memory exhaustion on large artifacts. Refactor by first using os.Stat on the filePath to check the file size, and define a bounded sanitization limit. If the file exceeds this limit, stream the raw file content directly without loading it into memory. Only read files smaller than the limit into memory for sanitization processing. This ensures large artifacts are handled efficiently without exhausting the uploader process memory.

YiqinZhang · 2026-06-23T19:32:10Z

/pipeline required

openshift-merge-bot · 2026-06-23T19:32:13Z

Scheduling required tests:
/test hypershift-pr-check

openshift-ci · 2026-06-23T20:21:38Z

@YiqinZhang: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

YiqinZhang force-pushed the SDCICD-1858 branch from e51ae30 to df620b6 Compare June 18, 2026 17:38

openshift-ci Bot requested review from minlei98 and varunraokadaparthi June 18, 2026 17:40

openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2026

Add Sanitizer before S3 upload

c053ce4

YiqinZhang force-pushed the SDCICD-1858 branch from df620b6 to c053ce4 Compare June 18, 2026 17:49

coderabbitai Bot reviewed Jun 18, 2026

View reviewed changes

Comment thread pkg/common/aws/s3.go Outdated

ritmun reviewed Jun 18, 2026

View reviewed changes

Comment thread pkg/e2e/adhoctestimages/adhoctestimages.go Outdated

ritmun reviewed Jun 18, 2026

View reviewed changes

Comment thread pkg/e2e/e2e.go Outdated

YiqinZhang force-pushed the SDCICD-1858 branch from 37413e6 to 56bcad6 Compare June 22, 2026 19:05

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

Comment thread pkg/e2e/e2e.go Outdated

YiqinZhang force-pushed the SDCICD-1858 branch 2 times, most recently from 6e516ad to d582335 Compare June 22, 2026 19:18

coderabbitai Bot reviewed Jun 22, 2026

View reviewed changes

YiqinZhang force-pushed the SDCICD-1858 branch 3 times, most recently from d1e6338 to c2ffe3f Compare June 23, 2026 14:54

log sanitizer errors, explicit config:

4ab35d1

YiqinZhang force-pushed the SDCICD-1858 branch from c2ffe3f to 4ab35d1 Compare June 23, 2026 18:34

YiqinZhang requested a review from ritmun June 23, 2026 21:44

Conversation

YiqinZhang commented Jun 18, 2026 • edited by coderabbitai Bot Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Test plan

Summary by CodeRabbit

Release Notes

Uh oh!

openshift-merge-bot Bot commented Jun 18, 2026

Uh oh!

coderabbitai Bot commented Jun 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review skipped

Uh oh!

openshift-ci Bot commented Jun 18, 2026

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

coderabbitai Bot Jun 22, 2026

Choose a reason for hiding this comment

Uh oh!

YiqinZhang commented Jun 23, 2026

Uh oh!

openshift-merge-bot Bot commented Jun 23, 2026

Uh oh!

openshift-ci Bot commented Jun 23, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

YiqinZhang commented Jun 18, 2026 •

edited by coderabbitai Bot

Loading

coderabbitai Bot commented Jun 18, 2026 •

edited

Loading