Skip to content

[SDCICD-1858] Sanitize artifacts before S3 upload#3262

Open
YiqinZhang wants to merge 2 commits into
openshift:mainfrom
YiqinZhang:SDCICD-1858
Open

[SDCICD-1858] Sanitize artifacts before S3 upload#3262
YiqinZhang wants to merge 2 commits into
openshift:mainfrom
YiqinZhang:SDCICD-1858

Conversation

@YiqinZhang

@YiqinZhang YiqinZhang commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Summary

  • S3 upload path now redacts secrets (AWS keys, tokens, passwords, private keys, etc.) from artifact files before uploading to osde2e-logs bucket
  • Enable sanitization for LLM analysis aggregator prompt path (was wired but never activated)
  • Fail-open design: sanitization failures do not block uploads

Test plan

  • Unit tests for prepareUploadBody (secret redaction, fail-open, nil sanitizer)
  • go vet clean
  • All existing tests pass unchanged

JIRA: https://issues.redhat.com/browse/SDCICD-1858

Co-authored-by: Cursor

Summary by CodeRabbit

Release Notes

  • New Features

    • S3 uploads now automatically detect and redact sensitive values in text-based files before uploading, enhancing data security and privacy protection.
  • Tests

    • Added test coverage for the new file sanitization functionality during upload operations.

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

There are test jobs defined for this repository which are not configured to run automatically. Comment /test ? to see a list of all defined jobs. Review these jobs and use /test <job> to manually trigger jobs most likely to be impacted by the proposed changes.Comment /pipeline required to trigger all required & necessary jobs.

@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown

Important

Review skipped

Auto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 29fb7576-9b15-4844-89b4-74e48a3684ad

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Use the checkbox below for a quick retry:

  • 🔍 Trigger review
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Comment @coderabbitai help to get the list of available commands.

@openshift-ci

openshift-ci Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: YiqinZhang

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci openshift-ci Bot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Jun 18, 2026

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
pkg/common/aws/s3.go (1)

333-340: ⚡ Quick win

Sanitization failures are silently swallowed.

When SanitizeText returns an error (e.g., oversized content), the failure is not logged. This complicates debugging when sanitization is unexpectedly skipped.

Proposed fix
 	if u.sanitizer != nil {
-		if result, err := u.sanitizer.SanitizeText(string(content), relPath); err == nil {
+		result, sanitizeErr := u.sanitizer.SanitizeText(string(content), relPath)
+		if sanitizeErr != nil {
+			log.Printf("Warning: sanitization failed for %s, uploading raw: %v", relPath, sanitizeErr)
+		} else {
 			if result.MatchesFound > 0 {
 				log.Printf("Sanitized %s: redacted %d secret(s)", relPath, result.MatchesFound)
 			}
 			content = []byte(result.Content)
-		}
+		}
 	}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/common/aws/s3.go` around lines 333 - 340, The SanitizeText method call in
the sanitization block only logs when sanitization succeeds (err == nil) but
silently ignores any errors returned by the sanitizer. To fix this, add error
logging for the failure case by either using an else clause or checking the
error separately after the SanitizeText call on u.sanitizer, so that errors like
oversized content are properly logged for debugging instead of being swallowed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/common/aws/s3.go`:
- Line 178: The error returned by sanitizer.New is being silently discarded
using the blank identifier, causing initialization failures to go unlogged and
masking configuration issues. Modify the sanitizer.New assignment to capture the
error into a named variable instead of discarding it with underscore, then add a
log statement (likely using logger or similar logging mechanism available in
this package) to record the error if sanitizer initialization fails, ensuring
that configuration or initialization problems are visible in logs while still
allowing prepareUploadBody to handle the nil sanitizer gracefully.

---

Nitpick comments:
In `@pkg/common/aws/s3.go`:
- Around line 333-340: The SanitizeText method call in the sanitization block
only logs when sanitization succeeds (err == nil) but silently ignores any
errors returned by the sanitizer. To fix this, add error logging for the failure
case by either using an else clause or checking the error separately after the
SanitizeText call on u.sanitizer, so that errors like oversized content are
properly logged for debugging instead of being swallowed.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 2af983cb-06bf-48b9-8454-32302f1ce643

📥 Commits

Reviewing files that changed from the base of the PR and between c7ae4e9 and c053ce4.

📒 Files selected for processing (4)
  • pkg/common/aws/s3.go
  • pkg/common/aws/s3_test.go
  • pkg/e2e/adhoctestimages/adhoctestimages.go
  • pkg/e2e/e2e.go

Comment thread pkg/common/aws/s3.go Outdated
Comment thread pkg/e2e/adhoctestimages/adhoctestimages.go Outdated
Comment thread pkg/e2e/e2e.go Outdated

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/e2e/e2e.go`:
- Around line 311-313: The SanitizerConfig field is initialized with an empty
struct `&sanitizer.Config{}` which prevents the application of default values in
sanitizer.New() that only apply when config is nil. Change this initialization
from `&sanitizer.Config{}` to `nil` so that the defaults are properly applied,
including audit logging enablement and standard size limits. Alternatively, if
audit logging is required, explicitly set `EnableAudit: true` within the config
struct. This ensures both redaction patterns for secrets and audit visibility
are properly configured.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: a2bd5a95-2042-4ecd-b5ff-44245e45eb3c

📥 Commits

Reviewing files that changed from the base of the PR and between c7ae4e9 and 56bcad6.

📒 Files selected for processing (4)
  • pkg/common/aws/s3.go
  • pkg/common/aws/s3_test.go
  • pkg/e2e/adhoctestimages/adhoctestimages.go
  • pkg/e2e/e2e.go

Comment thread pkg/e2e/e2e.go Outdated
@YiqinZhang YiqinZhang force-pushed the SDCICD-1858 branch 2 times, most recently from 6e516ad to d582335 Compare June 22, 2026 19:18

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
pkg/common/aws/s3.go (1)

178-181: 🧹 Nitpick | 🔵 Trivial

Replace log.Printf calls with structured logging in S3Uploader.

The log.Printf statements at lines 178-181, 288-302, and 336-342 use unstructured logging, making errors difficult to query and monitor. Refactor NewS3Uploader to accept a logr.Logger (or context.Context to extract logger) and replace all log.Printf calls with structured logging:

log.Printf("Warning: failed to initialize sanitizer, uploads will not be redacted: %v", err)

→ Replace with logger.WithValues("file", "s3.go").Info("sanitizer initialization failed", "error", err) or similar.

This aligns with the pattern used throughout pkg/common (executor.go, junit.go, rosa.go) where loggers are passed to constructors.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/common/aws/s3.go` around lines 178 - 181, The NewS3Uploader function uses
unstructured log.Printf calls which make errors difficult to monitor and query.
Refactor the NewS3Uploader constructor to accept a logr.Logger parameter, then
replace all three occurrences of log.Printf statements (the sanitizer
initialization warning and the other error logging locations throughout the
function) with structured logging calls using logger.Info or logger.Error with
appropriate key-value pairs to convey operation context and error details.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/common/aws/s3_test.go`:
- Around line 173-202: The test fixture in the secretContent variable contains
hard-coded secret-shaped literals (the GitHub token starting with ghp_) that
trigger security scanners. Instead of using literal secret values in the
secretContent string initialization, construct the AWS key and GitHub token
values programmatically at runtime by concatenating strings or generating them
from parts. This allows the sanitizer to still be tested with appropriately
formatted secrets without committing scanner-triggering literals, while keeping
the prepareUploadBody method and subsequent validation logic unchanged.
- Around line 220-239: The test is not properly validating the fail-open path
when sanitization fails. First, check the error returned from sanitizer.New
(instead of ignoring it with _) and assert it is nil to confirm the sanitizer
was created successfully. Then, improve the final assertion after
io.ReadAll(body) to verify that the returned raw fallback content exactly
matches the original content passed in, rather than just checking that it is
non-empty. This ensures the test actually proves the sanitizer-error path and
that raw content is used as the fallback without modification.

In `@pkg/common/aws/s3.go`:
- Around line 328-347: The prepareUploadBody method loads entire files into
memory with os.ReadFile before checking file size, which can cause memory
exhaustion on large artifacts. Refactor by first using os.Stat on the filePath
to check the file size, and define a bounded sanitization limit. If the file
exceeds this limit, stream the raw file content directly without loading it into
memory. Only read files smaller than the limit into memory for sanitization
processing. This ensures large artifacts are handled efficiently without
exhausting the uploader process memory.

---

Nitpick comments:
In `@pkg/common/aws/s3.go`:
- Around line 178-181: The NewS3Uploader function uses unstructured log.Printf
calls which make errors difficult to monitor and query. Refactor the
NewS3Uploader constructor to accept a logr.Logger parameter, then replace all
three occurrences of log.Printf statements (the sanitizer initialization warning
and the other error logging locations throughout the function) with structured
logging calls using logger.Info or logger.Error with appropriate key-value pairs
to convey operation context and error details.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository YAML (base), Central YAML (inherited)

Review profile: CHILL

Plan: Enterprise

Run ID: 0cf699c8-9a40-41da-8f5f-0b18294be323

📥 Commits

Reviewing files that changed from the base of the PR and between c7ae4e9 and d582335.

📒 Files selected for processing (4)
  • pkg/common/aws/s3.go
  • pkg/common/aws/s3_test.go
  • pkg/e2e/adhoctestimages/adhoctestimages.go
  • pkg/e2e/e2e.go

Comment thread pkg/common/aws/s3_test.go Outdated
Comment thread pkg/common/aws/s3_test.go Outdated
Comment thread pkg/common/aws/s3.go Outdated
Comment on lines +328 to +347
// prepareUploadBody reads a file and redacts secrets before upload.
// Falls back to raw content if sanitizer is nil or sanitization fails.
func (u *S3Uploader) prepareUploadBody(filePath, relPath string) (*bytes.Reader, int64, error) {
content, err := os.ReadFile(filePath)
if err != nil {
return nil, 0, err
}

if u.sanitizer != nil {
if result, err := u.sanitizer.SanitizeText(string(content), relPath); err != nil {
log.Printf("Warning: sanitization failed for %s, uploading raw content: %v", relPath, err)
} else {
if result.MatchesFound > 0 {
log.Printf("Sanitized %s: redacted %d secret(s)", relPath, result.MatchesFound)
}
content = []byte(result.Content)
}
}

return bytes.NewReader(content), int64(len(content)), nil

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | 🏗️ Heavy lift

Avoid loading every artifact fully into memory.

os.ReadFile runs before any sanitizer limit can fail open, so one large report artifact can exhaust the uploader process. Stat first and stream raw content above a bounded sanitization limit; only read small artifacts into memory.

Suggested direction
+const maxSanitizableArtifactBytes int64 = 50 << 20
+
-func (u *S3Uploader) prepareUploadBody(filePath, relPath string) (*bytes.Reader, int64, error) {
+func (u *S3Uploader) prepareUploadBody(filePath, relPath string) (io.ReadCloser, int64, error) {
+	info, err := os.Stat(filePath)
+	if err != nil {
+		return nil, 0, err
+	}
+
+	if u.sanitizer == nil || info.Size() > maxSanitizableArtifactBytes {
+		body, err := os.Open(filePath)
+		if err != nil {
+			return nil, 0, err
+		}
+		return body, info.Size(), nil
+	}
+
 	content, err := os.ReadFile(filePath)
 	if err != nil {
 		return nil, 0, err
 	}
 
-	if u.sanitizer != nil {
-		if result, err := u.sanitizer.SanitizeText(string(content), relPath); err != nil {
-			log.Printf("Warning: sanitization failed for %s, uploading raw content: %v", relPath, err)
-		} else {
-			if result.MatchesFound > 0 {
-				log.Printf("Sanitized %s: redacted %d secret(s)", relPath, result.MatchesFound)
-			}
-			content = []byte(result.Content)
+	if result, err := u.sanitizer.SanitizeText(string(content), relPath); err != nil {
+		log.Printf("Warning: sanitization failed for %s, uploading raw content: %v", relPath, err)
+	} else {
+		if result.MatchesFound > 0 {
+			log.Printf("Sanitized %s: redacted %d secret(s)", relPath, result.MatchesFound)
 		}
+		content = []byte(result.Content)
 	}
 
-	return bytes.NewReader(content), int64(len(content)), nil
+	return io.NopCloser(bytes.NewReader(content)), int64(len(content)), nil
 }
-		body, size, prepErr := u.prepareUploadBody(filePath, relPath)
+		body, size, prepErr := u.prepareUploadBody(filePath, relPath)
 		if prepErr != nil {
 			log.Printf("Warning: failed to prepare %s: %v", filePath, prepErr)
 			return nil
 		}
 
-		_, uploadErr := u.uploader.Upload(&s3manager.UploadInput{
+		_, uploadErr := u.uploader.Upload(&s3manager.UploadInput{
 			Bucket:      aws.String(u.bucket),
 			Key:         aws.String(s3Key),
 			Body:        body,
 			ContentType: aws.String(contentType),
 		})
+		closeErr := body.Close()
 		if uploadErr != nil {
 			log.Printf("Warning: failed to upload %s: %v", filePath, uploadErr)
 			return nil
 		}
+		if closeErr != nil {
+			log.Printf("Warning: failed to close %s: %v", filePath, closeErr)
+			return nil
+		}
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/common/aws/s3.go` around lines 328 - 347, The prepareUploadBody method
loads entire files into memory with os.ReadFile before checking file size, which
can cause memory exhaustion on large artifacts. Refactor by first using os.Stat
on the filePath to check the file size, and define a bounded sanitization limit.
If the file exceeds this limit, stream the raw file content directly without
loading it into memory. Only read files smaller than the limit into memory for
sanitization processing. This ensures large artifacts are handled efficiently
without exhausting the uploader process memory.

@YiqinZhang YiqinZhang force-pushed the SDCICD-1858 branch 3 times, most recently from d1e6338 to c2ffe3f Compare June 23, 2026 14:54
@YiqinZhang

Copy link
Copy Markdown
Contributor Author

/pipeline required

@openshift-merge-bot

Copy link
Copy Markdown
Contributor

Scheduling required tests:
/test hypershift-pr-check

@openshift-ci

openshift-ci Bot commented Jun 23, 2026

Copy link
Copy Markdown
Contributor

@YiqinZhang: all tests passed!

Full PR test history. Your PR dashboard.

Details

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

@YiqinZhang YiqinZhang requested a review from ritmun June 23, 2026 21:44
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

approved Indicates a PR has been approved by an approver from all required OWNERS files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants