[SDCICD-1858] Sanitize artifacts before S3 upload#3262
Conversation
|
There are test jobs defined for this repository which are not configured to run automatically. Comment |
|
Important Review skippedAuto reviews are disabled on this repository. Please check the settings in the CodeRabbit UI or the ⚙️ Run configurationConfiguration used: Repository YAML (base), Central YAML (inherited) Review profile: CHILL Plan: Enterprise Run ID: You can disable this status message by setting the Use the checkbox below for a quick retry:
✨ Finishing Touches🧪 Generate unit tests (beta)
Comment |
|
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: YiqinZhang The full list of commands accepted by this bot can be found here. The pull request process is described here DetailsNeeds approval from an approver in each of these files:
Approvers can indicate their approval by writing |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
pkg/common/aws/s3.go (1)
333-340: ⚡ Quick winSanitization failures are silently swallowed.
When
SanitizeTextreturns an error (e.g., oversized content), the failure is not logged. This complicates debugging when sanitization is unexpectedly skipped.Proposed fix
if u.sanitizer != nil { - if result, err := u.sanitizer.SanitizeText(string(content), relPath); err == nil { + result, sanitizeErr := u.sanitizer.SanitizeText(string(content), relPath) + if sanitizeErr != nil { + log.Printf("Warning: sanitization failed for %s, uploading raw: %v", relPath, sanitizeErr) + } else { if result.MatchesFound > 0 { log.Printf("Sanitized %s: redacted %d secret(s)", relPath, result.MatchesFound) } content = []byte(result.Content) - } + } }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/common/aws/s3.go` around lines 333 - 340, The SanitizeText method call in the sanitization block only logs when sanitization succeeds (err == nil) but silently ignores any errors returned by the sanitizer. To fix this, add error logging for the failure case by either using an else clause or checking the error separately after the SanitizeText call on u.sanitizer, so that errors like oversized content are properly logged for debugging instead of being swallowed.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/common/aws/s3.go`:
- Line 178: The error returned by sanitizer.New is being silently discarded
using the blank identifier, causing initialization failures to go unlogged and
masking configuration issues. Modify the sanitizer.New assignment to capture the
error into a named variable instead of discarding it with underscore, then add a
log statement (likely using logger or similar logging mechanism available in
this package) to record the error if sanitizer initialization fails, ensuring
that configuration or initialization problems are visible in logs while still
allowing prepareUploadBody to handle the nil sanitizer gracefully.
---
Nitpick comments:
In `@pkg/common/aws/s3.go`:
- Around line 333-340: The SanitizeText method call in the sanitization block
only logs when sanitization succeeds (err == nil) but silently ignores any
errors returned by the sanitizer. To fix this, add error logging for the failure
case by either using an else clause or checking the error separately after the
SanitizeText call on u.sanitizer, so that errors like oversized content are
properly logged for debugging instead of being swallowed.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 2af983cb-06bf-48b9-8454-32302f1ce643
📒 Files selected for processing (4)
pkg/common/aws/s3.gopkg/common/aws/s3_test.gopkg/e2e/adhoctestimages/adhoctestimages.gopkg/e2e/e2e.go
There was a problem hiding this comment.
Actionable comments posted: 1
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/e2e/e2e.go`:
- Around line 311-313: The SanitizerConfig field is initialized with an empty
struct `&sanitizer.Config{}` which prevents the application of default values in
sanitizer.New() that only apply when config is nil. Change this initialization
from `&sanitizer.Config{}` to `nil` so that the defaults are properly applied,
including audit logging enablement and standard size limits. Alternatively, if
audit logging is required, explicitly set `EnableAudit: true` within the config
struct. This ensures both redaction patterns for secrets and audit visibility
are properly configured.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: a2bd5a95-2042-4ecd-b5ff-44245e45eb3c
📒 Files selected for processing (4)
pkg/common/aws/s3.gopkg/common/aws/s3_test.gopkg/e2e/adhoctestimages/adhoctestimages.gopkg/e2e/e2e.go
6e516ad to
d582335
Compare
There was a problem hiding this comment.
Actionable comments posted: 3
🧹 Nitpick comments (1)
pkg/common/aws/s3.go (1)
178-181: 🧹 Nitpick | 🔵 TrivialReplace
log.Printfcalls with structured logging in S3Uploader.The
log.Printfstatements at lines 178-181, 288-302, and 336-342 use unstructured logging, making errors difficult to query and monitor. RefactorNewS3Uploaderto accept alogr.Logger(orcontext.Contextto extract logger) and replace alllog.Printfcalls with structured logging:log.Printf("Warning: failed to initialize sanitizer, uploads will not be redacted: %v", err)→ Replace with
logger.WithValues("file", "s3.go").Info("sanitizer initialization failed", "error", err)or similar.This aligns with the pattern used throughout
pkg/common(executor.go, junit.go, rosa.go) where loggers are passed to constructors.🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the rest with a brief reason, keep changes minimal, and validate. In `@pkg/common/aws/s3.go` around lines 178 - 181, The NewS3Uploader function uses unstructured log.Printf calls which make errors difficult to monitor and query. Refactor the NewS3Uploader constructor to accept a logr.Logger parameter, then replace all three occurrences of log.Printf statements (the sanitizer initialization warning and the other error logging locations throughout the function) with structured logging calls using logger.Info or logger.Error with appropriate key-value pairs to convey operation context and error details.Source: Coding guidelines
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@pkg/common/aws/s3_test.go`:
- Around line 173-202: The test fixture in the secretContent variable contains
hard-coded secret-shaped literals (the GitHub token starting with ghp_) that
trigger security scanners. Instead of using literal secret values in the
secretContent string initialization, construct the AWS key and GitHub token
values programmatically at runtime by concatenating strings or generating them
from parts. This allows the sanitizer to still be tested with appropriately
formatted secrets without committing scanner-triggering literals, while keeping
the prepareUploadBody method and subsequent validation logic unchanged.
- Around line 220-239: The test is not properly validating the fail-open path
when sanitization fails. First, check the error returned from sanitizer.New
(instead of ignoring it with _) and assert it is nil to confirm the sanitizer
was created successfully. Then, improve the final assertion after
io.ReadAll(body) to verify that the returned raw fallback content exactly
matches the original content passed in, rather than just checking that it is
non-empty. This ensures the test actually proves the sanitizer-error path and
that raw content is used as the fallback without modification.
In `@pkg/common/aws/s3.go`:
- Around line 328-347: The prepareUploadBody method loads entire files into
memory with os.ReadFile before checking file size, which can cause memory
exhaustion on large artifacts. Refactor by first using os.Stat on the filePath
to check the file size, and define a bounded sanitization limit. If the file
exceeds this limit, stream the raw file content directly without loading it into
memory. Only read files smaller than the limit into memory for sanitization
processing. This ensures large artifacts are handled efficiently without
exhausting the uploader process memory.
---
Nitpick comments:
In `@pkg/common/aws/s3.go`:
- Around line 178-181: The NewS3Uploader function uses unstructured log.Printf
calls which make errors difficult to monitor and query. Refactor the
NewS3Uploader constructor to accept a logr.Logger parameter, then replace all
three occurrences of log.Printf statements (the sanitizer initialization warning
and the other error logging locations throughout the function) with structured
logging calls using logger.Info or logger.Error with appropriate key-value pairs
to convey operation context and error details.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Repository YAML (base), Central YAML (inherited)
Review profile: CHILL
Plan: Enterprise
Run ID: 0cf699c8-9a40-41da-8f5f-0b18294be323
📒 Files selected for processing (4)
pkg/common/aws/s3.gopkg/common/aws/s3_test.gopkg/e2e/adhoctestimages/adhoctestimages.gopkg/e2e/e2e.go
| // prepareUploadBody reads a file and redacts secrets before upload. | ||
| // Falls back to raw content if sanitizer is nil or sanitization fails. | ||
| func (u *S3Uploader) prepareUploadBody(filePath, relPath string) (*bytes.Reader, int64, error) { | ||
| content, err := os.ReadFile(filePath) | ||
| if err != nil { | ||
| return nil, 0, err | ||
| } | ||
|
|
||
| if u.sanitizer != nil { | ||
| if result, err := u.sanitizer.SanitizeText(string(content), relPath); err != nil { | ||
| log.Printf("Warning: sanitization failed for %s, uploading raw content: %v", relPath, err) | ||
| } else { | ||
| if result.MatchesFound > 0 { | ||
| log.Printf("Sanitized %s: redacted %d secret(s)", relPath, result.MatchesFound) | ||
| } | ||
| content = []byte(result.Content) | ||
| } | ||
| } | ||
|
|
||
| return bytes.NewReader(content), int64(len(content)), nil |
There was a problem hiding this comment.
Avoid loading every artifact fully into memory.
os.ReadFile runs before any sanitizer limit can fail open, so one large report artifact can exhaust the uploader process. Stat first and stream raw content above a bounded sanitization limit; only read small artifacts into memory.
Suggested direction
+const maxSanitizableArtifactBytes int64 = 50 << 20
+
-func (u *S3Uploader) prepareUploadBody(filePath, relPath string) (*bytes.Reader, int64, error) {
+func (u *S3Uploader) prepareUploadBody(filePath, relPath string) (io.ReadCloser, int64, error) {
+ info, err := os.Stat(filePath)
+ if err != nil {
+ return nil, 0, err
+ }
+
+ if u.sanitizer == nil || info.Size() > maxSanitizableArtifactBytes {
+ body, err := os.Open(filePath)
+ if err != nil {
+ return nil, 0, err
+ }
+ return body, info.Size(), nil
+ }
+
content, err := os.ReadFile(filePath)
if err != nil {
return nil, 0, err
}
- if u.sanitizer != nil {
- if result, err := u.sanitizer.SanitizeText(string(content), relPath); err != nil {
- log.Printf("Warning: sanitization failed for %s, uploading raw content: %v", relPath, err)
- } else {
- if result.MatchesFound > 0 {
- log.Printf("Sanitized %s: redacted %d secret(s)", relPath, result.MatchesFound)
- }
- content = []byte(result.Content)
+ if result, err := u.sanitizer.SanitizeText(string(content), relPath); err != nil {
+ log.Printf("Warning: sanitization failed for %s, uploading raw content: %v", relPath, err)
+ } else {
+ if result.MatchesFound > 0 {
+ log.Printf("Sanitized %s: redacted %d secret(s)", relPath, result.MatchesFound)
}
+ content = []byte(result.Content)
}
- return bytes.NewReader(content), int64(len(content)), nil
+ return io.NopCloser(bytes.NewReader(content)), int64(len(content)), nil
}- body, size, prepErr := u.prepareUploadBody(filePath, relPath)
+ body, size, prepErr := u.prepareUploadBody(filePath, relPath)
if prepErr != nil {
log.Printf("Warning: failed to prepare %s: %v", filePath, prepErr)
return nil
}
- _, uploadErr := u.uploader.Upload(&s3manager.UploadInput{
+ _, uploadErr := u.uploader.Upload(&s3manager.UploadInput{
Bucket: aws.String(u.bucket),
Key: aws.String(s3Key),
Body: body,
ContentType: aws.String(contentType),
})
+ closeErr := body.Close()
if uploadErr != nil {
log.Printf("Warning: failed to upload %s: %v", filePath, uploadErr)
return nil
}
+ if closeErr != nil {
+ log.Printf("Warning: failed to close %s: %v", filePath, closeErr)
+ return nil
+ }🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@pkg/common/aws/s3.go` around lines 328 - 347, The prepareUploadBody method
loads entire files into memory with os.ReadFile before checking file size, which
can cause memory exhaustion on large artifacts. Refactor by first using os.Stat
on the filePath to check the file size, and define a bounded sanitization limit.
If the file exceeds this limit, stream the raw file content directly without
loading it into memory. Only read files smaller than the limit into memory for
sanitization processing. This ensures large artifacts are handled efficiently
without exhausting the uploader process memory.
d1e6338 to
c2ffe3f
Compare
|
/pipeline required |
|
Scheduling required tests: |
|
@YiqinZhang: all tests passed! Full PR test history. Your PR dashboard. DetailsInstructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here. |
Summary
osde2e-logsbucketTest plan
prepareUploadBody(secret redaction, fail-open, nil sanitizer)go vetcleanJIRA: https://issues.redhat.com/browse/SDCICD-1858
Co-authored-by: Cursor
Summary by CodeRabbit
Release Notes
New Features
Tests