feat: Create and push LCB container#335

Merged
arekay-nv merged 7 commits into main from arekay/lcb_container on Jun 6, 2026

Conversation

@arekay-nv
Collaborator

@arekay-nv arekay-nv commented Jun 4, 2026

What does this PR do?

Adds a script to build and push the LCB container, which the accuracy implementations can use to compute the LCB score. The script builds the image from src/inference_endpoint/evaluation/livecodebench/lcb_serve.dockerfile and pushes it to the specified repository.

Type of change

  • Bug fix
  • New feature
  • Documentation update
  • Refactor/cleanup

Related issues

Testing

  • Tests added/updated
  • All tests pass locally
  • Manual testing completed

Checklist

  • Code follows project style
  • Pre-commit hooks pass
  • Documentation updated (if needed)

@arekay-nv arekay-nv requested a review from a team June 4, 2026 01:53
@github-actions github-actions Bot requested a review from nvzhihanj June 4, 2026 01:53
@github-actions

github-actions Bot commented Jun 4, 2026

MLCommons CLA bot All contributors have signed the MLCommons CLA ✍️ ✅

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request introduces scripts (push_image.sh, pull_image.sh, and _image_env.sh) and updates documentation to facilitate building, pushing, and pulling a self-contained LiveCodeBench evaluator image, including support for cross-architecture builds. It also optimizes the dataset generation script (generate.py) to stream rows individually to prevent out-of-memory (OOM) errors. The review feedback suggests security enhancements in push_image.sh to avoid writing the sensitive HF_TOKEN to disk by leveraging Docker BuildKit's native environment variable secret resolution. Additionally, it recommends memory optimizations in generate.py by dropping heavy columns when test cases are not saved and deleting the row reference to release memory more effectively.

Comment thread src/inference_endpoint/evaluation/livecodebench/push_image.sh Outdated
Comment on lines +154 to +163
secret_file="$(mktemp)"
trap 'rm -f "$secret_file"' EXIT
printf '%s' "$HF_TOKEN" >"$secret_file"

echo ">> Building ${LCB_LOCAL_TAG} ..."
docker build \
-f "${SCRIPT_DIR}/lcb_serve.dockerfile" \
--secret "id=HF_TOKEN,src=${secret_file}" \
-t "$LCB_LOCAL_TAG" \
"$SCRIPT_DIR"

security-high high

Similarly to the cross-architecture build path, we can avoid writing the sensitive HF_TOKEN to a temporary file on disk for native builds by exporting HF_TOKEN and using --secret id=HF_TOKEN directly.

Suggested change
- secret_file="$(mktemp)"
- trap 'rm -f "$secret_file"' EXIT
- printf '%s' "$HF_TOKEN" >"$secret_file"
- echo ">> Building ${LCB_LOCAL_TAG} ..."
- docker build \
-   -f "${SCRIPT_DIR}/lcb_serve.dockerfile" \
-   --secret "id=HF_TOKEN,src=${secret_file}" \
-   -t "$LCB_LOCAL_TAG" \
-   "$SCRIPT_DIR"
+ echo ">> Building ${LCB_LOCAL_TAG} ..."
+ export HF_TOKEN
+ docker build \
+   -f "${SCRIPT_DIR}/lcb_serve.dockerfile" \
+   --secret id=HF_TOKEN \
+   -t "$LCB_LOCAL_TAG" \
+   "$SCRIPT_DIR"

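For readers who drive the build from Python rather than from the shell script, the env-based secret idea can be sketched as below. This is only an illustration under assumptions: `build_command` and `run_build` are hypothetical helpers that are not part of this PR, the explicit `env=HF_TOKEN` form is used here instead of the bare `id=HF_TOKEN` shown in the suggestion, and a Docker client with BuildKit enabled is assumed.

```python
import os
import subprocess


def build_command(dockerfile: str, tag: str, context: str) -> list[str]:
    """Return a docker build argv that takes the secret from the environment.

    Hypothetical helper for illustration; the token never appears in argv.
    """
    return [
        "docker", "build",
        "-f", dockerfile,
        # Explicit env= form: BuildKit reads HF_TOKEN from the client's environment.
        "--secret", "id=HF_TOKEN,env=HF_TOKEN",
        "-t", tag,
        context,
    ]


def run_build(dockerfile: str, tag: str, context: str, token: str) -> None:
    """Run the build with the token set only in the child process environment."""
    env = dict(os.environ, HF_TOKEN=token)
    subprocess.run(build_command(dockerfile, tag, context), env=env, check=True)
```

With this shape nothing is written to disk, but the token still lives in the environment of the `docker` process for the duration of the build.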
Comment thread src/inference_endpoint/evaluation/livecodebench/generate.py
logger.info(f"Saved test cases to {test_cases_dir}")
# Release the decompressed test cases before the next iteration so the peak
# stays bounded by a single problem.
del public_cases, private_cases, test_case_data

medium

To fully release all references to the heavy columns before the next iteration, we should also delete the row dictionary. Since row contains the raw, large compressed strings for public_test_cases and private_test_cases, keeping it bound until the next iteration means those raw strings remain in memory longer than necessary.

Suggested change
- del public_cases, private_cases, test_case_data
+ del public_cases, private_cases, test_case_data, row
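The pattern under discussion (process one row, then drop every reference to its heavy fields before the next row is loaded) can be sketched roughly as follows. This is not the code in generate.py: `iter_problem_summaries` and the `question_id` field are hypothetical, and the test cases are assumed to be plain JSON strings purely to keep the example self-contained.

```python
import json
from typing import Iterable, Iterator


def iter_problem_summaries(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield one small summary per row, releasing heavy data between rows."""
    for row in rows:
        # Assumption for illustration: the column holds a JSON-encoded list.
        public_cases = json.loads(row["public_test_cases"])
        summary = {
            "question_id": row["question_id"],
            "num_public": len(public_cases),
        }
        # Drop the decoded cases and the row itself, so the raw strings it
        # holds are not kept alive by this function until the next iteration.
        del public_cases, row
        yield summary
```

The `del` only helps when `rows` is itself lazy (a generator or streaming dataset); if the caller holds all rows in a list, those references keep the data alive regardless.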

@arekay-nv arekay-nv force-pushed the arekay/lcb_container branch from 4412569 to 0d2f16c Compare June 4, 2026 04:17
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv force-pushed the arekay/lcb_container branch from 0d2f16c to 40b59b7 Compare June 4, 2026 04:54
arekay-nv added 2 commits June 4, 2026 07:54
Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
Collaborator

@nvzhihanj nvzhihanj left a comment

Review Council — Multi-AI Code Review

Reviewed by: Codex (--yolo) + Claude · Depth: standard

Found 4 issues — 1 posted inline, 3 low-severity advisories in the summary comment below.

Comment thread src/inference_endpoint/evaluation/livecodebench/generate.py Outdated
@nvzhihanj
Collaborator

Review Council — Summary

Reviewed by: Codex (--yolo) + Claude · Depth: standard

Found 4 issues — 1 posted inline (medium), 3 low-severity advisories below.

| # | File | Line | Severity | Category | Reviewer | Summary |
|---|------|------|----------|----------|----------|---------|
| 1 | generate.py | 213 | medium | data-integrity | Codex | --max-samples 0 → empty records; from_records([]) writes a schemaless parquet (the old df.iloc[[]] kept columns), leading to a downstream KeyError. (posted inline) |
| 2 | pull_image.sh | 81 | low | documentation | Codex | --no-local-tag skips creating ${LCB_LOCAL_TAG} (lines 57–59), but the final "Run with:" echo still prints docker run … ${LCB_LOCAL_TAG}, which is a non-runnable command. Print ${LCB_IMAGE_REF} instead. |
| 3 | generate.py | 149 | low | design | Claude | Memory fix depends on the private builder._writer_batch_size; a future datasets rename would silently no-op and re-introduce the OOM. Mitigated by pinned datasets==3.6.0; consider asserting hasattr(...). |
| 4 | push_image.sh | 23 | low | documentation | Claude | Header comment says --install arm64 while the runtime error message (126) and README (221) say --install all. Make the three consistent. |
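Items 1 and 3 both come down to failing loudly instead of silently. A minimal sketch of each idea follows, under stated assumptions: the column names, `to_columns`, and `set_existing_attr` are hypothetical and do not appear in generate.py, and no real `datasets` API is used.

```python
from typing import Any, Mapping, Sequence

# Hypothetical column names, for illustration only.
EXPECTED_COLUMNS = ("question_id", "prompt")


def to_columns(
    records: Sequence[Mapping[str, Any]],
    columns: Sequence[str] = EXPECTED_COLUMNS,
) -> dict[str, list]:
    """Pivot row dicts into columns from an explicit schema.

    Because the schema is not inferred from the rows, an empty input still
    yields every expected column (with an empty list for each).
    """
    return {name: [record[name] for record in records] for name in columns}


def set_existing_attr(obj: object, name: str, value: Any) -> None:
    """Set obj.<name> only if it already exists.

    Guards against a private attribute being renamed upstream, where a plain
    assignment would create a new, unused attribute and change nothing.
    """
    if not hasattr(obj, name):
        raise AttributeError(f"{type(obj).__name__} has no attribute {name!r}")
    setattr(obj, name, value)
```

A call such as `set_existing_attr(builder, "_writer_batch_size", 1)` would then raise after an upstream rename rather than quietly re-introducing the OOM.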

On the existing gemini-code-assist[bot] security-high comments (push_image.sh:143, :164): the council reviewed these and assesses them as low / stylistic, not security-high. The current mktemp (mode 0600) + printf '%s' (no trailing newline) + --secret id=HF_TOKEN,src=… + trap … EXIT cleanup is a standard, acceptable way to pass a BuildKit build secret; the suggested env-based --secret id=HF_TOKEN is functionally equivalent in exposure (token still lives in the process environment) rather than a meaningful hardening. Not blocking.
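For comparison, the file-based pattern described above (private temp file, token written without a trailing newline, cleanup on exit) could look roughly like this in Python. `secret_file` is a hypothetical helper, not code from this PR; it relies on `tempfile.mkstemp`, which creates the file readable and writable only by the creating user.

```python
import os
import tempfile
from contextlib import contextmanager
from typing import Iterator


@contextmanager
def secret_file(token: str) -> Iterator[str]:
    """Yield the path of a private temp file holding the token, then delete it."""
    fd, path = tempfile.mkstemp()  # owner-only permissions (0600 on POSIX)
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(token)  # no trailing newline, like printf '%s'
        yield path
    finally:
        # Mirrors the shell script's trap ... EXIT cleanup.
        os.remove(path)
```

The yielded path would be passed to the build as `--secret id=HF_TOKEN,src=<path>`, matching the form used in push_image.sh.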

Signed-off-by: Rashid Kaleem <230885705+arekay-nv@users.noreply.github.com>
@arekay-nv arekay-nv merged commit 58b619c into main Jun 6, 2026
8 checks passed
@arekay-nv arekay-nv deleted the arekay/lcb_container branch June 6, 2026 15:16
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 6, 2026