feat(risk): cross-source identity keying — recover risk for ~10% of repos #41

Merged: saagpatel merged 1 commit into main from feat-risk-identity-keying on Jun 3, 2026

@saagpatel (Owner)

Arc G — risk identity keying (cross-source)

Fixes the pre-existing limitation surfaced by Arc B's code review: risk is computed in portfolio truth (keyed by local-dir display_name) but every render consumer iterates audit data (keyed by GitHub metadata.name). For repos whose dir name ≠ GitHub repo name, risk rendered blank on every surface — including the already-shipped All Repos column.

Fix (mirrors the existing _select_security_entry GHAS join)

  • Persist the GitHub slug into truth: add IdentityFields.repo_full_name (additive, from _git_remote_full_name, already captured at scan time but previously dropped).
  • Multi-key the risk lookups by slug + display_name in both build_risk_lookup (report_enrichment.py) and load_risk_truth (excel_export_truth_helpers.py), so consumers keying by metadata.name resolve.
  • Aggregate safety: the slug alias shares the same entry object; _extract_risk_posture dedups by identity and load_risk_truth increments posture once per project — no double-counting (verified: posture sum == project count).
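The multi-keying and aggregate-safety points above can be sketched roughly as follows. This is an illustration under assumptions, not the repository's code: the truth layout (`identity`, `risk`, `risk_tier`) and the helper names `build_lookup_sketch` and `count_by_tier` are hypothetical, and `example-owner` is a placeholder. Only the slug derivation follows the lines quoted in the review thread.

```python
from collections import Counter


def build_lookup_sketch(projects):
    """Key each risk entry by display_name and, when present, by repo slug."""
    lookup = {}
    for project in projects:
        identity = project.get("identity", {})
        entry = project.get("risk", {})
        name = identity.get("display_name", "")
        if name:
            lookup[name] = entry
        # "owner/repo" -> "repo"; an empty repo_full_name yields "".
        slug = str(identity.get("repo_full_name") or "").rsplit("/", 1)[-1]
        if slug and slug not in lookup:
            lookup[slug] = entry  # the alias shares the same entry object
    return lookup


def count_by_tier(lookup):
    """Count each project once, even if it is reachable under two keys."""
    seen = set()
    tiers = Counter()
    for entry in lookup.values():
        if id(entry) in seen:
            continue
        seen.add(id(entry))
        tiers[entry.get("risk_tier", "unknown")] += 1
    return tiers


# Hypothetical truth projects: one with a slug that differs from its
# display name, one with no GitHub remote.
projects = [
    {"identity": {"display_name": "Signal & Noise",
                  "repo_full_name": "example-owner/signal-noise"},
     "risk": {"risk_tier": "elevated"}},
    {"identity": {"display_name": "local-only", "repo_full_name": ""},
     "risk": {"risk_tier": "baseline"}},
]
lookup = build_lookup_sketch(projects)
tiers = count_by_tier(lookup)
```

With this shape, a consumer keying by `metadata.name` (`signal-noise`) and one keying by `display_name` (`Signal & Noise`) reach the same object, and the tier counts still sum to the number of projects.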

Measured impact

On a real audit report, the share of repos resolving risk went from 86% to 96%: 12 repos were recovered (signal-noise, devils-advocate, PhantomFrequencies, seismoscope, …) that had been silently blank everywhere. 121/129 truth projects now carry the slug (the 8 without have no GitHub remote).

Verification

  • 2222 passed, 2 skipped; ruff clean.
  • New tests: slug-keying + display-name-keying both resolve; posture not inflated by aliases.
  • Additive schema field (backward-compatible; old snapshots default to "").
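The backward-compatibility claim for the additive field can be illustrated with a small sketch. It assumes a dataclass-style schema; the real `IdentityFields` may be defined differently, so `IdentityFieldsSketch` and `load_identity` are hypothetical names, and `example-owner` is a placeholder.

```python
from dataclasses import dataclass, fields


@dataclass
class IdentityFieldsSketch:
    display_name: str
    repo_full_name: str = ""  # additive field; absent in older snapshots


def load_identity(raw):
    """Build the dataclass from a snapshot dict, ignoring unknown keys."""
    known = {f.name for f in fields(IdentityFieldsSketch)}
    return IdentityFieldsSketch(**{k: v for k, v in raw.items() if k in known})


# An old snapshot has no repo_full_name; a new one carries the slug.
old = load_identity({"display_name": "seismoscope"})
new = load_identity({"display_name": "Signal & Noise",
                     "repo_full_name": "example-owner/signal-noise"})
```

Because the default is the empty string, an old snapshot simply produces no slug alias, and lookups fall back to `display_name` as before.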

@saagpatel merged commit b95c468 into main on Jun 3, 2026
2 checks passed

@chatgpt-codex-connector (Bot) left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 91d05470ad

Comment thread src/report_enrichment.py
Comment on lines +191 to +193
slug = str(identity.get("repo_full_name") or "").rsplit("/", 1)[-1]
if slug and slug not in lookup:
    lookup[slug] = entry

[P2] Deduplicate risk aliases before aggregating

When a truth project has both a display name and a different repo_full_name slug, this new alias makes build_risk_lookup() return two keys for the same project. _extract_risk_posture() was updated to dedupe by object identity, but other existing consumers still aggregate by iterating the returned map directly; for example export_html_dashboard() counts every _risk_lookup.items() entry in src/web_export.py:108-111, and the Markdown reporter similarly iterates risk_lookup.values() in src/reporter.py:1684-1688. In those surfaces, a single elevated repo like Signal & Noise / signal-noise is now reported as two elevated repos and can appear twice in the elevated list.
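One way a consumer that iterates the map directly could avoid the double count is to collapse aliases by object identity before aggregating. The sketch below is hypothetical: `unique_entries` is not a function in the repository, and the `risk_tier` key is an assumed entry shape.

```python
def unique_entries(risk_lookup):
    """Yield (first_key, entry) once per distinct entry object."""
    seen = set()
    for key, entry in risk_lookup.items():
        if id(entry) in seen:
            continue  # alias of an entry already yielded
        seen.add(id(entry))
        yield key, entry


# One elevated project reachable under two keys, plus an unrelated repo.
shared = {"risk_tier": "elevated"}
risk_lookup = {
    "Signal & Noise": shared,
    "signal-noise": shared,
    "other": {"risk_tier": "baseline"},
}

# Naive iteration counts the aliased project twice.
naive = sum(1 for e in risk_lookup.values() if e["risk_tier"] == "elevated")

# Identity-based iteration lists it once, under its first key.
deduped = [k for k, e in unique_entries(risk_lookup)
           if e["risk_tier"] == "elevated"]
```

This relies on the alias sharing the same entry object as the primary key, which is what the PR description states; if a consumer copies entries before aggregating, identity no longer distinguishes aliases and a stable project key would be needed instead.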
