data: ground-truth PhD roster fixture for 3 pilot chairs #57

Open

ValentinJSchmidt wants to merge 4 commits into main from data/ground-truth-phds-pilot-chairs

@ValentinJSchmidt

Collaborator

Summary

Adds a hand-curated ground-truth PhD roster fixture for the 3 pilot chairs and marks Step 2 (#46) as in progress in STATUS.md. This is the recall benchmark that makes "none forgotten" measurable for PhD discovery (#47).

Motivation

PhD discovery (#47) has no objective target to measure recall against — OpenAlex exposes no supervisor→PhD edge. This fixture captures the official team-page rosters so any future discovery method can be scored against a known set.
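Recall here is the share of ground-truth people that a discovery method finds, |G ∩ D| / |G|, where G is the set of recall targets and D the set of discovered people. A minimal Python sketch of that scoring, assuming people are matched by name after simple case and whitespace normalization (real matching would likely also need to handle accents and name variants); `recall` and `missed` are hypothetical helpers, not code from this PR:

```python
"""Minimal recall scorer against a ground-truth roster (illustrative sketch)."""

from collections.abc import Iterable


def normalize(name: str) -> str:
    """Crude name normalization (assumption): casefold and collapse whitespace."""
    return " ".join(name.casefold().split())


def recall(discovered: Iterable[str], ground_truth: Iterable[str]) -> float:
    """Fraction of ground-truth people that the discovery method found."""
    truth = {normalize(n) for n in ground_truth}
    if not truth:
        raise ValueError("ground truth is empty; recall is undefined")
    found = {normalize(n) for n in discovered}
    return len(truth & found) / len(truth)


def missed(discovered: Iterable[str], ground_truth: Iterable[str]) -> list[str]:
    """Ground-truth people the method failed to find (the 'forgotten' ones)."""
    found = {normalize(n) for n in discovered}
    return sorted(n for n in ground_truth if normalize(n) not in found)


if __name__ == "__main__":
    truth = ["Ada Example", "Bob Example", "Cy Example", "Dee Example"]
    found = ["ada example", "Bob  Example", "Eve Example"]
    print(recall(found, truth))  # 0.5
    print(missed(found, truth))  # ['Cy Example', 'Dee Example']
```

Reporting the missed names alongside the score matters for a "none forgotten" goal: a single recall number says how many were lost, the list says who.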

What Changed

  • skills/tests/fixtures/ground_truth_phds.json — new fixture covering:
    • Martius / Autonomous Learning: 20 active (including the 3 listed under the chair's "Researcher" role: Kloss, Kolev, Geist); alumni removed
    • von Luxburg / Theory of ML: 3 active, 2 incoming, 1 associated, 5 postdocs
    • Geiger / Autonomous Vision: 15 active
    • Each entry records status (active/incoming/associated/former/postdoc), role_text, profile/evidence URLs, and a verification_note. Group leaders, research engineers, and admin staff are intentionally excluded, with rationale in each chair's notes. Only active and postdoc entries count as recall targets.
  • STATUS.md — Step 2 (data: build hand-verified ground-truth roster for 3 pilot chairs #46) → 🟨 in progress, plus a dated log line.
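A minimal Python sketch of loading the fixture and selecting recall targets. The top-level layout, the `name`, `profile_url` and `evidence_url` keys, and the `RECALL_TARGET_STATUSES` constant are assumptions for illustration; only the status values, `role_text` and `verification_note` are named in this PR. It counts 'active' and 'postdoc' as targets, following the later commit that made postdocs a recall target:

```python
from __future__ import annotations

import json
from dataclasses import dataclass

# Status values named in this PR; 'postdoc' was added by a later commit.
KNOWN_STATUSES = {"active", "incoming", "associated", "former", "postdoc"}
# Hypothetical constant: statuses that count as recall targets.
RECALL_TARGET_STATUSES = {"active", "postdoc"}


@dataclass(frozen=True)
class RosterEntry:
    name: str                 # hypothetical field name
    status: str
    role_text: str
    profile_url: str | None   # None where the team page exposes no profile
    evidence_url: str | None  # hypothetical field name
    verification_note: str


def load_roster(raw: str) -> dict[str, list[RosterEntry]]:
    """Parse an ASSUMED layout: {"chairs": {chair_id: [entry, ...]}}."""
    data = json.loads(raw)
    roster: dict[str, list[RosterEntry]] = {}
    for chair, people in data["chairs"].items():
        entries = []
        for p in people:
            # Reject typos such as "activ" so they cannot silently drop targets.
            if p["status"] not in KNOWN_STATUSES:
                raise ValueError(f"{chair}: unknown status {p['status']!r} for {p['name']}")
            entries.append(
                RosterEntry(
                    name=p["name"],
                    status=p["status"],
                    role_text=p.get("role_text", ""),
                    profile_url=p.get("profile_url"),
                    evidence_url=p.get("evidence_url"),
                    verification_note=p.get("verification_note", ""),
                )
            )
        roster[chair] = entries
    return roster


def recall_targets(entries: list[RosterEntry]) -> list[str]:
    """Names that a discovery method is expected to find."""
    return [e.name for e in entries if e.status in RECALL_TARGET_STATUSES]


if __name__ == "__main__":
    sample = json.dumps({"chairs": {"example-chair": [
        {"name": "Ada Example", "status": "active", "role_text": "PhD student"},
        {"name": "Bob Example", "status": "incoming", "role_text": "Incoming PhD"},
        {"name": "Cy Example", "status": "postdoc", "role_text": "Postdoc"},
    ]}})
    print(recall_targets(load_roster(sample)["example-chair"]))
    # ['Ada Example', 'Cy Example']
```

Validating status values at load time keeps the recall denominator honest: a misspelled status would otherwise quietly remove someone from the target set.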

Known Issues / Not Yet Done

  • Rosters were auto-captured from official team pages and still need human verification before the fixture is authoritative.
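  • Some entries were initially ambiguous (e.g. Felix Kloss in the Martius group); these are resolved by treating the chair's "Researcher" role as active, with the reasoning documented in notes.
  • Individual PhD profile URLs are missing where the team page does not expose them; URLs for the three Martius Researchers (Kloss, Kolev, Geist) have been backfilled.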

Part of #46.

ValentinJSchmidt and others added 4 commits June 22, 2026 23:00
Captures current PhD rosters for the three pilot chairs as the recall
benchmark for PhD discovery (#47):
- Autonomous Learning / Distributed Intelligence (Georg Martius)
- Theory of Machine Learning (Ulrike von Luxburg)
- Autonomous Vision Group (Andreas Geiger)

Rosters were captured from each chair's official team page on
2026-06-22 via automated fetch. NOT final: entries still need
human verification against the live pages before this is treated as
authoritative ground truth. status values active/incoming/associated/
former are documented in the file header; only 'active' counts as a
recall target. Postdocs, research engineers and admin staff excluded.

Refs #46

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Ground-truth PhD fixture for the 3 pilot chairs drafted and
committed as WIP; pending human verification.

Refs #46

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Former PhD alumni are removed from the ground-truth roster entirely.
People listed under a chair's 'Researcher' role are now treated as
active PhDs (Martius: Kloss, Kolev, Geist), while research engineers
remain excluded. Martius roster: 20 active, 0 former.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

Add a 'postdoc' status counted as a recall target. von Luxburg gains
5 postdocs (Bhattacharjee, Bordt, König, Thiessen, Waller) confirmed
from the live team page. Martius and Geiger have no current postdoc
section, so none added there. Also backfill profile URLs for the three
Martius Researchers (Kloss, Kolev, Geist) found on the team page.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
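The inclusion rules accumulated across these commits (alumni removed, a chair's "Researcher" role counted as active, research engineers and admin staff excluded, postdocs added as a target) can be summarized as a small role classifier. This is a hypothetical Python sketch; `classify_role` and its keyword matching are assumptions, since the actual team-page labels and any fixture-building code are not shown here:

```python
from __future__ import annotations


def classify_role(role_text: str) -> str | None:
    """Map a team-page role label to a fixture status, or None to exclude it.

    The keyword rules are an illustrative guess at the policy in the commit
    messages, not the rules actually used to build the fixture.
    """
    role = role_text.casefold().strip()
    # Alumni are removed from the roster entirely (3rd commit).
    if "alumni" in role or "former" in role:
        return None
    # Research engineers, admin staff and group leaders are never targets.
    if "engineer" in role or "admin" in role or "group leader" in role:
        return None
    if "incoming" in role:
        return "incoming"
    if "associated" in role:
        return "associated"
    # Postdocs became a recall target in the 4th commit.
    if "postdoc" in role:
        return "postdoc"
    # PhD students, plus a bare 'Researcher' role, count as active PhDs.
    if "phd" in role or "doctoral" in role or role == "researcher":
        return "active"
    # Anything unrecognized is excluded rather than guessed.
    return None
```

Returning None for unrecognized labels keeps the roster conservative: an unknown role is never silently counted, so a reviewer has to add it explicitly.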
@ValentinJSchmidt

Collaborator Author

This is now also manually approved.
