feat: faculty-agnostic researcher-tree builder + integrity check #64
Open
ValentinJSchmidt wants to merge 8 commits into
Captures current PhD rosters for the three pilot chairs as the recall benchmark for PhD discovery (#47):
- Autonomous Learning / Distributed Intelligence (Georg Martius)
- Theory of Machine Learning (Ulrike von Luxburg)
- Autonomous Vision Group (Andreas Geiger)
Rosters were captured from each chair's official team page on 2026-06-22 via automated fetch. NOT final: entries still need human verification against the live pages before this is treated as authoritative ground truth. The status values active/incoming/associated/former are documented in the file header; only 'active' counts as a recall target. Postdocs, research engineers and admin staff are excluded.
Refs #46
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Ground-truth PhD fixture for the 3 pilot chairs drafted and committed as WIP; pending human verification.
Refs #46
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Former PhD alumni are removed from the ground-truth roster entirely. People listed under a chair's 'Researcher' role are now treated as active PhDs (Martius: Kloss, Kolev, Geist), while research engineers remain excluded. Martius roster: 20 active, 0 former.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Add a 'postdoc' status counted as a recall target. von Luxburg gains 5 postdocs (Bhattacharjee, Bordt, König, Thiessen, Waller) confirmed from the live team page. Martius and Geiger have no current postdoc section, so none added there. Also backfill profile URLs for the three Martius Researchers (Kloss, Kolev, Geist) found on the team page.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
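For reviewers, a minimal sketch of how a status-filtered recall score could be computed against such a roster. The `recall` helper, the dict layout, and the example names are hypothetical and are not taken from the fixture; only the idea that 'active' and 'postdoc' count as recall targets comes from the commits above.

```python
# Statuses that count as recall targets, per the commits above.
RECALL_STATUSES = {"active", "postdoc"}


def recall(ground_truth: list[dict[str, str]], discovered: set[str]) -> float:
    """Fraction of recall-target people in the roster that discovery found."""
    targets = {p["name"] for p in ground_truth if p["status"] in RECALL_STATUSES}
    if not targets:
        raise ValueError("roster has no recall targets")
    return len(targets & discovered) / len(targets)


# Hypothetical roster entries; the real fixture's layout may differ.
roster = [
    {"name": "Ada Example", "status": "active"},
    {"name": "Bob Example", "status": "postdoc"},
    {"name": "Cy Example", "status": "incoming"},
    {"name": "Di Example", "status": "associated"},
]
found = {"Ada Example", "Cy Example"}
score = recall(roster, found)
print(score)
```

Here only two entries are targets and one of them was found, so 'incoming' and 'associated' people neither help nor hurt the score.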
Turn the OpenAlex index builder into a script the agent can run to build or validate any faculty's Markdown tree, instead of relying on hardcoded Tuebingen paths and a fixed column layout.
- add --researchers-index / --chairs-index / --papers-dir to target any faculty
- parse tables by column name, so extra faculty columns do not break readers
- add validate_references + --validate-only: checks every chair/researcher/paper link resolves and exits non-zero on orphans
- wire the check into CI via skills/tests/test_tree_integrity.py
- document reuse + integrity rules in the schema doc, SKILL.md, and AGENTS.md
PhD-level schema fields (role/advisor edge) are deferred to a follow-up.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
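To illustrate why parsing by column name tolerates extra faculty columns, here is a rough sketch. The `parse_markdown_table` helper, the column names, and the sample rows are assumptions made for illustration; the builder's actual parser may look quite different.

```python
def parse_markdown_table(text: str) -> list[dict[str, str]]:
    """Parse the pipe-table lines in `text` into row dicts keyed by header name.

    Assumes `text` holds a single simple table with no escaped pipes.
    """
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in text.splitlines()
        if line.strip().startswith("|")
    ]
    if len(rows) < 2:
        return []
    header, body = rows[0], rows[2:]  # rows[1] is the |---|---| separator row
    return [dict(zip(header, row)) for row in body]


# Hypothetical index with an extra "office" column the reader does not know about.
SAMPLE = """\
| slug | name | chair | office |
|------|------|-------|--------|
| ada-example | Ada Example | example-chair | B2.14 |
| bob-example | Bob Example | example-chair | A1.03 |
"""

records = parse_markdown_table(SAMPLE)
slugs = [record["slug"] for record in records]
print(slugs)
```

Because readers look up `record["slug"]` rather than a fixed cell position, adding or reordering columns does not change what they read.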
Adding findings/ left setuptools with two top-level dirs (skills, findings) and no way to choose, so `pip install -e ".[dev]"` failed in CI. The repo is not an importable package, so declare no py-modules to skip discovery.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Turns the OpenAlex index builder into a script the agent can run to build or check any faculty's researcher tree, instead of being tied to hardcoded Tübingen paths and a fixed set of columns.
Part of #48 (tree schema + referential integrity). The PhD-level fields (role / advisor edge) are left for a follow-up.
What changed
- `--researchers-index`, `--chairs-index`, `--papers-dir` point the builder at any faculty's files. Defaults still point at the bundled Tübingen data.
- `--validate-only` mode checks that every link resolves (chair→researcher, researcher→chair, paper→person, paper→chair) and exits with an error code if anything is orphaned.
- `skills/tests/test_tree_integrity.py` runs the check on the real tree and proves it fails on broken links.
- The schema doc, `update-openalex-paper-index/SKILL.md`, and `AGENTS.md` explain the rules and how to reuse the builder for a new faculty.
How to test
- `pip install -e ".[dev]"` (or just `pip install pytest`).
- `python -m pytest -q` → all pass (includes 5 new tree-integrity tests).
- `python scripts/update_openalex_index.py --validate-only` → prints nothing, exits `0`.
- Change a paper's `researchers` slug to a name that doesn't exist, then run `--validate-only` pointed at that folder → it prints `ERROR: paper '...' references missing person ...` and exits `2`.
- Point the `--*-index` / `--papers-dir` flags at another folder of the same Markdown shape → builds/validates with no code change.
🤖 Generated with Claude Code
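The four link directions checked by `--validate-only` can be sketched roughly as follows. This is not the PR's code: the signature of `validate_references`, the in-memory dict layout, the `exit_code` helper, and every message except the paper→person one quoted above are assumptions made for illustration.

```python
def validate_references(chairs: dict, researchers: dict, papers: dict) -> list[str]:
    """Return one error message per link that does not resolve."""
    errors = []
    for slug, chair in chairs.items():  # chair -> researcher
        for person in chair["researchers"]:
            if person not in researchers:
                errors.append(f"ERROR: chair '{slug}' references missing person {person}")
    for slug, person in researchers.items():  # researcher -> chair
        if person["chair"] not in chairs:
            errors.append(f"ERROR: researcher '{slug}' references missing chair {person['chair']}")
    for slug, paper in papers.items():  # paper -> person, paper -> chair
        for person in paper["researchers"]:
            if person not in researchers:
                errors.append(f"ERROR: paper '{slug}' references missing person {person}")
        if paper["chair"] not in chairs:
            errors.append(f"ERROR: paper '{slug}' references missing chair {paper['chair']}")
    return errors


def exit_code(errors: list[str]) -> int:
    """Print each error and return the process exit code (0 clean, 2 on orphans)."""
    for message in errors:
        print(message)
    return 2 if errors else 0


# Hypothetical tree with one chair, one researcher, one paper.
chairs = {"example-chair": {"researchers": ["ada-example"]}}
researchers = {"ada-example": {"chair": "example-chair"}}
papers = {"example-paper": {"researchers": ["ada-example"], "chair": "example-chair"}}
clean = validate_references(chairs, researchers, papers)

# Same tree with the paper's researcher slug changed to a name that doesn't exist.
broken_papers = {"example-paper": {"researchers": ["nobody"], "chair": "example-chair"}}
broken = validate_references(chairs, researchers, broken_papers)
status = exit_code(broken)
```

Collecting all errors before choosing the exit code means a single run reports every orphan rather than stopping at the first one.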