Feat/program study area scrape#15

Merged: ydankner merged 3 commits into main from feat/program-study-area-scrape on Jun 26, 2026
Conversation

@ydankner

Collaborator

No description provided.

Kyotoo-chan and others added 3 commits June 26, 2026 10:46
Crawl the M.Sc. CS / B.Sc. Informatik / M.Sc. ML studiesOffered branches
alongside the VVZ "Gesamtverzeichnis" branch so courses cross-listed from
other faculties (KOG/GTCNEURO/MEDZ/BIOINF) are enumerated. Courses are
deduplicated by unit_id and detail pages are fetched once via the new
ScrapeOptions.skip_unit_ids; --no-programs restores VVZ-only behaviour.
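
The deduplication described above might look roughly like the following sketch. Only `ScrapeOptions.skip_unit_ids` is named in this commit; the field type, the `crawl_branches` helper, the shape of `branches`, and the `fetch_detail` callable are assumptions made for illustration, not the scraper's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class ScrapeOptions:
    # Only the field name comes from the commit; the set-of-strings type is assumed.
    skip_unit_ids: set[str] = field(default_factory=set)


def crawl_branches(branches, fetch_detail, options):
    """Fetch each course's detail page at most once across all branches.

    branches: mapping of branch name -> list of unit_ids (hypothetical shape).
    fetch_detail: callable taking a unit_id and returning its detail record.
    """
    details = {}
    for branch_name, unit_ids in branches.items():
        for unit_id in unit_ids:
            if unit_id in options.skip_unit_ids:
                continue  # already fetched while crawling an earlier branch
            details[unit_id] = fetch_detail(unit_id)
            options.skip_unit_ids.add(unit_id)
    return details


# Usage: "u2" is listed in both branches but fetched only once.
fetched = []


def fake_fetch(unit_id):
    fetched.append(unit_id)
    return {"unit_id": unit_id}


branches = {"VVZ": ["u1", "u2"], "M.Sc. CS": ["u2", "u3"]}
result = crawl_branches(branches, fake_fetch, ScrapeOptions())
```

Keeping the seen set on the options object lets a caller pre-seed it with unit IDs that were fetched in an earlier run.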

Alias the codes that differ from study_areas.code in the D1 import so the
new courses link to the right study area: M.Sc. ML MACH-* -> ML-* and
B.Sc. Wahlpflicht INFM#### -> PRAK/THEO/TECH/INFO.
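
A minimal sketch of the aliasing rule, assuming it were applied in Python before the import. The commit does not say which `INFM####` number maps to which of PRAK/THEO/TECH/INFO, so the mapping below uses placeholder keys; the `alias_study_area_code` helper and the `MACH-FOUND` suffix are likewise hypothetical.

```python
import re

# Placeholder mapping: the real INFM#### -> PRAK/THEO/TECH/INFO pairs are not
# given in the commit message, so these keys are illustrative only.
EXAMPLE_INFM_ALIASES = {
    "INFM0001": "PRAK",
    "INFM0002": "THEO",
}

INFM_PATTERN = re.compile(r"INFM\d{4}")


def alias_study_area_code(code, infm_aliases):
    """Return the code as it is assumed to appear in study_areas.code."""
    if code.startswith("MACH-"):
        # M.Sc. ML: MACH-* is stored as ML-*
        return "ML-" + code[len("MACH-"):]
    if INFM_PATTERN.fullmatch(code):
        # B.Sc. Wahlpflicht: look up the bare area code, else leave unchanged
        return infm_aliases.get(code, code)
    return code


ml_code = alias_study_area_code("MACH-FOUND", EXAMPLE_INFM_ALIASES)
bsc_code = alias_study_area_code("INFM0001", EXAMPLE_INFM_ALIASES)
unknown = alias_study_area_code("INFM9999", EXAMPLE_INFM_ALIASES)
other = alias_study_area_code("KOG-1", EXAMPLE_INFM_ALIASES)
```

Codes without an alias pass through unchanged, so courses whose code already matches `study_areas.code` are unaffected.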

Add tqdm progress bars (outer "semesters", inner per-branch "details");
log lines go through tqdm.write so they don't corrupt the bars and --quiet
disables them. Add data_collection/CLAUDE.md and update QUICKSTART.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- Fix the per-branch `partial` flag to also reflect the --max-expansions
  crawl limit (it previously tracked only the runtime limit), so a branch
  that stops early is not silently recorded complete and skipped by
  --continue.
- Program-scope the study-area alias join (JOIN study_programs) so the
  generic B.Sc. codes PRAK/THEO/TECH/INFO cannot mislink a course to
  another program that happens to reuse the same bare code.
- Close both tqdm bars via context managers so an exception mid-run no
  longer leaks a bar / corrupts the terminal.
- Fix progress-label example in the ScrapeOptions docstring.
- Correct the module name in QUICKSTART.md / SETUP.md (alma.cli, not the
  non-existent alma_scraper.cli).
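
The program-scoped join from the second bullet can be illustrated with an in-memory SQLite database. The table names `study_programs` and `study_areas` and the `code` column come from the commits; every other column, the sample rows, and the `find_study_area` helper are assumptions, and the real D1 import query may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE study_programs (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE study_areas (
        id INTEGER PRIMARY KEY,
        program_id INTEGER REFERENCES study_programs(id),
        code TEXT
    );
    -- Two programs that reuse the same bare code 'PRAK' (sample data).
    INSERT INTO study_programs VALUES (1, 'B.Sc. Informatik'), (2, 'Other program');
    INSERT INTO study_areas VALUES (10, 1, 'PRAK'), (20, 2, 'PRAK');
    """
)


def find_study_area(conn, program_name, code):
    """Resolve a study-area code within one program only."""
    rows = conn.execute(
        """
        SELECT sa.id
        FROM study_areas AS sa
        JOIN study_programs AS sp ON sp.id = sa.program_id
        WHERE sp.name = ? AND sa.code = ?
        """,
        (program_name, code),
    ).fetchall()
    return [row[0] for row in rows]


# Matching on the bare code alone returns both programs' areas.
unscoped = [
    row[0]
    for row in conn.execute(
        "SELECT id FROM study_areas WHERE code = ? ORDER BY id", ("PRAK",)
    )
]
scoped = find_study_area(conn, "B.Sc. Informatik", "PRAK")
```

Here the unscoped lookup matches two rows while the scoped one matches exactly one, which is the mislink the join is meant to prevent.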

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
cloudflare-workers-and-pages Bot commented Jun 26, 2026

Deploying studyplaner with Cloudflare Pages

Latest commit: 6133ead
Status: ✅  Deploy successful!
Preview URL: https://a44565ce.studyplaner.pages.dev
Branch Preview URL: https://feat-program-study-area-scra.studyplaner.pages.dev

ydankner merged commit fe4dcef into main on Jun 26, 2026
3 checks passed
2 participants