Feat/catalog content sections#14

Merged
ydankner merged 3 commits into main from feat/catalog-content-sections
Jun 26, 2026

@ydankner

Collaborator

No description provided.

ydankner and others added 3 commits June 24, 2026 18:19
The catalog Contents field only rendered for courses whose ALMA "Inhalte"
tab was a single unlabelled blob (stored as one section literally titled
"Inhalte"). Courses with structured labelled sub-boxes (Lernziele,
Qualifikationsziel, ...) — the majority — produced no "Inhalte"-titled
section, so Contents was empty even though the scraped data was present.

Replace the exact-title `_extract_contents` with `_build_content_sections`,
which returns every content block (title + text). Blocks are de-duped
against the Description (shown verbatim) and the Prerequisites (which have
their own section), and the heading that the scraper duplicates into each
block's body is stripped. Contents becomes a list of {title, text}; the
frontend renders each block with a sub-heading, suppressing the generic
"Inhalte" wrapper title for unstructured courses.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
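
A minimal Python sketch of the behaviour described in the commit above: keep every content block, drop blocks whose text already appears as the Description or the Prerequisites, and strip the heading the scraper repeats into the body. It is not the repository's `_build_content_sections`. The names `build_content_sections_sketch` and `display_heading`, the `{title, text}` dict shape, the assumption that the duplicated heading is the body's first line, and the whitespace-normalised comparison are all illustrative assumptions.

```python
from typing import Optional, TypedDict


class ContentSection(TypedDict):
    title: str
    text: str


# Assumption: the unstructured-course wrapper box is literally titled "Inhalte".
GENERIC_TITLE = "Inhalte"


def _normalize(text: str) -> str:
    # Collapse runs of whitespace so re-wrapped copies of the same text compare equal.
    return " ".join(text.split())


def _strip_duplicated_heading(title: str, text: str) -> str:
    # Assumption: the scraper repeats the box heading as the first line of the body.
    lines = text.strip().splitlines()
    if lines and _normalize(lines[0]) == _normalize(title):
        lines = lines[1:]
    return "\n".join(lines).strip()


def build_content_sections_sketch(
    sections: list[ContentSection],
    description: str,
    prerequisites: str,
) -> list[ContentSection]:
    """Hypothetical: return every non-empty block not already shown elsewhere."""
    # Text already rendered in the Description or Prerequisites sections.
    shown_elsewhere = {_normalize(description), _normalize(prerequisites)} - {""}
    result: list[ContentSection] = []
    for section in sections:
        title = section["title"].strip()
        text = _strip_duplicated_heading(title, section["text"])
        if not text or _normalize(text) in shown_elsewhere:
            continue  # empty after stripping, or a duplicate of another section
        result.append({"title": title, "text": text})
    return result


def display_heading(
    sections: list[ContentSection], section: ContentSection
) -> Optional[str]:
    # Hypothetical frontend rule: hide the generic "Inhalte" title when it is
    # the only block (the unstructured-course case); otherwise show the title.
    if len(sections) == 1 and section["title"] == GENERIC_TITLE:
        return None
    return section["title"]
```

Comparing normalised text rather than raw strings is a design choice here, so that the same paragraph scraped with different line breaks still counts as a duplicate.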
parse_content_page skipped every content box titled "Inhalte", assuming it
was tab-navigation chrome. For structured courses (the majority) that box is
the actual syllabus, so the real contents (e.g. INF4151's "Aufbauend auf ...")
were never scraped — only the labelled sibling fields (Lernziele, ...) were.

Stop dropping "Inhalte" boxes; they have the same shape as the other labelled
fields and flow through the existing title-strip and dedup unchanged. Only the
genuine sibling tab panes ("Semesterplanung", "Weitere Funktionen") stay
skipped, and empty/chrome-only "Inhalte" boxes still collapse to the fallback.

Add a regression test driven by INF4151's real contents-tab HTML (saved as a
fixture): it fails before the change (no "Inhalte" section) and passes after,
while the labelled fields keep being captured.

Note: production D1 only reflects this after a re-scrape + re-import.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
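
The narrowed skip rule from the commit above can be sketched the same way. This is hypothetical, not the real `parse_content_page`: the helper `select_content_boxes`, the `(title, text)` tuple input, the "no text beyond its own heading" test for chrome-only boxes, and returning `None` to signal the fallback path are all assumptions. Only the two skipped pane titles come from the commit message.

```python
from typing import Optional

# Genuine sibling tab panes that share the page with the contents tab.
# Before the fix, the skipped titles also included "Inhalte", which silently
# dropped the real syllabus of structured courses.
SIBLING_TAB_PANES = frozenset({"Semesterplanung", "Weitere Funktionen"})


def _is_chrome_only(title: str, text: str) -> bool:
    # Assumption: a box is chrome-only if it has no text beyond its own heading.
    body = " ".join(text.split())
    return body == "" or body == title


def select_content_boxes(
    boxes: list[tuple[str, str]],
) -> Optional[list[tuple[str, str]]]:
    """Hypothetical: keep every content box except sibling panes and empty chrome."""
    kept = [
        (title.strip(), text)
        for title, text in boxes
        if title.strip() not in SIBLING_TAB_PANES
        and not _is_chrome_only(title.strip(), text)
    ]
    # None means nothing usable was found, so the caller takes its fallback path.
    return kept or None
```

Listing only the panes to skip, rather than special-casing "Inhalte", means an "Inhalte" box is handled exactly like the other labelled fields.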
Capture the non-obvious facts a future agent needs: the catalog is a snapshot,
so scraper changes only affect live data after a re-scrape + re-import; the
ALMA catalog is public, and parse_content_page is the pure, testable seam
(with fixtures under data_collection/alma/tests/); and two local-dev pitfalls,
the API base URL trap and the wrangler --remote preview-token refresh failure.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@cloudflare-workers-and-pages

cloudflare-workers-and-pages Bot commented Jun 26, 2026

Deploying studyplaner with Cloudflare Pages

Latest commit: bbefd40
Status: ✅  Deploy successful!
Preview URL: https://d0c8c9a4.studyplaner.pages.dev
Branch Preview URL: https://feat-catalog-content-section.studyplaner.pages.dev

@ydankner ydankner merged commit 4dcb6f4 into main Jun 26, 2026
3 checks passed