Feat/catalog content sections #14
Merged
The catalog Contents field only rendered for courses whose ALMA "Inhalte"
tab was a single unlabelled blob (stored as one section literally titled
"Inhalte"). Courses with structured, labelled sub-boxes (Lernziele,
Qualifikationsziel, ...), which are the majority, produced no
"Inhalte"-titled section, so Contents was empty even though the scraped
data was present.

Replace the exact-title `_extract_contents` with `_build_content_sections`,
which returns every content block (title + text). Blocks are de-duplicated
against the Description (shown verbatim) and Prerequisites (shown in their
own section), and the heading that the scraper duplicates into the body is
stripped. Contents becomes a list of {title, text}; the frontend renders
each block with a sub-heading and suppresses the generic "Inhalte" wrapper
title for unstructured courses.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
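For readers without the diff at hand, the behaviour described in the commit above can be approximated roughly as follows. This is a minimal sketch and not the merged `_build_content_sections`: the function name `build_content_sections_sketch`, its signature, the `Section` type, and the assumption that the duplicated heading appears as the first line of the body are all hypothetical.

```python
import re
from typing import TypedDict


class Section(TypedDict):
    title: str
    text: str


def _norm(text: str) -> str:
    """Collapse whitespace and case so comparisons ignore layout differences."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def build_content_sections_sketch(
    sections: list[Section],
    description: str = "",
    prerequisites: str = "",
) -> list[Section]:
    """Return every non-empty content block, title-stripped and de-duplicated."""
    # Texts already shown elsewhere on the page; blocks equal to them are dropped.
    seen = {_norm(t) for t in (description, prerequisites) if t.strip()}
    result: list[Section] = []
    for section in sections:
        title = section["title"].strip()
        text = section["text"].strip()
        # Assumption: the scraper repeats the heading as the first body line.
        first_line, _, rest = text.partition("\n")
        if title and first_line.strip() == title:
            text = rest.strip()
        key = _norm(text)
        if not key or key in seen:
            continue  # empty after stripping, or a duplicate of earlier text
        seen.add(key)
        result.append({"title": title, "text": text})
    return result
```

Comparing on a whitespace- and case-normalised key is a design choice of this sketch; the real implementation may compare differently.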
parse_content_page skipped every content box titled "Inhalte", assuming it
was tab-navigation chrome. For structured courses (the majority) that box is
the actual syllabus, so the real contents (e.g. INF4151's "Aufbauend auf ...")
were never scraped; only the labelled sibling fields (Lernziele, ...) were.

Stop dropping "Inhalte" boxes: they have the same shape as the other labelled
fields and flow through the existing title-strip and dedup unchanged. Only the
genuine sibling tab panes ("Semesterplanung", "Weitere Funktionen") are still
skipped, and empty or chrome-only "Inhalte" boxes still collapse to the fallback.

Add a regression test driven by INF4151's real contents-tab HTML (saved as a
fixture): it fails before the change (no "Inhalte" section) and passes after,
while the labelled fields are still captured.

Note: production D1 only reflects this after a re-scrape + re-import.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
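The change in skip behaviour described in the commit above can be illustrated with a small filter over already-extracted (title, body) pairs. This is a sketch under stated assumptions: `keep_content_boxes`, `SKIPPED_TITLES`, and the sample bodies are hypothetical, the real `parse_content_page` works on HTML, and the fallback for courses with no usable box is not modelled here.

```python
# Sibling tab panes that are navigation chrome, not course content
# (titles taken from the commit message above).
SKIPPED_TITLES = frozenset({"Semesterplanung", "Weitere Funktionen"})


def keep_content_boxes(boxes, skipped_titles=SKIPPED_TITLES):
    """Filter (title, body) pairs, dropping skipped titles and empty bodies."""
    kept = []
    for title, body in boxes:
        title, body = title.strip(), body.strip()
        if title in skipped_titles:
            continue
        # A body that is empty or only repeats its own title carries no content.
        if not body or body == title:
            continue
        kept.append((title, body))
    return kept


# Placeholder data; the "Inhalte" body is elided exactly as in the commit message.
boxes = [
    ("Inhalte", "Aufbauend auf ..."),
    ("Lernziele", "placeholder learning goals"),
    ("Semesterplanung", "placeholder tab pane"),
]

# Old behaviour: "Inhalte" was treated as chrome and dropped with the tab panes.
before = keep_content_boxes(boxes, SKIPPED_TITLES | {"Inhalte"})
# New behaviour: only the genuine sibling tab panes are skipped.
after = keep_content_boxes(boxes)
```

Here `before` contains only the Lernziele box, while `after` also keeps the "Inhalte" box, which mirrors the fails-before, passes-after shape of the regression test.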
Capture the non-obvious facts a future agent needs: the catalog is a
snapshot, so scraper changes require a re-scrape + re-import to affect live
data; the ALMA catalog is public and parse_content_page is the pure testable
seam (with fixtures under data_collection/alma/tests/); and the local-dev API
base URL trap plus the wrangler --remote preview-token refresh failure mode.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Deploying studyplaner with Cloudflare Pages

| Latest commit: | bbefd40 |
| Status: | ✅ Deploy successful! |
| Preview URL: | https://d0c8c9a4.studyplaner.pages.dev |
| Branch Preview URL: | https://feat-catalog-content-section.studyplaner.pages.dev |