
Sidecar-enrich OC material/object-type categories onto the published wide parquet (fixes #260) #272

@rdhyee

Problem

The Explorer serves material/object-type categories from the frozen iSamples Zenodo export, which carries stale or incorrect concept values for a class of OpenContext samples. This surfaced in two reports:

No index/selection change on our side can fix #260, because the correct concept isn't in the data we publish. The fix has to bring OpenContext's corrected concept values into the published wide parquet.

Proposed fix: a material/object-type sidecar

Reuse the existing enrichment pattern we already run for OC thumbnails (scripts/enrich_wide_with_oc_thumbnails.py):

  1. Source of truth = Eric's OpenContext PQG (narrow/wide), which has the corrected has_material_category / has_sample_object_type edges.
  2. Build a pid → {material_uri, object_type_uri} sidecar for OC PIDs.
  3. Overlay it onto the published wide parquet (and, via build_frontend_derived.py, onto the downstream sample_facets_v2 / facet_summaries) before publishing, with OC rows taking precedence over the frozen export.
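
Steps 2 and 3 could be sketched roughly as follows. This is a minimal, stdlib-only illustration that treats each parquet row as a dict; it is not the existing thumbnail script. The column names `pid`, `has_material_category` and `has_sample_object_type` on the wide parquet, and the helper names `build_sidecar` and `overlay`, are assumptions made for the example. Leaving the frozen value in place when the sidecar has no value is also an assumption, not a decided policy.

```python
from typing import Dict, Iterable, List, Optional

# Hypothetical column names; the real wide parquet schema may differ.
PID = "pid"
MATERIAL = "has_material_category"
OBJECT_TYPE = "has_sample_object_type"

Row = Dict[str, Optional[str]]


def build_sidecar(oc_rows: Iterable[Row]) -> Dict[str, Dict[str, Optional[str]]]:
    """Step 2: map each OC pid to its corrected concept URIs."""
    sidecar = {}
    for row in oc_rows:
        sidecar[row[PID]] = {
            "material_uri": row.get(MATERIAL),
            "object_type_uri": row.get(OBJECT_TYPE),
        }
    return sidecar


def overlay(wide_rows: Iterable[Row], sidecar: Dict[str, Dict[str, Optional[str]]]) -> List[Row]:
    """Step 3: return copies of the rows where sidecar values take precedence."""
    patched_rows = []
    for row in wide_rows:
        patched = dict(row)  # never mutate the frozen export rows in place
        fix = sidecar.get(row[PID])
        if fix is not None:
            # Assumption: a missing sidecar value leaves the frozen value untouched.
            if fix["material_uri"] is not None:
                patched[MATERIAL] = fix["material_uri"]
            if fix["object_type_uri"] is not None:
                patched[OBJECT_TYPE] = fix["object_type_uri"]
        patched_rows.append(patched)
    return patched_rows
```

Rows whose pid is absent from the sidecar (non-OC sources) pass through unchanged, so the overlay only touches OC rows. In the real pipeline the same logic would more likely be a join on the parquet files than a Python loop.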

This fixes the popup and the underlying facet values in one pass.
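
A small sketch of why one pass is enough, assuming the facet summaries are plain per-value counts over the same columns the popup reads (the actual logic in build_frontend_derived.py may differ). The helper `facet_counts` and the column name are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def facet_counts(rows: Iterable[Dict[str, Optional[str]]], column: str) -> Counter:
    """Count rows per concept value in one column, skipping missing values."""
    return Counter(row[column] for row in rows if row.get(column) is not None)


# Hypothetical rows before and after the overlay; the URIs are placeholders.
frozen = [
    {"pid": "oc:1", "has_material_category": "https://example.org/m/stale"},
    {"pid": "oc:2", "has_material_category": "https://example.org/m/stale"},
]
overlaid = [
    {"pid": "oc:1", "has_material_category": "https://example.org/m/corrected"},
    {"pid": "oc:2", "has_material_category": "https://example.org/m/stale"},
]

before = facet_counts(frozen, "has_material_category")
after = facet_counts(overlaid, "has_material_category")
```

Because the counts are derived from the overlaid rows, correcting a row's concept value moves it between facet buckets without any separate facet fix.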

Open questions

Related


— 🤖 rbotyee (RY's bot). Spun up from the #260 triage; RY skimmed. Ping @rdhyee.

Labels: explorer (Interactive Explorer features)
