Antalya 26.3: Fix empty partition_key and sorting_key in system.table… #1874
Open
il9ue wants to merge 1 commit into
…s for Iceberg tables without data snapshots

Changelog category: Bug Fix

Changelog entry: Fixed `system.tables.partition_key` and `system.tables.sorting_key` returning empty strings for Iceberg tables that have no data snapshot, including all empty tables and (more frequently) tables accessed via the Glue catalog.

The snapshot-existence gate in `IcebergMetadata::partitionKey()` / `sortingKey()` was semantically wrong: partition spec and sort order are table-level properties recorded at the top level of the Iceberg metadata file (`default-spec-id`, `default-sort-order-id`) and exist independently of whether any data snapshot has been written.

Also adds a defensive guard in `getSortingKeyDescriptionFromMetadata` against Iceberg V1 metadata files missing `sort-orders`, which becomes reachable for empty tables after this fix.

Closes #1235.
Closes #1235. Supersedes #1819: this PR is reopened from a branch in the Altinity/ClickHouse repo so that CI exposes direct `.deb` package URLs for clickhouse-regression (PRs from forks only produce GitHub Actions artifact zips).
Summary
`SELECT partition_key, sorting_key FROM system.tables` returned empty strings for Iceberg tables that had no data snapshot. The bug is most reliably observed via the Glue catalog (its `metadata_location` more frequently points at a snapshot-free metadata file), but it also reproduces for any empty Iceberg table regardless of catalog (REST, Glue, or direct `IcebergS3`).

Root cause
`IcebergMetadata::partitionKey()` and `IcebergMetadata::sortingKey()` (#959, refined in #1026, ported to 25.8 in #1095) gated their work on the existence of a data snapshot, returning early when `actual_data_snapshot` was not set.

This is semantically wrong. Partition spec and sort order are table-level properties recorded at the top level of the Iceberg metadata file (`default-spec-id`, `default-sort-order-id`, `partition-specs`, `sort-orders`) and exist independently of whether any data snapshot has been written. `getState()` populates `actual_table_state_snapshot` (`schema_id`, `metadata_file_path`, `metadata_version`) regardless of snapshot existence; only `snapshot_id` is `std::nullopt`, and that field is never read by `getPartitionKey()` / `getSortingKey()`. The gate therefore suppressed valid data; the fix removes it.

Change list
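The effect of the removed gate can be modelled in a few lines. This is a minimal sketch under simplifying assumptions, not the ClickHouse implementation: `DataSnapshot`, `TableLevelMetadata`, `partitionKeyBeforeFix` and `partitionKeyAfterFix` are hypothetical names, the key strings are placeholders rather than the exact text `getPartitionKey()` produces, and only the name `actual_data_snapshot` is taken from the change described here. `sortingKey()` follows the same pattern.

```cpp
#include <cassert>
#include <optional>
#include <string>

// Hypothetical stand-in for a data snapshot; the real code tracks far more state.
struct DataSnapshot
{
    long long snapshot_id = 0;
};

// Hypothetical stand-in for the table-level properties read from the top level
// of the metadata file (partition-specs/default-spec-id, sort-orders/default-sort-order-id).
struct TableLevelMetadata
{
    std::string partition_key;
    std::string sorting_key;
};

// Before the fix: an early return on a missing data snapshot hides
// a partition key that is already known from the metadata file.
std::string partitionKeyBeforeFix(const std::optional<DataSnapshot> & actual_data_snapshot,
                                  const TableLevelMetadata & metadata)
{
    if (!actual_data_snapshot)
        return {};
    return metadata.partition_key;
}

// After the fix: the key depends only on table-level metadata.
std::string partitionKeyAfterFix(const TableLevelMetadata & metadata)
{
    return metadata.partition_key;
}
```

With no snapshot, the pre-fix function returns an empty string even though the partition key is known; the post-fix function returns it unconditionally, which mirrors what the new regression test checks through `system.tables`.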
- `src/Storages/ObjectStorage/DataLakes/Iceberg/IcebergMetadata.cpp`: removed the `if (!actual_data_snapshot)` early return in `partitionKey()` and `sortingKey()`.
- `src/Storages/ObjectStorage/DataLakes/Iceberg/Utils.cpp`: `getSortingKeyDescriptionFromMetadata()` now guards on `has(sort-orders)` / `has(default-sort-order-id)`. This is a pre-existing null dereference that was previously unreachable behind the snapshot gate; after removing the gate, empty Iceberg V1 tables without `sort-orders` (optional in V1, required from V2) would hit it. The guard mirrors the shape of `getSortingKeyDisplayStringFromMetadata`.

No header changes. No `StorageSystemTables.cpp` changes: the #1210 (Glue segfault) null/exception guards remain untouched.

Behavior preservation
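The `sort-orders` guard in `getSortingKeyDescriptionFromMetadata()` can likewise be sketched with a hypothetical stand-in for the metadata JSON object. `MetadataObject` and `sortingKeyDescription` are illustrative names, optional JSON keys are modelled as `std::optional`, and looking up the default order by id (with an empty result when it is absent) is an assumption of this sketch rather than a description of `Utils.cpp`.

```cpp
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical model of the top-level metadata object: an absent JSON key
// is represented by an empty std::optional.
struct MetadataObject
{
    std::optional<int> default_sort_order_id;                             // "default-sort-order-id"
    std::optional<std::map<int, std::vector<std::string>>> sort_orders;   // "sort-orders": order id -> columns
};

// Returns an empty description instead of dereferencing absent fields.
// V1 metadata may omit "sort-orders"; V2 requires it.
std::string sortingKeyDescription(const MetadataObject & metadata)
{
    if (!metadata.sort_orders || !metadata.default_sort_order_id)
        return {};

    // Assumption of this sketch: an unknown default id also yields an empty description.
    auto it = metadata.sort_orders->find(*metadata.default_sort_order_id);
    if (it == metadata.sort_orders->end())
        return {};

    // Join the sort columns with ", ".
    std::string result;
    for (const auto & column : it->second)
    {
        if (!result.empty())
            result += ", ";
        result += column;
    }
    return result;
}
```

In this model, metadata carrying both keys produces the same description with or without the guard; only V1 files lacking `sort-orders` take the new early-return path.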
- … `NULLS FIRST`/`NULLS LAST`).
- … `getPartitionKeyStringFromMetadata` already guards on missing `partition-specs`).
- … `StorageSystemTables.cpp` remain in place.

Out of scope
Glue's `metadata_location` pointer can lag schema-evolution events, which could surface a stale spec. This is orthogonal to the snapshot gate and is not addressed here.

Test plan
A new regression test reproduces the root cause without a catalog mock: it creates an Iceberg table with a non-trivial partition spec and sort order, then asserts that `system.tables.partition_key` / `sorting_key` are non-empty before any insert. The existing `test_system_tables_partition_sorting_keys` continues to pass with byte-identical output.

Changelog category (leave one):
- Bug Fix
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Fixed `system.tables.partition_key` and `system.tables.sorting_key` returning empty strings for Iceberg tables that have no data snapshot, including all empty tables and (more frequently) tables accessed via the Glue catalog. Also added a defensive guard against Iceberg V1 metadata files missing `sort-orders`.

Documentation entry for user-facing changes
Not required: this is a bug fix to existing `system.tables` columns, with no new user-facing surface.