Skip to content

HIVE-29578: Iceberg: add support for logical views#6449

Open
difin wants to merge 6 commits into
apache:masterfrom
difin:iceberg_native_views
Open

HIVE-29578: Iceberg: add support for logical views#6449
difin wants to merge 6 commits into
apache:masterfrom
difin:iceberg_native_views

Conversation

@difin
Copy link
Copy Markdown
Contributor

@difin difin commented Apr 23, 2026

What changes were proposed in this pull request?

Added support for Iceberg logical views in Hive for both HMS and REST catalogs.

Why are the changes needed?

To support Iceberg logical views. This can be especially useful for REST Catalog clients.

Does this PR introduce any user-facing change?

Yes, new HQL syntax:

  1. A regular Hive view syntax with tblproperties ('view-format'='iceberg'):
create [or replace] view [if not exists] <NAME> tblproperties ('view-format'='iceberg') 
as select <COLUMNS> from <SRC_TABLE> where <CONDITION>;
  1. Regular Hive syntax (without tblproperties ('view-format'='iceberg')) and Hive config hive.default.storage.handler.class pointing to a storage handler that supports external logical views:
set hive.default.storage.handler.class=org.apache.iceberg.mr.hive.HiveIcebergStorageHandler;

create [or replace] view [if not exists] <NAME> 
as select <COLUMNS> from <SRC_TABLE> where <CONDITION>;

How was this patch tested?

Created new and updated exiting unit and integration tests with Iceberg logical views test cases.

@difin difin force-pushed the iceberg_native_views branch from 4fdad42 to 252c608 Compare April 24, 2026 23:06
@difin difin force-pushed the iceberg_native_views branch from 252c608 to e10eba5 Compare April 24, 2026 23:31
@difin difin marked this pull request as ready for review April 24, 2026 23:31
@difin difin changed the title HIVE-29578: Iceberg: support for Iceberg native views HIVE-29578: Iceberg: support native views Apr 24, 2026
@difin difin changed the title HIVE-29578: Iceberg: support native views HIVE-29578: Iceberg: add support for native views Apr 24, 2026
@difin difin force-pushed the iceberg_native_views branch from e10eba5 to 96fa476 Compare April 25, 2026 20:46
@difin difin requested review from deniskuzZ and kasakrisz April 25, 2026 20:46
@difin difin force-pushed the iceberg_native_views branch from 96fa476 to 114412a Compare April 26, 2026 15:12
Comment thread iceberg/iceberg-handler/src/test/queries/positive/iceberg_native_view.q Outdated
Comment thread ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/CreateViewAnalyzer.java Outdated
result.setLastAccessTime(nowSec);
result.setRetention(Integer.MAX_VALUE);

boolean hiveEngineEnabled = false;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is hiveEngineEnabled and why is it false?

Copy link
Copy Markdown
Contributor Author

@difin difin May 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hiveEngineEnabled switches how HiveOperationsBase.storageDescriptor fills the Storage Desacriptor: with HiveIcebergInputFormat / HiveIcebergOutputFormat / HiveIcebergSerDe when true, or the usual placeholder FileInputFormat / FileOutputFormat / LazySimpleSerDe when false.

Why it’s false in toHiveView:

This path materializes an HMS VIRTUAL_VIEW for REST catalog that expose Iceberg view metadata through the HMS API. That row isn’t meant to drive a Hive table scan the way a real Iceberg table commit does; execution still comes from the view definition / catalog, not from wiring Iceberg MR formats on the stub. HiveViewOperations does the same thing (hiveEngineEnabled = false).

So we keep a minimal SD consistent with normal virtual views and avoid implying this HMS object is an Iceberg-backed table for the Hive engine. For tables, HiveTableOperations still turns engine integration on/off via metadata + ConfigProperties.ENGINE_HIVE_ENABLED where that actually matters.

Comment thread ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/CreateViewAnalyzer.java Outdated
Comment thread ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/CreateViewDesc.java Outdated
Comment thread ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/CreateViewDesc.java Outdated
Comment thread ql/src/java/org/apache/hadoop/hive/ql/ddl/view/create/CreateViewDesc.java Outdated
@Deprecated // kept for backwards reference
REPLACE_VIEW_WITH_MATERIALIZED(10400, "Attempt to replace view {0} with materialized view", true),
REPLACE_MATERIALIZED_WITH_VIEW(10401, "Attempt to replace materialized view {0} with view", true),
VIEW_STORAGE_HANDLER_UNSUPPORTED(10448, "Storage handler {0} doesn't support external logical views", true),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Logical view is maybe a too verbose. I checked the Hive documentation and error messages and I only saw this naming: a view is a view or materialized view.
Technically it is true, it is a logical view. In the naming convention that Hive uses, it is just a view.

Copy link
Copy Markdown
Contributor Author

@difin difin Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To me omitting the word 'logical' sounds ambiguous.
Also, we are adding support for more view types and we will have regular hive view, hive matrialized view, Iceberg (logical) view, Iceberg materialized view (your PR), I just thought it will help understanding what view type we are referring to having 'logical' in the name.

@kasakrisz what do you think?

.toList();
.forEach(names::add);

if (restCatalog instanceof ViewCatalog viewCatalog) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it a really interesting question:

In Hive, we consider views as tables. In Iceberg, view is just a view, not a table.
I'm not sure about the purpose of the rest client:
if it serves as an interface for Hive only, adding the views to the tables is fine.
If the goal is to provide an interface to Iceberg, I would add a new endpoint that lists the view.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HiveRESTCatalogClient is an implementation of IMetaStoreClient that allows Hive work with REST Catalog servers, so it is for Hive.

public boolean tableExists(String catName, String dbName, String tableName) {
validateCurrentCatalog(catName);
return restCatalog.tableExists(TableIdentifier.of(dbName, tableName));
TableIdentifier id = TableIdentifier.of(dbName, tableName);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a simple instanceof operator can decide if we want to get a table or a view information.
The question is: do we need the extra call in case it is a view, not a table? And how the caller will know if (or even, do the want to know) if the existing object is a view or a table?

Copy link
Copy Markdown
Contributor Author

@difin difin Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like a simple instanceof operator can decide if we want to get a table or a view information.
The question is: do we need the extra call in case it is a view, not a table?

I don't think a simple instanceof operator can decide if we want to get a table or a view information.
RESTCatalog implements both Catalog for tables and ViewCatalog

public class RESTCatalog implements Catalog, ViewCatalog, SupportsNamespaces, Configurable<Object>, Closeable {

And how the caller will know if (or even, do the want to know) if the existing object is a view or a table?

Implementation here in REST Catalog client follows same logic as regular Hive HMS clients which do not care if the existing object is a view or a table:

For example, see SessionHiveMetaStoreClient.tableExists

org.apache.iceberg.Table icebergTable = restCatalog.loadTable(id);
return MetastoreUtil.toHiveTable(icebergTable, conf);
} catch (NoSuchTableException tableMissing) {
if (restCatalog instanceof ViewCatalog viewCatalog) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same as my previous comment: if we can decide where to call with an instanceof, no reason to call loadTable.

Copy link
Copy Markdown
Contributor Author

@difin difin Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as above, we can't decide where to call using just an instanceof.

params.get(BaseMetastoreTableOperations.TABLE_TYPE_PROP));
}

@Override
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is about the other overloads of alter_table and alter_table_with_environmentContext?

Copy link
Copy Markdown
Contributor

@InvisibleProgrammer InvisibleProgrammer Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forget about alter_table_with_environmentContext. As I see it is not implemented.
The interesting question is still the alter_table overloads. Now we provide an interface that can actually altering a view OR doing nothing OR throw an UnsupportedOperationException depending on the formal and actual parameters.

Copy link
Copy Markdown
Contributor Author

@difin difin Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The IMetaStoreClient client has approximately 371 interface methods.
HiveRestCatalogClient implements only 16 of them which are needed and in use for REST Catalog use cases.

I didn't encounter cases where other alter_table implementations where needed in REST client tests.


private void createOrReplaceLogicalView(Table table, String dbName, String tableName) {

List<FieldSchema> cols = Lists.newArrayList(table.getSd().getCols());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to create a virtual view that is partitioned?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No - Iceberg API doesn't support it.

    ViewBuilder builder =
        viewCatalog
            .buildView(identifier)
            .withSchema(HiveSchemaUtil.convert(fieldSchemas, Collections.emptyMap(), true))
            .withDefaultNamespace(Namespace.of(identifier.namespace().level(0)))
            .withQuery("hive", viewSql);

public static final String ICEBERG_VIEW_HMS_TABLE_TYPE_VALUE =
HiveOperationsBase.ICEBERG_VIEW_TYPE_VALUE.toUpperCase(java.util.Locale.ENGLISH);

private IcebergNativeLogicalViewSupport() {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing out.
Fixed.

if (hasIcebergNativeViewTableType(table) && restCatalog instanceof ViewCatalog) {
createOrReplaceLogicalView(table, table.getDbName(), table.getTableName());
} else {
List<FieldSchema> cols = Lists.newArrayList(table.getSd().getCols());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracting the createTable logic would improve the readability.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

public void createTable(CreateTableRequest request) throws TException {
Table table = request.getTable();
if (hasIcebergNativeViewTableType(table) && restCatalog instanceof ViewCatalog) {
createOrReplaceLogicalView(table, table.getDbName(), table.getTableName());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it leads to inconsistent behavior: if we call the method on a table, it creates it if it is a new one and throws an AlreadyExistsException if the table already exists and also if a view with the same name is already exists.
But if we call this method on a view, it will create it even if a table exists with the same name and replaces the existing one if the view is already exists.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method cannot be called on a view if it already exists - see #6449 (comment)


String comment = tblProps.get("comment");
IcebergNativeLogicalViewSupport.createOrReplaceNativeView(
conf, dbName, tableName, cols, table.getViewExpandedText(), tblProps, comment);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The view text is pretty interesting difference between Hive and Iceberg:
Hive stores it in two versions: original and expanded text.

Iceberg has only one field, sql.

I think if we want to be compatible with both Hive and Iceberg, I would rather choose the original text here and add the expanded text as a tableproperty.

But also, there is a catch: the sql is stored in ViewVersion so it can differ for each version. But in Hive, it is a single property of the table metadata.

Honestly, I'm not sure what is the correct solution to do that but if we store only one of them, I fear we can loose functionality on Hive side.

Copy link
Copy Markdown
Contributor Author

@difin difin Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe passing table.getViewExpandedText() is more correct because SemanticAnalyzer uses expanded text when resolving views.

On the write view path, view text is stored in HMS mainly for passing it to the IcebergNativeLogicalViewSupport.createOrReplaceNativeView, but on the read path view text is always retrieved from Iceberg API both on HMS and REST paths in BaseHiveIcebergMetaHook:

  @Override
  public void postGetTable(org.apache.hadoop.hive.metastore.api.Table hmsTable) {
    if (hmsTable != null) {
      try {
        if (isNativeIcebergLogicalView(hmsTable)) {
          IcebergNativeLogicalViewSupport.enrichHmsTableFromIcebergView(hmsTable, conf);
          return;
        }

HiveOperationsBase.ICEBERG_VIEW_TYPE_VALUE.toUpperCase(Locale.ENGLISH),
metadata.schema(),
maxHiveTablePropertySize);
parameters.put(hive_metastoreConstants.META_TABLE_STORAGE, HIVE_ICEBERG_STORAGE_HANDLER);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the view is the previous version of view and we store it in Hive?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure how this question is related to the pointed location in code, but same as previous comment - when querying Iceberg logical view it's properties are retrieved from Iceberg and not from HMS.


try {
if (catalog instanceof Closeable closeable) {
try (Closeable ignored = closeable) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure I'm missing something: what is the reason of checking if the catalog is Closeable and ignoring it?

Copy link
Copy Markdown
Contributor Author

@difin difin Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

buildIcebergCatalog gives us a one-off catalog for this call. The Catalog interface isn’t Closeable, but some implementations are (RestCatalog is Closable) and they hold clients/IO that need to be closed when we’re done. HiveCatalog isn’t, so we only wrap when instanceof Closeable.

We are not ignoring it. The try (Closeable ignored = closeable) bit is just try-with-resources, we need a variable so close() runs at the end of the block. We don’t reference it in the body because we already use catalog for loadView. Naming it as ignored is only to signal “this exists for cleanup, not for logic.” Without it you can leak connections on every postGetTable enrich against REST.

tblProperties == null ? Maps.newHashMap() : Maps.newHashMap(tblProperties);

for (Map.Entry<String, String> e : tblProps.entrySet()) {
if (e.getKey() != null && e.getValue() != null) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible having table properties without having a key or value?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is not needed, removed.

result.setTableType(TableType.EXTERNAL_TABLE.toString());
result.setPartitionKeys(getPartitionKeys(table, table.spec().specId()));

// TODO: Revert after HIVE-29633 is fixed
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can this comment be a leftover? As I see there a toHiveView method created below. And HIVE-29633 is about views.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, this is intentional and agreed with Denys and Krisztian.

PropertyUtil.propertyAsString(
metadata.properties(), HiveCatalog.HMS_TABLE_OWNER, System.getProperty("user.name"));
result.setOwner(owner);
result.setCreateTime(nowSec);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In Iceberg view, timestamp-ms in the view version stores the create time.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to get it from the Iceberg metadata instead.

null);

// In-memory overlay for compile/describe: authoritative SQL comes from Iceberg metadata.
hmsTable.setViewOriginalText(sqlText);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can it cause any trouble on Hive side of the original text is actually the expanded text?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've implemented a lot of test cases both on HMS and REST paths and no problems are observed.

It is taken from Iceberg metadata:

  public static void applyIcebergViewToHmsTable(Table hmsTable, View view, Configuration conf) {
    ViewMetadata metadata = ((BaseView) view).operations().current();
    String sqlText = viewSqlText(view, metadata);


private void preCreateNativeIcebergLogicalView(CreateTableRequest request) {

org.apache.hadoop.hive.metastore.api.Table hmsTable = request.getTable();
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For tables, we check if the table is already existing. Why don't we need a similar check in case of views?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same as #6449 (comment)

}

@Override
public void rollbackCreateTable(org.apache.hadoop.hive.metastore.api.Table hmsTable) {
Copy link
Copy Markdown
Contributor

@InvisibleProgrammer InvisibleProgrammer Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason of this change and the removal of the other empty overrides?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code cleanup of empty overrides which were not needed..

throw new IllegalStateException("Hive commit lock should already be set");
}
commitLock.unlock();
if (BaseHiveIcebergMetaHook.isNativeIcebergLogicalView(hmsTable)) {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would add this part before commitLock.unlock. Yes, I know the current implementation uses NoLock for native views but it can lead to hard to detect issues in case we make a decision later to use actual locks for Iceberg views later.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done


@Override
public boolean supportsExternalLogicalViewCatalog() {
return true;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it still true in other metastore clients than HiveRESTCatalogClient?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is also true for regular Hive+HMS configuration.
See iceberg_native_logical_view.q - there is no REST client in this test.

desc formatted v_hive;
drop view v_hive;

-----------------------------------------------------------------------------------------
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I love those comments. I wish all qtests would such information.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually the full comment on these lines is

-----------------------------------------------------------------------------------------
-- Replace Iceberg logical view with a Hive-native logical view
-----------------------------------------------------------------------------------------

`dept_id` bigint,
`team_id` bigint,
`company_id` bigint)
PARTITIONED BY (
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason of this removal?

Copy link
Copy Markdown
Contributor Author

@difin difin Jun 3, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is related to your previous comment: #6449 (comment)

Previously, MetastoreUtil.toHiveTable(..) was setting partition keys and this was causing failures in Iceberg views on partitioned base Iceberg tables. In this PR it was decided with Denys and Krisztian to set partition keys as empty in MetastoreUtil.toHiveTable(..) until HIVE-29633 is fixed.

    // TODO: Revert after HIVE-29633 is fixed
    // result.setPartitionKeys(getPartitionKeys(table, table.spec().specId()));
    result.setPartitionKeys(Lists.newArrayList());

# Detailed Table Information
Database: ice_rest
#### A masked pattern was here ####
Retention: 2147483647
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it a timestamp or some remaining time? I wonder if it would worth masking out those values or not to avoid test flakyness.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2147483647 is Integer.MAX_VALUE - the HMS Table.retention field is set in MetastoreUtil.applyIcebergViewToHmsTable.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed setting retention to match HMS behavior which sets 0 by default.


CreateViewDesc desc = new CreateViewDesc(dbDotView, table.getCols(), null, table.getParameters(),
table.getPartColNames(), false, false, viewOriginalText, viewExpandedText, table.getPartCols());
CreateViewDesc desc = new CreateViewDesc(dbDotView, table.getCols(), null, table.getParameters(),
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that HMS replication?
If yes, I worry about this scenario:

  • We don't have those information in HMS as we use REST client.
  • We start replication.
  • Replication adds those information on the target.
  • Target will contain different data than source.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use regular HMS client with Iceberg logical views too.

HMS still has basic info about Iceberg views, like view identity (db, view name), but all the other info is coming from Iceberg metadata at query time, so I don't think there should be an issue here.

@InvisibleProgrammer
Copy link
Copy Markdown
Contributor

On conceptional level, I have two major questions:
Should the Rest client handle views as tables? Or views should be views? At the current implementation it mixes them and basic CRUD operations doesn't separate them. For example, listing tables can receive both tables and views. But we don't know if a given entry is a table or a view.

The other important question is the boundary between what we store on HMS and what we store on Iceberg side?
What happens if we create a native Iceberg view? Do we store the metadata only on Iceberg side? Or duplicate it between Hive and Iceberg? What if we duplicate them? How we can make sure the duplicated data is in sync? And what is the plan if it is not in sync? For example, I'm creating a native Iceberg view in Beeline. Connect to the Rest client. Modify the view. Go back to Hive and query this view. What will happen?

@difin
Copy link
Copy Markdown
Contributor Author

difin commented Jun 3, 2026

@difin When compiling a query referencing an Iceberg logical views the view definition query should come via Iceberg api from the Iceberg metadata files not from the HMS backend db.

In HiveRESTCatalogClient.java I found the implementation of this logic

if (restCatalog instanceof ViewCatalog viewCatalog) {
        try {
          View icebergView = viewCatalog.loadView(id);
          return MetastoreUtil.toHiveView(icebergView, conf);
        } catch (NoSuchViewException viewMissing) {
          throw new NoSuchObjectException();
        }

Maybe I'm missing something but haven't found the implementation of this in case of HMS catalog via thrift. Could you please point it out?

Done - previously Hive was getting the view query definition that was stored in HMS, now I made it retrieve view definition from Iceberg files (via Iceberg API) in BaseHiveIcebergMetaHook.postGetTable:

  public void postGetTable(org.apache.hadoop.hive.metastore.api.Table hmsTable) {
    if (hmsTable != null) {
      try {
        if (isNativeIcebergLogicalView(hmsTable)) {
          if (!usesRestMetastoreClient(conf)) {
            IcebergNativeLogicalViewSupport.enrichHmsTableFromIcebergView(hmsTable, conf);
          }
          return;
        }

@difin
Copy link
Copy Markdown
Contributor Author

difin commented Jun 3, 2026

On conceptional level, I have two major questions: Should the Rest client handle views as tables? Or views should be views? At the current implementation it mixes them and basic CRUD operations doesn't separate them. For example, listing tables can receive both tables and views. But we don't know if a given entry is a table or a view.

Same as #6449 (comment), answered there.

The other important question is the boundary between what we store on HMS and what we store on Iceberg side? What happens if we create a native Iceberg view? Do we store the metadata only on Iceberg side? Or duplicate it between Hive and Iceberg? What if we duplicate them? How we can make sure the duplicated data is in sync? And what is the plan if it is not in sync? For example, I'm creating a native Iceberg view in Beeline. Connect to the Rest client. Modify the view. Go back to Hive and query this view. What will happen?

HMS stores only a view identity; at the read time, we construct HMS in-memory view object that gets enriched from the Iceberg metadata (including View SQL text) on both HMS and REST client paths in BaseHiveIcebergMetaHook:

  @Override
  public void postGetTable(org.apache.hadoop.hive.metastore.api.Table hmsTable) {
    if (hmsTable != null) {
      try {
        if (isNativeIcebergLogicalView(hmsTable)) {
          IcebergLogicalViewSupport.enrichHmsTableFromIcebergView(hmsTable, conf);
          return;
        }

@sonarqubecloud
Copy link
Copy Markdown

sonarqubecloud Bot commented Jun 4, 2026

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants