
HIVE-29644: HMS hang/deadlock during ACID replication: compaction enqueue incorrectly runs inside replTableWriteIdState transaction #6522

Open
shreenidhiSaigaonkar wants to merge 1 commit into apache:master from shreenidhiSaigaonkar:HIVE-29644


@shreenidhiSaigaonkar
Contributor

What changes were proposed in this pull request?

replTableWriteIdState now only applies the write-id state and returns whether compaction is needed. TransactionHandler.repl_tbl_writeid_state then schedules the per-partition major compactions via ReplAbortedWriteCompactionScheduler, each in its own compact() transaction. This restores the pre-HIVE-27481 behaviour and matches what a manual ALTER TABLE COMPACT does.
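The control flow described above can be sketched roughly as follows. This is an illustration only, not the code in the patch: the class, the inTransaction helper, and the "compaction is needed when there are partitions" rule are hypothetical stand-ins, and the real ReplAbortedWriteCompactionScheduler may look quite different. The point is that the write-id transaction commits before any compaction transaction begins.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

// Hypothetical sketch: none of these names are the actual Hive classes or methods.
class SplitTransactionSketch {
    private int openTransactions = 0;   // transactions currently open
    int maxOpenAtOnce = 0;              // highest nesting depth observed
    final List<String> events = new ArrayList<>();

    // Stand-in for a transactional boundary; it only records begin/commit.
    private <T> T inTransaction(String name, Supplier<T> body) {
        openTransactions++;
        maxOpenAtOnce = Math.max(maxOpenAtOnce, openTransactions);
        events.add("begin " + name);
        try {
            T result = body.get();
            events.add("commit " + name);
            return result;
        } finally {
            openTransactions--;
        }
    }

    // Step 1: apply the write-id state and report whether compaction is needed.
    // Assumption: "needed" is modelled as "there is at least one partition".
    boolean applyWriteIdState(List<String> partitions) {
        return inTransaction("writeIdState", () -> !partitions.isEmpty());
    }

    // Step 2: each compaction request gets its own short transaction.
    void enqueueMajorCompaction(String partition) {
        inTransaction("compact " + partition, () -> null);
    }

    // Caller: schedule compactions only after the first transaction has committed.
    void replTblWriteIdState(List<String> partitions) {
        boolean compactionNeeded = applyWriteIdState(partitions);
        if (compactionNeeded) {
            for (String partition : partitions) {
                enqueueMajorCompaction(partition);
            }
        }
    }

    public static void main(String[] args) {
        SplitTransactionSketch sketch = new SplitTransactionSketch();
        sketch.replTblWriteIdState(List.of("ds=1", "ds=2"));
        sketch.events.forEach(System.out::println);
        System.out.println("max open at once: " + sketch.maxOpenAtOnce);
    }
}
```

Because no transaction is ever opened inside another in this shape, a lock taken while enqueueing one partition is released before the next partition is handled.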

Why are the changes needed?

ReplTableWriteIdStateFunction enqueued major compactions inline (via CompactFunction) inside the long @Transactional(POOL_TX) replTableWriteIdState call. It therefore held the NEXT_COMPACTION_QUEUE_ID row lock across all partitions while re-acquiring the CompactionScheduler mutex for each one. This could cause an AB-BA deadlock with a concurrent compact() caller (the initiator or another replication job) across the POOL_TX and POOL_MUTEX connection pools.
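The lock-ordering problem can be illustrated with a minimal sketch. It assumes the concurrent compact() caller takes the two resources in the opposite order, which is what an AB-BA deadlock implies. The two in-JVM ReentrantLock objects are hypothetical stand-ins for the NEXT_COMPACTION_QUEUE_ID row lock and the CompactionScheduler mutex, which in HMS are held through database connections rather than Java locks. tryLock with a timeout is used so the sketch reports the blocked callers instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of AB-BA lock ordering; not the HMS implementation.
class AbBaDeadlockSketch {

    // Takes `first`, waits until the peer also holds its first lock, then tries `second`.
    static void attempt(ReentrantLock first, ReentrantLock second,
                        CountDownLatch bothHoldFirst, CountDownLatch bothTried,
                        AtomicInteger blocked) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();               // both callers now hold one lock each
            if (second.tryLock(200, TimeUnit.MILLISECONDS)) {
                second.unlock();
            } else {
                blocked.incrementAndGet();       // would wait forever with a plain lock()
            }
            bothTried.countDown();
            bothTried.await();                   // keep `first` until the peer has tried too
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    static int countBlockedCallers() throws InterruptedException {
        ReentrantLock queueIdRowLock = new ReentrantLock();  // stand-in for the row lock
        ReentrantLock schedulerMutex = new ReentrantLock();  // stand-in for the mutex
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger blocked = new AtomicInteger();

        // Replication path: row lock first, then mutex.
        Thread replication = new Thread(() ->
            attempt(queueIdRowLock, schedulerMutex, bothHoldFirst, bothTried, blocked));
        // Concurrent compact() caller: mutex first, then row lock (assumed order).
        Thread compactCaller = new Thread(() ->
            attempt(schedulerMutex, queueIdRowLock, bothHoldFirst, bothTried, blocked));

        replication.start();
        compactCaller.start();
        replication.join();
        compactCaller.join();
        return blocked.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("blocked callers: " + countBlockedCallers());
    }
}
```

With untimed acquisition, neither caller could make progress, which is consistent with the hang named in the PR title.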

Does this PR introduce any user-facing change?

No

How was this patch tested?

…ueue incorrectly runs inside replTableWriteIdState transaction

- ReplTableWriteIdStateFunction enqueued major compactions inline (via CompactFunction)
  inside the long @transactional(POOL_TX) replTableWriteIdState call, holding the
  NEXT_COMPACTION_QUEUE_ID row lock across all partitions while re-acquiring the
  CompactionScheduler mutex; this could AB-BA deadlock with a concurrent compact() caller
  (initiator or another replication job) across the POOL_TX/POOL_MUTEX connection pools.

- replTableWriteIdState now only applies the write-id state and returns whether compaction
  is needed; TransactionHandler.repl_tbl_writeid_state schedules the per-partition major
  compactions afterwards via ReplAbortedWriteCompactionScheduler, each in its own compact()
  transaction (restoring the pre-HIVE-27481 behaviour, same as manual ALTER TABLE COMPACT).
@sonarqubecloud

sonarqubecloud Bot commented Jun 2, 2026
