Add SUMMARIZE incremental backup mode using PG 17+ WAL summarize feature #672

Open
heysky wants to merge 1 commit into postgrespro:master from heysky:master

@heysky heysky commented Dec 29, 2025

This commit adds a new incremental backup mode called SUMMARIZE. It uses the native WAL summarization feature of PostgreSQL 17+ (the summarize_wal GUC) to track modified data blocks without requiring external extensions such as ptrack.

Feature Overview

The SUMMARIZE backup mode:

  • Requires PostgreSQL 17+ with summarize_wal=on enabled
  • Uses pg_wal_summary_contents() and pg_available_wal_summaries() functions
  • Builds pagemap bitmaps from WAL summary information for incremental backups
  • Validates WAL summary availability before backup starts

Files Modified

  • src/pg_probackup.h: Added BACKUP_MODE_DIFF_SUMMARIZE enum and function declarations
  • src/catalog.c: Updated backupModes[] array and mode parsing functions
  • src/validate.c: Added SUMMARIZE mode to validation checks
  • src/backup.c: Added WAL summary LSN validation and wait logic
  • src/data.c: Added SUMMARIZE mode to incremental mode checks
  • src/catchup.c: Added WAL summarize support for catchup command
  • src/pg_probackup.c: Added SUMMARIZE to supported backup modes
  • src/help.c: Updated help text to include SUMMARIZE mode
  • Makefile: Added src/walsummary.o to object files list

New File

  • src/walsummary.c: Core implementation of WAL summary integration
    • pg_is_walsummary_enabled(): Check if PG 17+ has summarize_wal enabled
    • get_walsummary_summarized_lsn(): Get current summarized LSN
    • wait_wal_summarization(): Wait for summarizer to catch up to target LSN (60s timeout)
    • make_pagemap_from_walsummary(): Build pagemap from WAL summary data

Key Implementation Details

Function Integration

The make_pagemap_from_walsummary() function:

  1. Queries pg_available_wal_summaries() to find overlapping summary files
  2. For each summary, calls pg_wal_summary_contents() with the intersection of the summary's LSN range and the backup's LSN range
  3. Builds pagemap bitmaps for changed blocks
  4. Matches WAL summary data to pgFile list using (dbOid, tblspcOid, relOid, forkName)

Usage

Create incremental backup with SUMMARIZE mode:
pg_probackup backup -B /backup/dir -b SUMMARIZE --instance=name -D /data/dir

Prerequisites:

  • PostgreSQL 17 or higher
  • summarize_wal=on in postgresql.conf
  • Previous FULL backup exists

Error Handling

If WAL summarizer is disabled:
ERROR: WAL summarize backup mode requires summarize_wal to be enabled

If summarizer doesn't catch up within 60 seconds:
ERROR: WAL summarizer did not catch up to <LSN> within timeout period. Incremental backup cannot proceed without complete WAL summaries.

Comment thread src/walsummary.c
BlockNumber blknum = *(BlockNumber *) parray_get(map->blocknums, j);

if (blknum < nblocks)
    datapagemap_add(&file->pagemap, blknum);
blknum is a relation-wide block number, so the segment files (.1, .2, ...) are lost here. This can be reproduced on a table bigger than 1GB. The block should be split as seg = blknum / RELSEG_SIZE; off = blknum % RELSEG_SIZE; see the process_block_change function for an example.
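
The split described in this comment can be sketched as follows. This is an illustration only: split_relation_block() is a hypothetical helper, and RELSEG_SIZE is hard-coded to 131072 blocks, which corresponds to the default 8 kB pages and 1 GB segments (in PostgreSQL it is a build-time constant).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t BlockNumber;

/* Assumed default: 1 GB segment / 8 kB page = 131072 blocks per segment. */
#define RELSEG_SIZE 131072

/*
 * Split a relation-wide block number into the segment file number
 * (0 for "16384", 1 for "16384.1", ...) and the block offset inside
 * that segment file.
 */
void
split_relation_block(BlockNumber rel_blknum,
                     uint32_t *segno, BlockNumber *seg_blkno)
{
    assert(segno != NULL && seg_blkno != NULL);

    *segno = rel_blknum / RELSEG_SIZE;
    *seg_blkno = rel_blknum % RELSEG_SIZE;
}
```

With this split, a block past the first gigabyte is recorded in the pagemap of its own segment file instead of being dropped by the blknum < nblocks check on the first segment.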

Comment thread src/walsummary.c
key.forkName = curr_fork_name;
key.blocknums = NULL;

found_entry = (BlockMapEntry **) parray_bsearch(blockmap_list, &key, blockmap_compare);
blockmap_list must be sorted with blockmap_compare before parray_bsearch is called on it.
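
The sort-then-search requirement can be shown with the standard library's qsort() and bsearch(), used here in place of pg_probackup's parray helpers. RelKey, relkey_compare(), sort_relkeys() and find_relkey() are hypothetical, simplified stand-ins for the PR's BlockMapEntry handling:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified key; the PR's BlockMapEntry has more fields. */
typedef struct
{
    uint32_t db_oid;
    uint32_t relfilenode;
} RelKey;

/* Total order on (db_oid, relfilenode); used for both sort and search. */
int
relkey_compare(const void *a, const void *b)
{
    const RelKey *x = a;
    const RelKey *y = b;

    if (x->db_oid != y->db_oid)
        return x->db_oid < y->db_oid ? -1 : 1;
    if (x->relfilenode != y->relfilenode)
        return x->relfilenode < y->relfilenode ? -1 : 1;
    return 0;
}

/* Must run once before any lookup; bsearch() assumes sorted input. */
void
sort_relkeys(RelKey *keys, size_t n)
{
    assert(keys != NULL && n > 0);
    qsort(keys, n, sizeof(RelKey), relkey_compare);
}

/* Binary search with the same comparator that was used for sorting. */
const RelKey *
find_relkey(const RelKey *keys, size_t n, const RelKey *wanted)
{
    assert(keys != NULL && wanted != NULL);
    return bsearch(wanted, keys, n, sizeof(RelKey), relkey_compare);
}
```

Binary search on an unsorted array does not fail loudly; it silently misses entries, which in this context would mean changed blocks left out of the pagemap.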

Comment thread src/walsummary.c
"relblocknumber "
"FROM pg_wal_summary_contents("
"%u, GREATEST('%s'::pg_lsn, '%s'::pg_lsn), "
"LEAST('%s'::pg_lsn, '%s'::pg_lsn)) "
From walsummaryfuncs.c:

/*
 * List the contents of a WAL summary file identified by TLI, start LSN,
 * and end LSN.
 */
Datum
pg_wal_summary_contents(PG_FUNCTION_ARGS)

The function may only be called with the real file boundaries, summary_start_lsn and summary_end_lsn; otherwise it may fail with:

ERROR:  could not open file "pg_wal/summaries/0000000100000000120000F80000000021A70FF0.summary": No such file or directory
2026-06-02 16:27:58.913 CEST [85974] STATEMENT:  
	 select s.start_lsn as file_start, s.end_lsn as file_end,
	        greatest(s.start_lsn,'0/120000F8'::pg_lsn) as clipped_start,
	        least(s.end_lsn,'0/21A70FF0'::pg_lsn) as clipped_end,
	        (select count(*) from pg_wal_summary_contents(1, greatest(s.start_lsn,'0/120000F8'::pg_lsn), least(s.end_lsn,'0/21A70FF0'::pg_lsn))
	           where not is_limit_block and relfilenode=16384) as blocks_of_t
	 from pg_available_wal_summaries() s
	 where s.tli=1 and s.end_lsn > '0/120000F8'::pg_lsn and s.start_lsn < '0/21A70FF0'::pg_lsn
	 order by s.start_lsn;
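
One way to avoid this error is to pass each file's own start and end LSN, unchanged, to pg_wal_summary_contents(). The sketch below builds such a query; format_lsn() and build_summary_query() are hypothetical helpers, and the column list is an assumption about what the caller needs. Reading a whole file may mark a few blocks changed outside the backup's LSN range, which presumably only costs extra copied pages rather than missing data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

typedef uint64_t XLogRecPtr;

/* Format an LSN as PostgreSQL prints it: high and low 32 bits in hex. */
int
format_lsn(char *buf, size_t buflen, XLogRecPtr lsn)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "%X/%X",
                    (unsigned int) (lsn >> 32),
                    (unsigned int) (lsn & 0xFFFFFFFFu));
}

/*
 * Build a query for one summary file using exactly the bounds reported
 * by pg_available_wal_summaries(); no GREATEST/LEAST clipping.
 * Returns false if the query did not fit into buf.
 */
bool
build_summary_query(char *buf, size_t buflen, uint32_t tli,
                    XLogRecPtr file_start, XLogRecPtr file_end)
{
    char start_str[32];
    char end_str[32];
    int  n;

    assert(buf != NULL && buflen > 0);
    format_lsn(start_str, sizeof(start_str), file_start);
    format_lsn(end_str, sizeof(end_str), file_end);

    n = snprintf(buf, buflen,
                 "SELECT reldatabase, reltablespace, relfilenode, "
                 "relforknumber, relblocknumber "
                 "FROM pg_wal_summary_contents(%u, '%s'::pg_lsn, '%s'::pg_lsn) "
                 "WHERE NOT is_limit_block",
                 (unsigned int) tli, start_str, end_str);
    return n > 0 && (size_t) n < buflen;
}
```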

Comment thread src/backup.c
* If the summarizer hasn't caught up within 60 seconds, the backup
* will fail with an error, preventing a backup that would miss data.
*/
if (!wait_wal_summarization(backup_conn, prev_backup->start_lsn))
prev_backup->start_lsn is the start of the parent backup, so there is no real wait here: that LSN is in the past. The wait should target current.start_lsn. If the summarizer lags behind current.start_lsn, the most recent summary files do not exist yet and the blocks changed in that WAL are missed.

Comment thread src/walsummary.c
curr_db_oid = atoi(PQgetvalue(block_res, j, 0));
curr_tblspc_oid = atoi(PQgetvalue(block_res, j, 1));
curr_relfilenode = atoi(PQgetvalue(block_res, j, 2));
curr_fork_number = atoi(PQgetvalue(block_res, j, 3));
For OIDs it is more correct to use atooid.
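
OIDs are unsigned 32-bit values, so atoi() cannot represent those above INT_MAX. A minimal sketch of the unsigned parse, where parse_oid() is a local stand-in that mirrors what PostgreSQL's atooid() macro does (a strtoul() call cast to Oid):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t Oid;

/* Local stand-in for atooid(): parse decimal text as an unsigned OID. */
Oid
parse_oid(const char *text)
{
    assert(text != NULL);
    return (Oid) strtoul(text, NULL, 10);
}
```

For example, parse_oid("4000000000") yields 4000000000, a value that does not fit in a signed int.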

Comment thread src/walsummary.c

PQclear(res);

elog(INFO, "Mapped %d changed blocks to %d files", total_blocks, parray_num(blockmap_list));
A -Wdeclaration-after-statement warning is raised during the build here. It should be %zu.
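
Assuming the %zu remark refers to the parray_num() argument being a size_t, the format string would look like the sketch below. format_mapping_message() is a hypothetical wrapper around snprintf() used only to make the example self-contained, and the int type of total_blocks is likewise an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/*
 * Format the log message with %d for the int counter and %zu for the
 * size_t file count. Returns the length snprintf() reports.
 */
int
format_mapping_message(char *buf, size_t buflen,
                       int total_blocks, size_t file_count)
{
    assert(buf != NULL && buflen > 0);
    return snprintf(buf, buflen, "Mapped %d changed blocks to %zu files",
                    total_blocks, file_count);
}
```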

@demonolock

@heysky Thank you for your contribution. I left comments that may improve it; please take a look. Also, the target branch should be REL_2_5, not master.
