Skip to content

perf!: avoid listing index files after writes#7129

Open
Xuanwo wants to merge 2 commits into
mainfrom
xuanwo/reduce-index-file-io
Open

perf!: avoid listing index files after writes#7129
Xuanwo wants to merge 2 commits into
mainfrom
xuanwo/reduce-index-file-io

Conversation

@Xuanwo
Copy link
Copy Markdown
Collaborator

@Xuanwo Xuanwo commented Jun 5, 2026

Avoids listing index directories after new index writes by propagating IndexFile metadata from writer and builder APIs into manifest metadata. This covers scalar index writers, V3 vector builders, vector append/remap paths, and distributed vector merge while preserving legacy and extension fallbacks where the API cannot return files yet.

This reduces extra object-store list/stat IO on fresh index creation and maintenance paths, with IO-count regression coverage for scalar and V3 vector index creation.

@github-actions github-actions Bot added A-index Vector index, linalg, tokenizer performance labels Jun 5, 2026
@Xuanwo Xuanwo marked this pull request as ready for review June 5, 2026 17:31
Copy link
Copy Markdown

@claude claude Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Claude Code Review

This repository is configured for manual code reviews. Comment @claude review to trigger a review and subscribe this PR to future pushes, or @claude review once for a one-time review.

Tip: disable this comment in your organization's Code Review settings.

@Xuanwo Xuanwo changed the title perf: avoid listing index files after writes perf!: avoid listing index files after writes Jun 5, 2026
@Xuanwo
Copy link
Copy Markdown
Collaborator Author

Xuanwo commented Jun 5, 2026

@claude review once

Comment on lines 949 to +968
ivf_mut.write(&mut writer).await?;
writer.finish().await?;
let index_size = writer.finish().await? as u64;

// Write the aux file
aux_ivf.write(&mut aux_writer).await?;
aux_writer.finish().await?;
let aux_size = aux_writer.finish().await? as u64;

Ok(existing_indices.len() - start_pos)
Ok((
existing_indices.len() - start_pos,
vec![
IndexFile {
path: INDEX_FILE_NAME.to_string(),
size_bytes: index_size,
},
IndexFile {
path: INDEX_AUXILIARY_FILE_NAME.to_string(),
size_bytes: aux_size,
},
],
))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔴 In optimize_ivf_hnsw_indices (rust/lance/src/index/vector/ivf.rs:949-968), writer.finish().await? and aux_writer.finish().await? are dispatched against the concrete PreviousFileWriter constructed at lines 861-865/889-893, whose inherent finish() (rust/lance-file/src/previous/writer/mod.rs:253) returns the number of rows, not bytes. Those values are then stored in IndexFile.size_bytes and fed through LanceIndexStore::with_file_sizes as CachedFileSize hints, so the IVF_HNSW_SQ optimize path persists garbage sizes (off by orders of magnitude) into the manifest and into the HEAD-call-avoidance cache introduced by this PR. Fix by mirroring the sibling functions in this file (e.g. optimize_ivf_pq_indices at line 790, remap_index_file at line 1908, write_ivf_pq_file at line 1973) and capturing writer.tell().await? as u64 before Writer::shutdown.

Extended reasoning...

Bug

In optimize_ivf_hnsw_indices (rust/lance/src/index/vector/ivf.rs:949-968 in this PR), the IVF and auxiliary file sizes are captured as:

ivf_mut.write(&mut writer).await?;
let index_size = writer.finish().await? as u64;

// Write the aux file
aux_ivf.write(&mut aux_writer).await?;
let aux_size = aux_writer.finish().await? as u64;

and then placed into the returned IndexFile entries as size_bytes.

Why this is wrong

writer and aux_writer are not Box<dyn Writer> here. At lines 861-865 and 889-893 they are wrapped with PreviousFileWriter::with_object_writer(...), so their concrete type is PreviousFileWriter<ManifestDescribing>. PreviousFileWriter carries two finish methods:

  1. The inherent method at rust/lance-file/src/previous/writer/mod.rs:253-263, which returns Result<usize> containing self.metadata.batch_offsets.last() — i.e. the number of rows written, not bytes:

    pub async fn finish(&mut self) -> Result<usize> {
        self.write_footer().await?;
        Writer::shutdown(self.object_writer.as_mut()).await?;
        let num_rows = self.metadata.batch_offsets.last().cloned().unwrap_or_default();
        Ok(num_rows as usize)
    }
  2. The trait IndexWriter::finish impl (in lance_format.rs), which returns Result<IndexFile>.

Rust method resolution prefers the inherent method, so writer.finish().await? invokes the row-count variant. The as u64 cast at lines 950/954 also confirms this — if dispatch had hit the trait method (returning IndexFile), the cast would not compile.

The net effect: for the IVF_HNSW v1 optimize path, index_size is the count of HNSW nodes (the .idx writer) and aux_size is the count of stored vectors (the .aux writer). These row counts then become the persisted IndexFile.size_bytes.

Why this PR makes it dangerous

This PR’s explicit goal is to skip HEAD calls when opening index files by propagating writer-provided sizes into LanceIndexStore::with_file_sizes, which constructs CachedFileSize::new(size) in open_index_file. A wrong-by-orders-of-magnitude cached size feeds straight into the v2 FileReader’s end-of-file offset arithmetic (footer location, metadata length, etc.). When the reader trusts a "size_bytes" that is actually a few hundred / few thousand instead of the real megabyte+ file size, footer offset computation reads from the wrong place — surfacing as corrupt-file errors or, worse, silently wrong reads.

Even on the read paths that still issue their own HEAD (i.e. the path that bypasses with_file_sizes), the garbage values still flow into IndexMetadata::total_size_bytes() and describe_indices(), breaking the very accounting this PR adds.

How to fix

Mirror the pattern already used everywhere else in this file:

  • optimize_ivf_pq_indices (line 790): let size_bytes = writer.tell().await? as u64; immediately before Writer::shutdown(writer.as_mut()).await?;
  • remap_index_file (line 1908): same pattern.
  • write_ivf_pq_file (line 1973): same pattern.

For optimize_ivf_hnsw_indices, after ivf_mut.write(&mut writer).await? (and analogously for aux_writer), do something like:

ivf_mut.write(&mut writer).await?;
let index_size = writer.tell().await? as u64;
PreviousFileWriter::finish(&mut writer).await?;

aux_ivf.write(&mut aux_writer).await?;
let aux_size = aux_writer.tell().await? as u64;
PreviousFileWriter::finish(&mut aux_writer).await?;

or equivalently call the trait method IndexWriter::finish (which already does tell+shutdown internally for this writer type — see rust/lance-index/src/scalar/lance_format.rs impl IndexWriter for PreviousFileWriter) and use the returned IndexFile.size_bytes.

Step-by-step proof

  1. Build an IVF_HNSW_SQ index containing, say, ~10,000 vectors of dimension 128, so the underlying .idx and .aux files end up at hundreds of KB or more.
  2. Trigger the segment-commit optimize path that calls optimize_vector_indicesoptimize_ivf_hnsw_indices.
  3. After optimize, inspect the committed IndexMetadata.files for the new UUID. The reported size_bytes for index.idx equals the number of HNSW nodes (≈10,000) and for auxiliary.idx equals the number of stored vectors (≈10,000) — not the actual byte sizes (which are orders of magnitude larger).
  4. Because the PR now feeds these as CachedFileSize hints via LanceIndexStore::with_file_sizes, a subsequent read that uses this path will compute footer offsets as if the file were ~10,000 bytes long, fetching bytes from outside the real footer region.
  5. Compare with optimize_ivf_pq_indices (line 790) on the same dataset: there size_bytes reflects the true file size because that code path captures writer.tell().await? before shutdown.

All four independent verifiers reached the same conclusion (Rust dispatch goes to the inherent PreviousFileWriter::finish, the value returned is num_rows, and these values then flow into manifest size_bytes and the new HEAD-avoidance cache). The fix is a small one-liner per writer, matching the pattern already established elsewhere in the same file.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

A-index Vector index, linalg, tokenizer breaking-change performance

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant