Skip to content

Inline filtered search with adaptive L#1131

Draft
magdalendobson wants to merge 22 commits into
mainfrom
users/magdalen/inline-with-adaptive-l
Draft

Inline filtered search with adaptive L#1131
magdalendobson wants to merge 22 commits into
mainfrom
users/magdalen/inline-with-adaptive-l

Conversation

@magdalendobson
Copy link
Copy Markdown
Contributor

No description provided.

Copy link
Copy Markdown
Contributor

@hildebrandmw hildebrandmw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Magdalen! In addition to my inline comments - can you also add some integration tests exercising the functionality here? These go a surprisingly long way towards protecting the algorithm.

Also, can we bikeshed InlineSearch a little? Maybe FilteredSearch? Or InlineFilteredSearch? Not a huge deal, but InlineSearch seems a little opaque to me.

Comment thread diskann/src/graph/search/inline_filter_search.rs Outdated
Comment thread diskann/src/graph/search/inline_filter_search.rs
Comment thread diskann/src/graph/search/inline_filter_search.rs
/// Label evaluator for determining node matches and early termination.
pub label_evaluator: &'q dyn QueryLabelProvider<InternalId>,
/// Adaptive L for the search.
pub adaptive_l: Option<AdaptiveL>,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts on making this a full Algo enum instead of an Option to give us room to expand in the future?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not a huge proponent of this idea since hypothetical future variants might want to mix and match with adaptive L rather than plug and play. Seems like we should just leave this to future work if needed.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's nothing keeping us from keeping the AdaptiveL struct and using that in the enum like

pub enum Kind {
    Standard,
    AdaptiveL(AdaptiveL),
}

Comment thread diskann/src/graph/search/inline_filter_search.rs Outdated
Comment thread diskann/src/graph/search/inline_filter_search.rs Outdated
Comment thread diskann/src/graph/search/inline_filter_search.rs Outdated
Comment thread diskann/src/graph/search/inline_filter_search.rs Outdated
Comment thread diskann-benchmark/src/inputs/graph_index.rs Outdated
Comment thread diskann-benchmark/src/inputs/graph_index.rs
Comment thread diskann-benchmark-core/src/search/graph/inline.rs
@codecov-commenter
Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 94.32049% with 28 lines in your changes missing coverage. Please review.
✅ Project coverage is 90.49%. Comparing base (bc106df) to head (be183f8).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
diskann/src/graph/search/inline_filter_search.rs 94.41% 12 Missing ⚠️
diskann-benchmark/src/inputs/graph_index.rs 81.39% 8 Missing ⚠️
diskann-benchmark/src/backend/index/benchmarks.rs 85.71% 5 Missing ⚠️
diskann-benchmark-core/src/search/graph/inline.rs 98.79% 2 Missing ⚠️
diskann-benchmark/src/backend/index/search/knn.rs 94.44% 1 Missing ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1131      +/-   ##
==========================================
+ Coverage   89.40%   90.49%   +1.08%     
==========================================
  Files         485      487       +2     
  Lines       92079    92560     +481     
==========================================
+ Hits        82323    83759    +1436     
+ Misses       9756     8801     -955     
Flag Coverage Δ
miri 90.49% <94.32%> (+1.08%) ⬆️
unittests 90.45% <94.32%> (+1.39%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-benchmark-core/src/search/graph/mod.rs 100.00% <ø> (ø)
...kann-benchmark/src/backend/index/search/plugins.rs 73.58% <100.00%> (+7.58%) ⬆️
diskann-benchmark/src/main.rs 91.26% <100.00%> (+0.09%) ⬆️
diskann/src/graph/search/multihop_search.rs 100.00% <100.00%> (+0.55%) ⬆️
diskann-benchmark/src/backend/index/search/knn.rs 77.77% <94.44%> (+4.16%) ⬆️
diskann-benchmark-core/src/search/graph/inline.rs 98.79% <98.79%> (ø)
diskann-benchmark/src/backend/index/benchmarks.rs 70.82% <85.71%> (+2.02%) ⬆️
diskann-benchmark/src/inputs/graph_index.rs 50.23% <81.39%> (+2.24%) ⬆️
diskann/src/graph/search/inline_filter_search.rs 94.41% <94.41%> (ø)

... and 40 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@magdalendobson
Copy link
Copy Markdown
Contributor Author

Thanks Magdalen! In addition to my inline comments - can you also add some integration tests exercising the functionality here? These go a surprisingly long way towards protecting the algorithm.

Also, can we bikeshed InlineSearch a little? Maybe FilteredSearch? Or InlineFilteredSearch? Not a huge deal, but InlineSearch seems a little opaque to me.

I've added some integration test. In addition to throwing some of the existing cases in multihop at it, I also designed a test that should return different results depending on whether or not the adaptive L feature is enabled.

Regarding the naming, I named it InlineSearch to be consistent with MultihopSearch (ie, multihop search isn't called MultihopFilteredSearch either). Should we maybe rename both to mention filtering?

/// specificity = 0.1% (1/1000) → 8× L
/// and so on up to a pre-set maximum multipler
#[derive(Debug)]
pub struct InlineSearch<'q, InternalId> {
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not a blocker for this PR, but how does this compose with range search and diverse search functionalities?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From an algorithmic perspective, inline filtering (without any adaptation) will compose seamlessly with any other type of search, because it's just adding the extra step of checking match and keeping track of matched elements. However, from the perspective of our codebase as written right now, it would require new function signatures that accept a LabelProvider.

Thinking about how adaptive L in particular would compose with range search, we could use it within the initial graph search before deciding whether to move on to the unbounded part of range search. Similarly we could compose it with diverse search to increase L_search when few matching candidates are found.

I: VectorId,
A: ExpandBeam<T, Id = I> + SearchExt,
SR: SearchRecord<I> + ?Sized,
{
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would we want to allow a paginated API For this function?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, that's a good question. Using pagination and adaptive L seems like a strange thing to do. Adapting L_search is morally pretty similar to pagination, since it extends the search for longer when few matched results are found. If there was a user who wanted to use pure inline filtering and control length of the search via pagination only, on the other hand that seems reasonable.

@magdalendobson magdalendobson linked an issue Jun 5, 2026 that may be closed by this pull request
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Merge new filtered search algorithms to main

5 participants