Skip to content

Comments

[SVS] Implement 2-stage backend SVS index initialization#903

Open
rfsaliev wants to merge 7 commits intomainfrom
rfsaliev/svs-async-create-impl
Open

[SVS] Implement 2-stage backend SVS index initialization#903
rfsaliev wants to merge 7 commits intomainfrom
rfsaliev/svs-async-create-impl

Conversation

@rfsaliev
Copy link
Collaborator

@rfsaliev rfsaliev commented Feb 2, 2026

Purpose: prevent long time SVS Tiered index lock at initialization time.

Logic:

  • First stage: create svs::...::MutableVamanaIndex instance with R/O shared lock
  • Second stage: set svs::...::MutableVamanaIndex created before under R/W unique lock

Which issues this PR fixes

  1. #7831

Main objects this PR modified

  1. 2 methods added to SVSIndex: createImpl() and setImpl()
  2. SVSTiered update job is modified to call createImpl() under shared lock and setImpl() under unique lock if backend index is empty.

Mark if applicable

  • This PR introduces API changes
  • This PR introduces serialization changes

Note

Medium Risk
Touches concurrency and synchronization in tiered indexing and backend initialization; regressions could surface as deadlocks, stale/deleted labels remaining searchable, or incorrect update behavior under contention.

Overview
Implements two-stage initialization for the backend SVS index to reduce long exclusive locks during tiered index updates: the SVS implementation can now be constructed via createImpl() under a shared lock and installed via setImpl() under a unique lock.

Tiered SVS update jobs were adjusted to use this path when the backend is empty, and to track and propagate deletions that happen during an update (via a new deleted-labels journal) so those labels are removed from the backend after the move. Deletion logic was tightened by adding SVSIndexBase::isLabelExists() and routing single-label deletes through a dedicated deleteVectorImpl().

Adds unit coverage for two-stage init and for deletion synchronization during an in-flight tiered update (single- and multi-value).

Written by Cursor Bugbot for commit 2a3d44c. This will update automatically on new commits. Configure here.

@jit-ci
Copy link

jit-ci bot commented Feb 2, 2026

Hi, I’m Jit, a friendly security platform designed to help developers build secure applications from day zero with an MVS (Minimal viable security) mindset.

In case there are security findings, they will be communicated to you as a comment inside the PR.

Hope you’ll enjoy using Jit.

Questions? Comments? Want to learn more? Get in touch with us.

@codecov
Copy link

codecov bot commented Feb 2, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 97.11%. Comparing base (d5f91a8) to head (2a3d44c).
⚠️ Report is 1 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #903      +/-   ##
==========================================
+ Coverage   97.09%   97.11%   +0.01%     
==========================================
  Files         129      129              
  Lines        7500     7551      +51     
==========================================
+ Hits         7282     7333      +51     
  Misses        218      218              

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@alonre24 alonre24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM!
A few small comments.
Also please:

  1. Validate that covering the affected scenarios in unit tests (I believe we are)
  2. Add micro benchmarks that will prove the improvement (add vector/run query while initial index creation is executed in the background)

}

void setImpl(std::unique_ptr<ImplHandler> handler) override {
assert(handler && "SVSIndex::setImpl called with null handler");
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is a debug-only assert, let's add this to the log in a warning level as well

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This assert is added just to simplify logic error catching in DEBUG mode - as well as the next-line assert.
In release mode, the logic_error will be thrown later if handler is null.

assert()s here are not really needed - except for debugging purposes.
I can just remove them.

Comment on lines +233 to +238
std::span<const labelType> ids(labels, n);
auto processed_blob = this->preprocessForBatchStorage(vectors_data, n);
auto typed_vectors_data = static_cast<DataType *>(processed_blob.get());
// Wrap data into SVS SimpleDataView for SVS API
auto points = svs::data::SimpleDataView<DataType>{typed_vectors_data, n, this->dim};

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic seems to be a duplication of what we do in addVectorsImpl. Consider unifying these into a single function

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main point here is the processed_blob which lifetime should be managed till end of initImpl() and impl_->add_points() calls.
A single function, which will wrap all this code would look like:

std::tuple<std::span<const labelType>, MemoryUtils::unique_blob, svs::data::SimpleDataView<DataType>> preprocessAndPrepareSVSArgs(...)

Comment on lines +249 to +252
SVSImplHandler *svs_handler = dynamic_cast<SVSImplHandler *>(handler.get());
if (!svs_handler) {
throw std::logic_error("Failed to cast to SVSImplHandler");
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the motivation to have an abstract ImplHandler rather than have only SVSImplHandler? The dynamic_cast here seems a bit awkward

Copy link
Collaborator Author

@rfsaliev rfsaliev Feb 16, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SVSImplHandler is not just a simple type - it is template class with a number of parameters, and it's full declaration looks like:

template <typename MetricType,
          typename DataType,
          bool isMulti,
          size_t QuantBits,
          size_t ResidualBits,
          bool IsLeanVec>
struct SVSIndex<MetricType, DataType, isMulti, QuantBits, ResidualBits, IsLeanVec>::SVSImplHandler;

This why the abstract SVSIndexBase::ImplHandler is defined for client code (TieredSVSIndex).

svs_index->setNumThreads(std::min(availableThreads, labels_to_move.size()));
svs_index->addVectors(vectors_to_move.data(), labels_to_move.data(),
labels_to_move.size());
if (this->backendIndex->indexSize() == 0) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this also handle re-initialization after the index was emptied? Is that scenario tested?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, as it was before in SVSIndex::AddVectors()

@rfsaliev rfsaliev force-pushed the rfsaliev/svs-async-create-impl branch from 23d0223 to 2538673 Compare February 17, 2026 11:23
@jit-ci
Copy link

jit-ci bot commented Feb 17, 2026

❌ Jit Scanner failed - Our team is investigating

Jit Scanner failed - Our team has been notified and is working to resolve the issue. Please contact support if you have any questions.


💡 Need to bypass this check? Comment @sera bypass to override.

@rfsaliev rfsaliev force-pushed the rfsaliev/svs-async-create-impl branch from 33ef4e8 to ff11b80 Compare February 18, 2026 11:23
Purpose: prevent long time SVS Tiered index lock at initialization time.
Logic:
 First stage: create `svs::...::MutableVamanaIndex` instance with R/O shared lock
 Second stage: set `svs::...::MutableVamanaIndex` created before under R/W unique lock
@rfsaliev rfsaliev force-pushed the rfsaliev/svs-async-create-impl branch from ff11b80 to e270c00 Compare February 18, 2026 13:01
Copy link

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

auto processed_blob = this->preprocessForBatchStorage(vectors_data, n);
auto typed_vectors_data = static_cast<DataType *>(processed_blob.get());
// Wrap data into SVS SimpleDataView for SVS API
auto points = svs::data::SimpleDataView<DataType>{typed_vectors_data, n, this->dim};
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Duplicated preprocessing logic in createImpl and addVectorsImpl

Low Severity

The new createImpl method duplicates the same preprocessing sequence found in addVectorsImpl: constructing a std::span from labels, calling preprocessForBatchStorage, casting the blob to DataType *, and constructing a SimpleDataView. If the preprocessing logic changes in one place but not the other, the two code paths will silently diverge, potentially causing subtle bugs.

Additional Locations (1)

Fix in Cursor Fix in Web

Comment on lines +732 to +744
// delete vectors from backend index that were deleted from the frontend index during
// the update process.
{
std::lock_guard main_lock(this->mainIndexGuard);

std::sort(deleted_labels_during_update.begin(), deleted_labels_during_update.end());
auto it = std::unique(deleted_labels_during_update.begin(),
deleted_labels_during_update.end());
deleted_labels_during_update.erase(it, deleted_labels_during_update.end());
auto svs_index = GetSVSIndex();
svs_index->deleteVectors(deleted_labels_during_update.data(),
deleted_labels_during_update.size());
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Jurnal mechianism for removing vectors that were deleted during the training looks good. Can you add a unit test for this edge case? This isn't covered in two_stage_initialization_test from what I understand

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added appropriate test to test_svs_tiered.cpp

Comment on lines +3668 to +3669
std::unique_lock lock(mtx);
EXPECT_TRUE(cv.wait_for(lock, std::chrono::seconds(100), [&] { return adding_to_svs; }));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible that we get here AFTER the tracing_callback called notify_one? In that case, we would wait for 100 seconds before waking up?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants