Fully switch to async `KVStore` persistence by tnull · Pull Request #919 · lightningdevkit/ldk-node

tnull · 2026-06-03T10:59:55Z

We have been planning the full switch to the async KVStore for a while. Here we do just that: first moving the remaining dependencies over to KVStore step by step, then finally dropping KVStoreSync and all its implementations.

Depends on lightningdevkit/rust-lightning#4658 (will drop the DROPME commit once that's merged).

ldk-reviews-bot · 2026-06-03T10:59:58Z

👋 I see @joostjager was un-assigned.
If you'd like another reviewer assignment, please click here.

joostjager

Overall happy with this simplification. Big changeset though, even though not all that much is happening. Risk nonetheless.

Bigger open question for me is what the latest is on runtime abstraction and how that is going to work in combination with potential wasm support.

Also, are there sync public API calls that can now be made async, such as export_pathfinding_scores?

joostjager · 2026-06-03T13:26:39Z

+			locked_peers.insert(peer_info.node_id, peer_info);
+			PeerStoreSerWrapper(&locked_peers).encode()
+		};
+		self.persist_peers(data).await


Is it safe to have already dropped the lock? If two add_peer calls run interleaved, outdated state may be written?

Fair, would you prefer we take a mutation guard as we do in DataStore in instances like this?

I think so? Not sure if there are other ways to do it?

Yeah, I have now added safeguards for node metrics and peer store persistence, too. Once we switch to being fully async, we'll be able to use tokio::sync::{Mutex, RwLock} instead and hold the locks across await points.
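To make the mutation-guard idea from this thread concrete, here is a minimal std-only sketch. It assumes threads as a stand-in for async tasks, and `PeerStore`, `SlowStore`, and `persist_concurrently` are hypothetical names rather than ldk-node's actual types. The guard is held across both the in-memory mutation and the write, so snapshots reach the store in the order they were taken.

```rust
use std::collections::BTreeSet;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::Duration;

/// Hypothetical backing store: remembers only the last snapshot written.
#[derive(Default)]
struct SlowStore {
    last_written: Mutex<Vec<u32>>,
}

impl SlowStore {
    fn write(&self, snapshot: Vec<u32>) {
        // Simulate I/O latency so writers would overlap without a guard.
        thread::sleep(Duration::from_millis(5));
        *self.last_written.lock().unwrap() = snapshot;
    }
}

/// Hypothetical peer store: `peers` guards the in-memory set, while
/// `mutation_lock` serializes "mutate + persist" as one unit.
#[derive(Default)]
struct PeerStore {
    peers: Mutex<BTreeSet<u32>>,
    mutation_lock: Mutex<()>,
    store: SlowStore,
}

impl PeerStore {
    fn add_peer(&self, peer_id: u32) {
        // Held until the write below has finished.
        let _guard = self.mutation_lock.lock().unwrap();

        let mut locked_peers = self.peers.lock().unwrap();
        locked_peers.insert(peer_id);
        let snapshot: Vec<u32> = locked_peers.iter().copied().collect();
        // Release the data lock before the slow write so readers are not blocked.
        drop(locked_peers);

        self.store.write(snapshot);
    }
}

/// Adds `n` peers from `n` threads and returns what ended up persisted.
fn persist_concurrently(n: u32) -> Vec<u32> {
    let peer_store = Arc::new(PeerStore::default());
    let handles: Vec<_> = (0..n)
        .map(|id| {
            let peer_store = Arc::clone(&peer_store);
            thread::spawn(move || peer_store.add_peer(id))
        })
        .collect();
    for handle in handles {
        handle.join().unwrap();
    }
    let persisted = peer_store.store.last_written.lock().unwrap().clone();
    persisted
}
```

Whichever thread takes `mutation_lock` last also writes last, and its snapshot already contains every earlier insert, so the persisted list matches the in-memory set. With an async runtime, the same shape would need a lock type that may be held across await points, as noted in the reply above.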

joostjager · 2026-06-03T13:37:49Z

+		let _guard = self.mutation_lock.lock().await;

-		self.persist(&object)?;
+		self.persist(&object).await?;


Now that this no longer holds self.objects, isn't this creating a window where readers get interleaved in an undesirable way?

That's why we introduced the mutation_lock for the write paths. Or maybe I'm misunderstanding the question?

Yes, it works for the writers, but readers can still see old in-memory state while a write is being persisted. insert_or_update is the other way around: a reader can see in-memory state that isn't on disk yet. Perhaps that is acceptable, but then a comment is justified.
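The two reader-visibility windows mentioned in this thread can be shown with a toy, single-threaded model. This is only a sketch: `Model`, `Midpoint`, and both method names are hypothetical, and it makes no claim about which ldk-node method uses which ordering. It only records what a reader would observe between the two steps of an update.

```rust
use std::collections::HashMap;

/// Toy model of a store with an in-memory map and a persisted copy.
#[derive(Default)]
struct Model {
    memory: HashMap<String, u32>,
    disk: HashMap<String, u32>,
}

/// State visible halfway through an update, between the two steps.
#[derive(Debug, PartialEq)]
struct Midpoint {
    reader_sees: Option<u32>,
    on_disk: Option<u32>,
}

impl Model {
    fn midpoint(&self, key: &str) -> Midpoint {
        Midpoint {
            reader_sees: self.memory.get(key).copied(),
            on_disk: self.disk.get(key).copied(),
        }
    }

    /// Ordering 1: persist first, then update memory.
    fn persist_then_apply(&mut self, key: &str, value: u32) -> Midpoint {
        self.disk.insert(key.to_string(), value);
        // A reader running here still sees the old in-memory value.
        let mid = self.midpoint(key);
        self.memory.insert(key.to_string(), value);
        mid
    }

    /// Ordering 2: update memory first, then persist.
    fn apply_then_persist(&mut self, key: &str, value: u32) -> Midpoint {
        self.memory.insert(key.to_string(), value);
        // A reader running here sees state that is not on disk yet.
        let mid = self.midpoint(key);
        self.disk.insert(key.to_string(), value);
        mid
    }
}
```

Neither ordering removes the window. A write-path lock only orders writers, so a short code comment stating which window is accepted is a reasonable minimum.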

joostjager · 2026-06-03T14:06:33Z

+	signer_provider: &'a test_utils::TestKeysInterface,
+}
+
+impl<K: KVStore + Sync> TestMonitorUpdatePersister<'_, K> {


Commit message says dropping the trait, but this looks more substantial?

joostjager · 2026-06-03T14:08:34Z

-pub(crate) fn create_persister<'a, K: KVStoreSync + Sync>(
-	store: &'a K, chanmon_cfg: &'a TestChanMonCfg, max_pending_updates: u64,
+pub(crate) fn create_persister<'a, K: KVStore + Sync>(
+	store: &'a K, chanmon_cfg: &'a TestChanMonCfg, _max_pending_updates: u64,


Wasn't max pending updates exercised in a test?

joostjager · 2026-06-03T14:15:16Z

 		)?;

-		let store_key = self.build_obfuscated_key(&primary_namespace, &secondary_namespace, &key);
+		let schema_version = match self.schema_version().await {


This repetitive code doesn't look great. Maybe an explicit setup call is better?

benthecarman

The VSS CI job hung for 2 hours; hopefully that's a transient issue.

benthecarman · 2026-06-03T19:14:12Z

Codex seems to agree on the locks:

[P2] Serialize peer-store persistence updates — /home/ben/projects/ldk-node/src/peer_store.rs:43-52
When two peer-store mutations overlap, this now releases the peer map lock before the async write completes, so an older snapshot can persist after a newer one. For example, concurrent add_peer calls can leave memory with both peers but the stored peer list with only the first one, causing reconnect peers to be missing after restart. Please serialize the mutation plus persistence, as DataStore now does with a separate async mutation lock.

[P2] Preserve node-metrics update ordering — /home/ben/projects/ldk-node/src/io/utils.rs:346-351
If two node-metrics updates run concurrently, the lock is released before the async KV write, so an earlier encoded snapshot can finish after a later one and overwrite it. This can drop one of the timestamp fields on disk and make the node believe a sync/broadcast task has not run after restart. The previous code explicitly held the write lock through persistence to avoid this, so this path needs equivalent serialization for the async write.
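The lost-update scenario from the first finding can be reproduced deterministically by replaying the interleaving by hand. The sketch below is std-only and uses hypothetical names (`Store`, `snapshot_after_insert`, `replay_stale_write`); it is not ldk-node code, and the two "tasks" are simply sequential steps executed in the problematic order.

```rust
use std::collections::BTreeSet;
use std::sync::Mutex;

/// Hypothetical KV entry: keeps whatever snapshot was written last.
struct Store {
    value: Vec<u32>,
}

impl Store {
    fn write(&mut self, snapshot: Vec<u32>) {
        self.value = snapshot;
    }
}

/// Inserts `peer_id` under the lock and returns the snapshot that a
/// later, unlocked write would persist.
fn snapshot_after_insert(peers: &Mutex<BTreeSet<u32>>, peer_id: u32) -> Vec<u32> {
    let mut locked = peers.lock().unwrap();
    locked.insert(peer_id);
    let snapshot: Vec<u32> = locked.iter().copied().collect();
    snapshot
}

/// Replays the interleaving: both snapshots are taken first, then the
/// newer write lands before the older one.
/// Returns (in-memory peers, persisted peers).
fn replay_stale_write() -> (Vec<u32>, Vec<u32>) {
    let peers = Mutex::new(BTreeSet::new());
    let mut store = Store { value: Vec::new() };

    let first = snapshot_after_insert(&peers, 1); // task A encodes [1]
    let second = snapshot_after_insert(&peers, 2); // task B encodes [1, 2]

    store.write(second); // task B's write completes first
    store.write(first); // task A's older write completes last and wins

    let in_memory: Vec<u32> = peers.lock().unwrap().iter().copied().collect();
    (in_memory, store.value)
}
```

After the replay, memory holds both peers while the store holds only the first one. Serializing the mutation together with its write rules out this ordering.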

tnull · 2026-06-05T08:25:40Z

As discussed offline, now updated:

Re-added the internal runtimes for Postgres / VSS to avoid tokio getting stuck if we end up using block_on while the same thread holds the IO driver. We'll be able to revert this and drop the runtimes once we move to a fully async API in a follow-up PR.
Added mutation locks for node metrics and peer store persistence for now; we'll be able to simplify further by switching to tokio::sync once we go fully async.
Added a commit that enables eager handoff of the IO driver if we're built under cfg(tokio_unstable) and enables that for bindings builds.

Exercise async KVStore operations in TestSyncStore and filesystem migration tests while keeping the temporary sync comparison path until the final KVStoreSync removal. Also route pathfinding score export through async KVStore reads. Co-Authored-By: HAL 9000

Enable Tokio's eager driver handoff when building with tokio_unstable so node-owned runtimes can use the dedicated driver handoff path where available. Build binding artifacts and selected CI coverage with tokio_unstable so the cfg-gated runtime path remains exercised. Co-Authored-By: HAL 9000

tnull · 2026-06-05T08:41:29Z

Hmm, it seems GitHub CI might have an outage; at least it's not running right now.

joostjager · 2026-06-05T09:36:33Z

+				let mut runtime_builder = tokio::runtime::Builder::new_multi_thread();
+				runtime_builder.enable_all();
+				#[cfg(tokio_unstable)]
+				runtime_builder.enable_eager_driver_handoff();


I'd definitely add docs why this is needed and/or link to the tokio issue.

joostjager · 2026-06-05T09:39:04Z

 	}

 	pub(crate) async fn add_peer(&self, peer_info: PeerInfo) -> Result<(), Error> {
+		let _guard = self.mutation_lock.lock().await;


Do you think that reading doesn't need to be in this lock too? Similar to what we discussed in the other store discussion threads? #919 (comment)

joostjager · 2026-06-05T09:49:45Z

 	Handle(tokio::runtime::Handle),
 }

+pub(crate) struct StoreRuntime {


This can also use some serious documentation on why it is needed, even if temporary.

Also wondering if it isn't possible to unify this with the existing Runtime wrapper? That might also make the various modes more clear: current runtime, passed runtime, new runtime, or some combo.

Considered that, but given the plan is to remove it ASAP, I decided to keep it separate which will make the cleanup much more straightforward.

joostjager · 2026-06-05T09:50:35Z

+				format!("{}-{}", thread_name_prefix, id)
+			})
+			.worker_threads(worker_threads)
+			.max_blocking_threads(worker_threads)


What is the reason for this setting?

joostjager · 2026-06-05T09:51:12Z

+	) -> io::Result<Self> {
+		let runtime = tokio::runtime::Builder::new_multi_thread()
+			.enable_all()
+			.thread_name_fn(move || {


Maybe add this debugging/naming setting also to other places where a runtime is created?

joostjager · 2026-06-05T09:57:04Z

-		tokio::task::block_in_place(move || drop(internal_runtime));
+		if let Some(runtime) = self.internal_runtime.take() {
+			if let Ok(runtime) = Arc::try_unwrap(runtime) {
+				runtime.shutdown_background();


What does it mean if we skip shutdown here?
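For context on this question: `Arc::try_unwrap` returns the inner value only when the caller holds the last strong reference. The std-only sketch below uses a hypothetical `FakeRuntime` stand-in (not a real Tokio runtime) and a hypothetical `Store` wrapper. It shows that the explicit shutdown branch is skipped while another clone is alive.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::Arc;

/// Hypothetical stand-in for a runtime; records whether an explicit shutdown ran.
struct FakeRuntime {
    shut_down: Arc<AtomicBool>,
}

impl FakeRuntime {
    fn shutdown_background(self) {
        self.shut_down.store(true, Ordering::SeqCst);
    }
}

struct Store {
    internal_runtime: Option<Arc<FakeRuntime>>,
}

impl Store {
    /// Mirrors the shape of the diff: shut down only if we own the last reference.
    /// Returns whether the explicit shutdown path was taken.
    fn shutdown(&mut self) -> bool {
        if let Some(runtime) = self.internal_runtime.take() {
            if let Ok(runtime) = Arc::try_unwrap(runtime) {
                runtime.shutdown_background();
                return true;
            }
            // Err case: our handle is dropped here, but the value lives on
            // until the other clone goes away.
        }
        false
    }
}

/// Runs `Store::shutdown` with or without an extra clone of the runtime alive.
fn shutdown_ran(extra_clone_alive: bool) -> bool {
    let flag = Arc::new(AtomicBool::new(false));
    let runtime = Arc::new(FakeRuntime { shut_down: Arc::clone(&flag) });
    let extra = if extra_clone_alive { Some(Arc::clone(&runtime)) } else { None };

    let mut store = Store { internal_runtime: Some(runtime) };
    let took_path = store.shutdown();

    drop(extra);
    took_path && flag.load(Ordering::SeqCst)
}
```

In the skipped case, cleanup is deferred to whichever holder drops the final clone, and the sketch cannot say where that happens in the real code. That is what the question above asks the author to clarify.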

joostjager · 2026-06-05T09:59:14Z

 // Keep this small while still allowing progress if one runtime worker blocks on sync store access.
 const INTERNAL_RUNTIME_WORKERS: usize = 2;

+async fn run_on_internal_runtime<T>(


Can this be a method on StoreRuntime?

joostjager · 2026-06-05T10:00:41Z

+	runtime: Option<tokio::runtime::Runtime>,
+}
+
+impl StoreRuntime {


Name describes usage, not really what it does. IsolatedRuntime?

tnull marked this pull request as draft June 3, 2026 11:00

tnull force-pushed the 2026-06-async-kvstore-persistence branch 2 times, most recently from 42c16e2 to 4371f6d Compare June 3, 2026 11:25

tnull mentioned this pull request Jun 3, 2026

Full WASM support #902

Open

7 tasks

tnull force-pushed the 2026-06-async-kvstore-persistence branch 2 times, most recently from 05ebc28 to 911c169 Compare June 3, 2026 12:15

tnull added this to the 0.8 milestone Jun 3, 2026

tnull force-pushed the 2026-06-async-kvstore-persistence branch 5 times, most recently from 1b4d24f to c62a503 Compare June 3, 2026 12:28

tnull marked this pull request as ready for review June 3, 2026 12:28

tnull requested a review from joostjager June 3, 2026 12:28

tnull mentioned this pull request Jun 3, 2026

Add database persistence benchmarks #915

Draft

joostjager reviewed Jun 3, 2026

tnull force-pushed the 2026-06-async-kvstore-persistence branch 2 times, most recently from 290b0a5 to 9ad8f2d Compare June 3, 2026 15:42

benthecarman reviewed Jun 3, 2026

Comment thread src/peer_store.rs

Comment thread src/io/utils.rs

tnull force-pushed the 2026-06-async-kvstore-persistence branch from 9ad8f2d to 8b93895 Compare June 5, 2026 08:04

tnull requested review from benthecarman and joostjager June 5, 2026 08:25

tnull added 5 commits June 5, 2026 10:35

DROPME: Bump LDK for async store migration

c8cbaf2

Use a temporary rust-lightning fork revision that exposes async migratable KV-store support. This lets ldk-node migrate filesystem stores without reimplementing LDK's migration logic locally. Co-Authored-By: HAL 9000

Use async KV storage for static invoices

bec0719

Static invoice persistence already runs from async handlers, so use KVStore directly instead of routing those reads and writes through the blocking KVStoreSync trait. Co-Authored-By: HAL 9000

f Simplify

7bfdd0d

Move peer persistence onto async KV storage

cbf87bb

Persist peer store updates through async KVStore operations. The synchronous node APIs keep bridging at their runtime boundary while async event handling awaits peer persistence directly. Co-Authored-By: HAL 9000

tnull added 16 commits June 5, 2026 10:35

f - Serialize peer store persistence updates

8c949ff

Co-Authored-By: HAL 9000

Move DataStore persistence onto async KV storage

a7c4cbe

Persist DataStore mutations through async KVStore operations while keeping the existing synchronous APIs bridged through the node runtime. Async event handling now awaits payment store writes directly. Co-Authored-By: HAL 9000

Move node metrics persistence onto async KV storage

ff489f2

Persist node metric updates through async KVStore writes and await them from the chain, gossip, and scoring tasks. This removes the remaining blocking metrics writer while keeping the helper name stable. Co-Authored-By: HAL 9000

f - Move on-chain wallet update helper out of macro

d71fb06

Avoid introducing a temporary macro when moving node metrics persistence onto async KV storage. Co-Authored-By: HAL 9000

f - Serialize node metrics persistence updates

1ecafce

Co-Authored-By: HAL 9000

Use BDK's async wallet persister

a454ce1

Persist the on-chain wallet through BDK's AsyncWalletPersister so wallet state writes use the async KVStore path. Existing synchronous wallet APIs keep bridging through the node runtime until their callers are made async. Co-Authored-By: HAL 9000

Use async KVStore migration for filesystem stores

de48a60

Open filesystem stores through the async LDK migration helper so v1-to-v2 store migration no longer depends on the blocking KVStoreSync migration path. Co-Authored-By: HAL 9000

Move test InMemoryStore into shared module

0b95f04

Move the existing in-memory test store into a shared module without changing its behavior. This lets integration tests reuse it while keeping the later async TestSyncStore change separate. Co-Authored-By: HAL 9000

f - Preserve moved InMemoryStore code exactly

45d52bb

Keep the shared test store move as a pure code move by restoring the original comments and spacing. Co-Authored-By: HAL 9000

Remove blocking KV store support

fe2f09f

Drop the remaining synchronous KV store trait bounds and implementations. After the preceding migrations, custom stores only need to provide async KVStore persistence. Co-Authored-By: HAL 9000

f - Await async VSS store test helper

55e0d03

Compile the VSS persistence tests after the shared KV store helper moved to async persistence. Co-Authored-By: HAL 9000

Add shared store runtime wrapper

393f078

Add a crate-local runtime wrapper for store backends that need to keep their I/O isolated while shutting down safely from async contexts. Co-Authored-By: HAL 9000

Isolate VSS persistence from the node runtime

b454e5a

Co-Authored-By: HAL 9000

Isolate PostgreSQL persistence from the node runtime

88eb0fb

Co-Authored-By: HAL 9000

tnull force-pushed the 2026-06-async-kvstore-persistence branch from 326cbf7 to be2b74d Compare June 5, 2026 08:35

tnull removed the request for review from joostjager June 5, 2026 08:55

joostjager reviewed Jun 5, 2026
