Fully switch to async KVStore persistence#919
Conversation
|
👋 I see @joostjager was un-assigned. |
42c16e2 to
4371f6d
Compare
05ebc28 to
911c169
Compare
1b4d24f to
c62a503
Compare
joostjager
left a comment
There was a problem hiding this comment.
Overall happy with this simplification. Big changeset though, even though not all that much is happening. Risk nonetheless.
Bigger open question for me is what the latest is on runtime abstraction and how that is going to work in combination with potential wasm support.
Also, are there sync public api calls that can now be made async such as export_pathfinding_scores?
| locked_peers.insert(peer_info.node_id, peer_info); | ||
| PeerStoreSerWrapper(&locked_peers).encode() | ||
| }; | ||
| self.persist_peers(data).await |
There was a problem hiding this comment.
Is it safe to have already dropped the lock? If two add_peer calls run interleaved, outdated state may be written?
There was a problem hiding this comment.
Fair, would you prefer we take a mutation guard as we do in DataStore in instances like this?
There was a problem hiding this comment.
I think so? Not sure if there are other ways to do it?
There was a problem hiding this comment.
Yeah, now did add safeguards for node metrics and peer store persistence, too, but once we switch to be fully async we'll be able to switch to tokio::sync::{Mutex,RwLock}s and hold them across await.
| let _guard = self.mutation_lock.lock().await; | ||
|
|
||
| self.persist(&object)?; | ||
| self.persist(&object).await?; |
There was a problem hiding this comment.
Now that this no longer holds self.objects, doesn't that, isn't this creating a window where readers get interleaved in an undesirable way?
There was a problem hiding this comment.
That's why we introduced the mutation_lock for the write paths. Or maybe I'm misunderstanding the question?
There was a problem hiding this comment.
Yes it works for the writers, but readers can still see old in-memory state while a write is being persisted. insert_or_update is the other way around. A reader can see in-memory state that isn't on disk yet. Perhaps it is acceptable, but a comment is justified then.
| signer_provider: &'a test_utils::TestKeysInterface, | ||
| } | ||
|
|
||
| impl<K: KVStore + Sync> TestMonitorUpdatePersister<'_, K> { |
There was a problem hiding this comment.
Commit message says dropping the trait, but this looks more substantial?
| pub(crate) fn create_persister<'a, K: KVStoreSync + Sync>( | ||
| store: &'a K, chanmon_cfg: &'a TestChanMonCfg, max_pending_updates: u64, | ||
| pub(crate) fn create_persister<'a, K: KVStore + Sync>( | ||
| store: &'a K, chanmon_cfg: &'a TestChanMonCfg, _max_pending_updates: u64, |
There was a problem hiding this comment.
Wasn't max pending updates exercised in a test?
| )?; | ||
|
|
||
| let store_key = self.build_obfuscated_key(&primary_namespace, &secondary_namespace, &key); | ||
| let schema_version = match self.schema_version().await { |
There was a problem hiding this comment.
This repetitive code doesn't look great. Maybe an explicit setup call is better?
290b0a5 to
9ad8f2d
Compare
|
codex seems to agree on the locks
|
9ad8f2d to
8b93895
Compare
|
As discussed offline, now updated:
|
Use a temporary rust-lightning fork revision that exposes async migratable KV-store support. This lets ldk-node migrate filesystem stores without reimplementing LDK's migration logic locally. Co-Authored-By: HAL 9000
Read and write BDK wallet state through async KVStore helpers while keeping the current WalletPersister entry points bridged through the node runtime. This reduces the wallet persistence surface that still depends on KVStoreSync. Co-Authored-By: HAL 9000
Static invoice persistence already runs from async handlers, so use KVStore directly instead of routing those reads and writes through the blocking KVStoreSync trait. Co-Authored-By: HAL 9000
Persist peer store updates through async KVStore operations. The synchronous node APIs keep bridging at their runtime boundary while async event handling awaits peer persistence directly. Co-Authored-By: HAL 9000
Co-Authored-By: HAL 9000
Persist DataStore mutations through async KVStore operations while keeping the existing synchronous APIs bridged through the node runtime. Async event handling now awaits payment store writes directly. Co-Authored-By: HAL 9000
Persist node metric updates through async KVStore writes and await them from the chain, gossip, and scoring tasks. This removes the remaining blocking metrics writer while keeping the helper name stable. Co-Authored-By: HAL 9000
Avoid introducing a temporary macro when moving node metrics persistence onto async KV storage. Co-Authored-By: HAL 9000
Co-Authored-By: HAL 9000
Persist the on-chain wallet through BDK's AsyncWalletPersister so wallet state writes use the async KVStore path. Existing synchronous wallet APIs keep bridging through the node runtime until their callers are made async. Co-Authored-By: HAL 9000
Open filesystem stores through the async LDK migration helper so v1-to-v2 store migration no longer depends on the blocking KVStoreSync migration path. Co-Authored-By: HAL 9000
Move the existing in-memory test store into a shared module without changing its behavior. This lets integration tests reuse it while keeping the later async TestSyncStore change separate. Co-Authored-By: HAL 9000
Keep the shared test store move as a pure code move by restoring the original comments and spacing. Co-Authored-By: HAL 9000
Exercise async KVStore operations in TestSyncStore and filesystem migration tests while keeping the temporary sync comparison path until the final KVStoreSync removal. Also route pathfinding score export through async KVStore reads. Co-Authored-By: HAL 9000
Drop the remaining synchronous KV store trait bounds and implementations. After the preceding migrations, custom stores only need to provide async KVStore persistence. Co-Authored-By: HAL 9000
Compile the VSS persistence tests after the shared KV store helper moved to async persistence. Co-Authored-By: HAL 9000
Add a crate-local runtime wrapper for store backends that need to keep their I/O isolated while shutting down safely from async contexts. Co-Authored-By: HAL 9000
Co-Authored-By: HAL 9000
Co-Authored-By: HAL 9000
Enable Tokio's eager driver handoff when building with tokio_unstable so node-owned runtimes can use the dedicated driver handoff path where available. Build binding artifacts and selected CI coverage with tokio_unstable so the cfg-gated runtime path remains exercised. Co-Authored-By: HAL 9000
326cbf7 to
be2b74d
Compare
|
Hmm, seems Github CI might have an outage, at least it's not running right now.. |
| let mut runtime_builder = tokio::runtime::Builder::new_multi_thread(); | ||
| runtime_builder.enable_all(); | ||
| #[cfg(tokio_unstable)] | ||
| runtime_builder.enable_eager_driver_handoff(); |
There was a problem hiding this comment.
I'd definitely add docs why this is needed and/or link to the tokio issue.
| } | ||
|
|
||
| pub(crate) async fn add_peer(&self, peer_info: PeerInfo) -> Result<(), Error> { | ||
| let _guard = self.mutation_lock.lock().await; |
There was a problem hiding this comment.
Do you think that reading doesn't need to be in this lock too? Similar to what we discussed in the other store discussion threads? #919 (comment)
| Handle(tokio::runtime::Handle), | ||
| } | ||
|
|
||
| pub(crate) struct StoreRuntime { |
There was a problem hiding this comment.
This can also use some serious documentation on why it is needed, even if temporary.
There was a problem hiding this comment.
Also wondering if it isn't possible to unify this with the existing Runtime wrapper? That might also make the various modes more clear: current runtime, passed runtime, new runtime, or some combo.
There was a problem hiding this comment.
Also wondering if it isn't possible to unify this with the existing
Runtimewrapper? That might also make the various modes more clear: current runtime, passed runtime, new runtime, or some combo.
Considered that, but given the plan is to remove it ASAP, I decided to keep it separate which will make the cleanup much more straightforward.
| format!("{}-{}", thread_name_prefix, id) | ||
| }) | ||
| .worker_threads(worker_threads) | ||
| .max_blocking_threads(worker_threads) |
There was a problem hiding this comment.
What is the reason for this setting?
| ) -> io::Result<Self> { | ||
| let runtime = tokio::runtime::Builder::new_multi_thread() | ||
| .enable_all() | ||
| .thread_name_fn(move || { |
There was a problem hiding this comment.
Maybe add this debugging/naming setting also to other places where a runtime is created?
| tokio::task::block_in_place(move || drop(internal_runtime)); | ||
| if let Some(runtime) = self.internal_runtime.take() { | ||
| if let Ok(runtime) = Arc::try_unwrap(runtime) { | ||
| runtime.shutdown_background(); |
There was a problem hiding this comment.
What does it mean if we skip shutdown here?
| // Keep this small while still allowing progress if one runtime worker blocks on sync store access. | ||
| const INTERNAL_RUNTIME_WORKERS: usize = 2; | ||
|
|
||
| async fn run_on_internal_runtime<T>( |
There was a problem hiding this comment.
Can this be a method on StoreRuntime?
| runtime: Option<tokio::runtime::Runtime>, | ||
| } | ||
|
|
||
| impl StoreRuntime { |
There was a problem hiding this comment.
Name describes usage, not really what it does. IsolatedRuntime?
We planned to make the full switch to the async
KVStorefor a while. Here we do just that: first step-by-step moving the remaining dependencies over to useKVStore, then finally droppingKVStoreSyncand all implementations.Depends on lightningdevkit/rust-lightning#4658 (will drop the
DROPMEcommit once that's merged).