Fix peer store management on channel closure by Jolah1 · Pull Request #895 · lightningdevkit/ldk-node

Jolah1 · 2026-05-01T02:33:38Z

Summary

This PR resolves both issues described in #745 related to peer store management on channel closure.

Fixed behaviors

Local force-close → peer is retained to allow channel_reestablish recovery
Counterparty force-closes last channel → peer is removed to avoid unnecessary reconnections

Problems

1. Local force-close removes peer too early

When we force-closed a channel in close_channel_internal, we immediately removed the peer from the store regardless of closure type:

// Before — peer removed for BOTH cooperative and force-close
if open_channels.len() == 1 {
    self.peer_store.remove_peer(&counterparty_node_id)?;
}

This broke recovery. The background reconnection task only reconnects to peers in the store. Removing the peer immediately guaranteed we would never reconnect, so channel_reestablish could never complete. This is especially problematic for LND peers, which may not handle force-closure error messages correctly and rely on us reconnecting to recover.

2. Counterparty force-close never cleans up peer

When a counterparty force-closed their last channel with us, the ChannelClosed event handler did nothing with the peer store. The peer remained persisted indefinitely and the node kept attempting to reconnect on every background tick. wasted resources and misleading peer state.

Solution

1. Retain peer on local force-close (`src/lib.rs`)

In close_channel_internal, peers are now only removed for cooperative
closes:

if open_channels.len() == 1 && !force {
    self.peer_store.remove_peer(&counterparty_node_id)?;
}

For force-closes, the peer is retained in the store so the background reconnection task can fire, channel_reestablish can complete, and peer cleanup is deferred to the ChannelClosed event handler.

2. Remove peer on counterparty-initiated closure (`src/event.rs`)

In the LdkEvent::ChannelClosed handler, we now check:

Is this the last channel with the peer?
Did the counterparty initiate the closure?

If both conditions are true, the peer is removed from the store.

Implementation Details

LDK timing behavior

LDK emits ChannelClosed before removing the channel from its internal state. A naive call to list_channels_with_counterparty() inside the handler would still return the closing channel, making the "last channel" check always wrong. We fix this by excluding the closing channel explicitly:

let has_other_channels = self
    .channel_manager
    .list_channels_with_counterparty(&counterparty_node_id)
    .iter()
    .any(|c| c.channel_id != channel_id); // exclude the channel that just closed

Ordering to avoid race conditions

Peer removal is performed before calling add_event. This ensures consumers of next_event_async() always observe a consistent peer store when they react to ChannelClosed — avoiding a race where the event is visible before the removal completes.

Tests

Added two integration tests in tests/integration_tests_rust.rs:

test_peer_removed_on_counterparty_force_close
Verifies that after the counterparty force-closes the last channel, the peer is no longer persisted in the local peer store.

test_peer_retained_on_local_force_close
Verifies that after a locally initiated force-close, the peer remains persisted in the local peer store to allow background reconnection and channel_reestablish recovery.

Both tests assert on is_persisted rather than peer presence to correctly distinguish stored peers from transient TCP connections that may still appear in list_peers() during teardown.

Results

test test_peer_removed_on_counterparty_force_close ... ok
test test_peer_retained_on_local_force_close ... ok
test result: ok. 41 passed; 0 failed; 0 ignored; 0 measured
cargo build → no errors
cargo clippy → no new warnings
cargo fmt → applied

ldk-reviews-bot · 2026-05-01T02:33:42Z

👋 Thanks for assigning @tnull as a reviewer!
I'll wait for their review and will help manage the review process.
Once they submit their review, I'll check if a second reviewer would be helpful.

Camillarhi

Thanks for working on this. I left some comments. Also, can you squash the commits?

Camillarhi · 2026-05-03T21:15:26Z

 					user_channel_id: UserChannelId(user_channel_id),
 					counterparty_node_id,
-					reason: Some(reason),
+					reason: Some(reason.clone()),


reason.clone() is unnecessary as reason was not moved by the match

Good catch, removed.
Thanks!

Camillarhi · 2026-05-03T21:29:25Z

+				if let Some(counterparty_node_id) = counterparty_node_id {
+					let all_channels =
+						self.channel_manager.list_channels_with_counterparty(&counterparty_node_id);
+
+					let has_other_channels =
+						all_channels.iter().any(|c| c.channel_id != channel_id);
+
+					let counterparty_initiated = matches!(
+						reason,
+						ClosureReason::CounterpartyForceClosed { .. }
+							| ClosureReason::CounterpartyInitiatedCooperativeClosure
+					);


We should flip the order here and check the closure reason first, so the read on the channel manager only happens when it's actually relevant

That's true, no point querying the channel manager if the closure reason rules out action.
It has been reordered.

Camillarhi · 2026-05-03T21:38:20Z

 			// Check if this was the last open channel, if so, forget the peer.
-			if open_channels.len() == 1 {
+			//For force-closes we keep the peer
+			// in the store so the background reconnection task can fire and
+			// complete the channel_reestablish recovery flow. This is especially
+			// important for LND peers
+			// The peer will be removed by the ChannelClosed event handler once
+			// the counterparty reconnects and confirms closure.


This comment should flow as one paragraph rather than fragmented lines, and the last sentence is misleading, the ChannelClosed event handler only removes the peer on counterparty-initiated closures, so after a local force-close the peer isn't actually cleaned up there. It's retained indefinitely until the user calls disconnect.

Something like:

// If this was the last open channel and we're closing cooperatively, forget // the peer, since we have no further reason to reconnect. For force-closes we // intentionally keep the peer in the store so the background reconnection // task keeps firing and can drive the channel_reestablish recovery flow. // This is especially important against LND peers, which don't always handle // force-closure error messages correctly. Note that this means a force-closed // peer is retained until the user explicitly calls Node::disconnect.

You're right, the last sentence was inaccurate.
The handler only removes on counterparty-initiated closures, not local force-closes.
Replaced with your suggested comment.

Camillarhi · 2026-05-03T22:24:06Z

+}
+
+#[tokio::test(flavor = "multi_thread", worker_threads = 1)]
+async fn test_peer_retained_on_local_force_close() {


This test can be folded into do_channel_full_cycle as it already runs both force_close == true and false paths, so the assertion can be done there

Thanks for pointing this out. I have folded the assertions into do_channel_full_cycle alongside the existing force-close paths rather than keeping them as separate tests.

ldk-reviews-bot · 2026-05-04T00:00:24Z

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

Jolah1 · 2026-05-04T12:01:05Z

Thanks for working on this. I left some comments. Also, can you squash the commits?

Thanks for the thorough review. All four points have been addressed . Will squash the commits once the changes are in.

Camillarhi · 2026-05-04T16:16:44Z

+				// If the counterparty initiated closure of their last remaining channel
+				// with us, remove them from the peer store so we stop trying to reconnect.
+				//
+				// If we initiated the closure, keep them in the peer store so the
+				// background reconnection task fires and we can complete the
+				// channel_reestablish recovery flow. This matters especially for LND
+				// peers, which need us to reconnect to recover from force-closures.
+				//
+				// We exclude `channel_id` from the remaining-channel check because LDK
+				// fires ChannelClosed before removing the channel from its internal list,
+				// so list_channels_with_counterparty still includes the closing channel.


This comment partially duplicates the rationale in the lib.rs change. I suggest trimming it to just what this block does, and let the lib.rs comment carry the LND recovery rationale

Thank you!
I have trimmed the comment here to focus on the behavior of this block.

tnull

Thanks for looking into this. I think technically we could consider cleaning up the stored peer once all funds of the last channel were swept successfully, though I'm not sure if it's worth the complexity adding explicit tracking for it (but let me know if you see an easy/minimally invasive approach to do it).

tnull · 2026-05-05T12:33:24Z

+							.any(|c| c.channel_id != channel_id);
+
+						if !has_other_channels {
+							if let Err(e) = self.peer_store.remove_peer(&counterparty_node_id) {


We usually replay the event in case of persistence failures. While this path is not critical, we should probably do the same here to keep it consistent.

I agree, I've updated this to return ReplayEvent() on failure to keep it consistent with other persistence paths.

tnull · 2026-05-05T12:34:18Z

+			// task keeps firing and can drive the channel_reestablish recovery flow.
+			// This is especially important against LND peers, which don't always handle force-closure error messages correctly.
+
+			//Note that this means a force-closed peer is retained until the user explicitly calls Node::disconnect.


nit:

Suggested change

//Note that this means a force-closed peer is retained until the user explicitly calls Node::disconnect.

// Note that this means a force-closed peer is retained until the user explicitly calls Node::disconnect.

Fixed, Thanks!

Jolah1 · 2026-05-05T15:49:31Z

Thanks for looking into this. I think technically we could consider cleaning up the stored peer once all funds of the last channel were swept successfully, though I'm not sure if it's worth the complexity adding explicit tracking for it (but let me know if you see an easy/minimally invasive approach to do it).

That makes sense, and would be the ideal cleanup point. I don't see a minimally invasive way to track it right now without additional tracking of channel resolution state across events, so I'd lean toward leaving it as a follow-up issue rather than adding complexity here. Happy to open one if that's useful.

ldk-reviews-bot · 2026-05-13T22:22:27Z

🔔 1st Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-05-16T00:00:57Z

🔔 2nd Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-05-18T00:01:58Z

🔔 3rd Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-05-20T00:02:27Z

🔔 4th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-05-23T00:00:56Z

🔔 5th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

Camillarhi · 2026-05-23T18:44:55Z

CI seems to be failing.

Jolah1 · 2026-05-24T07:30:13Z

CI seems to be failing.

Yes, the CI failure was a flaky async_payment test on 1.85.0 — connection refused on the mock Esplora server before it was ready, unrelated to this change. Triggered a fresh run and all checks are now passing.
Thank youuuu.

ldk-reviews-bot · 2026-05-25T00:02:04Z

🔔 6th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-05-27T00:03:31Z

🔔 7th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-05-30T00:01:15Z

🔔 8th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

ldk-reviews-bot · 2026-06-01T00:01:59Z

🔔 9th Reminder

Hey @tnull! This PR has been waiting for your review.
Please take a look when you have a chance. If you're unable to review, please let us know so we can find another reviewer.

tnull

Excuse the delay here - LGTM!

Please squash and clean up the commit history while you're at it. All commits should feature a description that (briefly!) outlines the motivation and approach for the change. Feel free to check out https://cbea.ms/git-commit/ for guidance.

tnull · 2026-06-01T11:53:14Z

+				// We exclude `channel_id` from the check because LDK emits
+				// ChannelClosed before removing the channel from its internal list.
+
+				if let Some(counterparty_node_id) = counterparty_node_id {


nit: The counterparty_node_id field is always set post LDK v0.0.117 and LDK Node v0.2 was already based on LDK v0.0.118. We should be good to just expecting it, reducing indetation.

tnull · 2026-06-01T12:00:42Z

+				// ChannelClosed before removing the channel from its internal list.
+
+				if let Some(counterparty_node_id) = counterparty_node_id {
+					let counterparty_initiated = matches!(


Codex:

[P2] Include on-chain force-closes in peer cleanup — /home/tnull/workspace/ldk-node/src/event.rs:1595-1599
When a peer unilaterally broadcasts its commitment while we are disconnected (or their error message is never processed), LDK closes our channel with ClosureReason::CommitmentTxConfirmed, not CounterpartyForceClosed. Because this match only treats error-message force closes as counterparty
initiated, the last-channel peer remains in peer_store and the reconnection loop continues indefinitely after the channel is gone; handle the confirmed commitment/closing-tx case or otherwise clean it up as well.

• Yes, the retention is intentional, but only for locally initiated force-closes.

The review is pointing at a different case: the peer force-closes while we are disconnected, and LDK later reports the terminal close as ClosureReason::CommitmentTxConfirmed rather than CounterpartyForceClosed. At that point the channel is already closed on-chain, so keeping the peer
persisted no longer helps channel_reestablish; it just keeps the reconnect loop alive for a peer with no remaining channel.

So I’d treat the review as valid. The fix should preserve:

HolderForceClosed { .. }: keep persisted peer, per the comment in src/lib.rs:1847

CounterpartyForceClosed { .. }: remove if it was their last channel

CommitmentTxConfirmed: also remove if it was their last channel, because this is the disconnected/on-chain remote force-close path

Caveat: CommitmentTxConfirmed is not a perfect “counterparty initiated” signal; LDK documents that it can also come from our own monitor copy. But for peer-store cleanup after a terminal ChannelClosed, it still seems closer to “no reconnect needed” than to the local HolderForceClosed
recovery case.

ldk-reviews-bot requested a review from tnull May 1, 2026 02:44

Camillarhi suggested changes May 3, 2026

View reviewed changes

event: remove peer from store on counterparty-initiated force-close

7609c26

Jolah1 force-pushed the peer-handling-on-force-close branch from c1bd96c to 7609c26 Compare May 4, 2026 11:35

Camillarhi reviewed May 4, 2026

View reviewed changes

trim duplicate comments in ChannelClosed

6c6e9a7

tnull reviewed May 5, 2026

View reviewed changes

replay ChannelClosed event on peer removal failure

25fc22a

Jolah1 requested a review from tnull May 11, 2026 22:21

chore: re-trigger CI

bd8bf38

tnull reviewed Jun 1, 2026

View reviewed changes

	//Note that this means a force-closed peer is retained until the user explicitly calls Node::disconnect.
	// Note that this means a force-closed peer is retained until the user explicitly calls Node::disconnect.

Conversation

Jolah1 commented May 1, 2026

Summary

Fixed behaviors

Problems

1. Local force-close removes peer too early

2. Counterparty force-close never cleans up peer

Solution

1. Retain peer on local force-close (src/lib.rs)

2. Remove peer on counterparty-initiated closure (src/event.rs)

Implementation Details

LDK timing behavior

Ordering to avoid race conditions

Tests

Results

Uh oh!

ldk-reviews-bot commented May 1, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Camillarhi left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

ldk-reviews-bot commented May 4, 2026

Uh oh!

Jolah1 commented May 4, 2026

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

tnull left a comment

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Choose a reason for hiding this comment

Uh oh!

Jolah1 commented May 5, 2026

Uh oh!

ldk-reviews-bot commented May 13, 2026

Uh oh!

ldk-reviews-bot commented May 16, 2026

Uh oh!

ldk-reviews-bot commented May 18, 2026

Uh oh!

ldk-reviews-bot commented May 20, 2026

Uh oh!

ldk-reviews-bot commented May 23, 2026

Uh oh!

Camillarhi commented May 23, 2026

Uh oh!

Jolah1 commented May 24, 2026

Uh oh!

ldk-reviews-bot commented May 25, 2026

Uh oh!

ldk-reviews-bot commented May 27, 2026

Uh oh!

ldk-reviews-bot commented May 30, 2026

Uh oh!

ldk-reviews-bot commented Jun 1, 2026

1. Retain peer on local force-close (`src/lib.rs`)

2. Remove peer on counterparty-initiated closure (`src/event.rs`)

ldk-reviews-bot commented May 1, 2026 •

edited

Loading