connection: release request ids after failed sends by dkropachev · Pull Request #874 · scylladb/python-driver

dkropachev · 2026-05-06T18:53:26Z

Fixes #873. Reclaims request ids when send_msg fails and covers the async keyspace path with unit tests.
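
For readers unfamiliar with the leak, here is a minimal sketch of the bookkeeping involved. It assumes a hypothetical toy `IdPool` class and `flaky_send` function, neither of which is part of the driver. It only illustrates how ids reserved for a batch can be handed back when a send fails partway through.

```python
from collections import deque


class IdPool:
    """Hypothetical toy model of request-id bookkeeping (not the driver's API)."""

    def __init__(self, size):
        self.free_ids = deque(range(size))
        self.in_flight = 0

    def send_batch(self, payloads, send):
        # Reserve one id per payload up front, as a batch sender would.
        reserved = [self.free_ids.popleft() for _ in payloads]
        self.in_flight += len(reserved)
        for i, (request_id, payload) in enumerate(zip(reserved, payloads)):
            try:
                send(request_id, payload)
            except Exception:
                # The failing id and every id after it were never sent,
                # so release them instead of leaking them.
                unsent = reserved[i:]
                self.in_flight -= len(unsent)
                self.free_ids.extend(unsent)
                raise


def flaky_send(request_id, payload):
    # Stand-in for a real send; fails on a marker payload.
    if payload == "bad":
        raise IOError("simulated send failure")


pool = IdPool(size=4)
try:
    pool.send_batch(["ok", "bad", "ok"], flaky_send)
except IOError:
    pass

print(pool.in_flight, list(pool.free_ids))  # prints: 1 [3, 1, 2]
```

Only the first message stays in flight; the two ids that were reserved but never sent return to the free list.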

Lorak-mmk · 2026-05-07T13:08:47Z

+        try:
+            msg = encoder(msg, request_id, self.protocol_version, compressor=self.compressor,
+                          allow_beta_protocol_version=self.allow_beta_protocol_version)

-        if self._is_checksumming_enabled:
-            buffer = io.BytesIO()
-            self._segment_codec.encode(buffer, msg)
-            msg = buffer.getvalue()
+            if self._is_checksumming_enabled:
+                buffer = io.BytesIO()
+                self._segment_codec.encode(buffer, msg)
+                msg = buffer.getvalue()

-        self.push(msg)
+            self.push(msg)
+        except Exception:
+            self._requests.pop(request_id, None)
+            raise
        return len(msg)

    def wait_for_response(self, msg, timeout=None, **kwargs):
        return self.wait_for_responses(msg, timeout=timeout, **kwargs)[0]

    def wait_for_responses(self, *msgs, **kwargs):
        """
        Returns a list of (success, response) tuples.  If success
        is False, response will be an Exception.  Otherwise, response
        will be the normal query response.

        If fail_on_error was left as True and one of the requests
        failed, the corresponding Exception will be raised.
        """
        if self.is_closed or self.is_defunct:
            msg = "Connection %s is already closed" % (self,)
            if self.last_error:
                msg += ": %s" % (self.last_error,)
            raise ConnectionShutdown(msg)
        timeout = kwargs.get('timeout')
        original_timeout = timeout  # preserve for exception reporting
        fail_on_error = kwargs.get('fail_on_error', True)
        waiter = ResponseWaiter(self, len(msgs), fail_on_error)

        # busy wait for sufficient space on the connection
        messages_sent = 0
        while True:
            needed = len(msgs) - messages_sent
            with self.lock:
                available = min(needed, self.max_request_id - self.in_flight + 1)
                request_ids = [self.get_request_id() for _ in range(available)]
                self.in_flight += available

            for i, request_id in enumerate(request_ids):
-                self.send_msg(msgs[messages_sent + i],
-                              request_id,
-                              partial(waiter.got_response, index=messages_sent + i))
+                try:
+                    self.send_msg(msgs[messages_sent + i],
+                                  request_id,
+                                  partial(waiter.got_response, index=messages_sent + i))
+                except Exception:
+                    unsent_request_ids = request_ids[i:]
+                    with self.lock:
+                        self.in_flight -= len(unsent_request_ids)
+                        self.request_ids.extend(unsent_request_ids)
+                    raise
            messages_sent += available


There are now multiple PRs regarding request_ids, in_flight, etc.
It is incredible that we ever need to worry about this stuff.
Why is it even the responsibility of the caller to adjust those values?
Connection should have a method for sending a request. This method should be responsible for managing in_flight, request_ids, and other Connection state. Callers should never have to worry about that.

This is the only sane solution; anything else will just require fixing call sites forever.

And yes, I know this is code in connection. But you also have PRs for e.g. heartbeats. Heartbeats should never need to touch this stuff.
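
A rough sketch of the kind of helper being asked for, under stated assumptions: `ToyConnection`, `send_request`, and `_push` are hypothetical names invented for illustration, not the driver's real Connection API. The point is only that one method owns id allocation, the in-flight counter, and the rollback on failure.

```python
import threading
from collections import deque


class ToyConnection:
    """Hypothetical stand-in for a connection; not the driver's real class."""

    def __init__(self, max_request_id=3):
        self.lock = threading.Lock()
        self.request_ids = deque(range(max_request_id + 1))  # free ids
        self.in_flight = 0
        self._requests = {}  # request_id -> callback

    def _push(self, payload):
        # Placeholder for the real encode-and-write step; may raise.
        if payload is None:
            raise ValueError("cannot encode None")

    def send_request(self, payload, callback):
        """Allocate an id, send, and undo all bookkeeping if the send fails."""
        with self.lock:
            if not self.request_ids:
                raise RuntimeError("no free request ids")
            request_id = self.request_ids.popleft()
            self.in_flight += 1
            self._requests[request_id] = callback
        try:
            self._push(payload)
        except Exception:
            # Roll back every piece of state touched above.
            with self.lock:
                self._requests.pop(request_id, None)
                self.in_flight -= 1
                self.request_ids.append(request_id)
            raise
        return request_id
```

With this shape, a caller such as a heartbeat would only call `send_request(payload, callback)` and would never touch `in_flight` or `request_ids` directly.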

dkropachev · 2026-05-07 (edited)

Acknowledged. I removed the async keyspace cleanup from this branch, so this PR is now scoped to the concrete send-failure leak only. The broader connection-level helper/refactor can stay as a separate follow-up.

Lorak-mmk · 2026-05-08T09:44:24Z

As I said on the heartbeat PR: maybe send_msg should clean this up?

dkropachev added 2 commits May 6, 2026 14:53

connection: release stream ids after send failures

c4862e7

connection: clean up failed async keyspace sends

856efaa

dkropachev force-pushed the dk/connection-send-failure-cleanup branch from 32e8008 to 856efaa Compare May 7, 2026 06:48

dkropachev self-assigned this May 7, 2026

dkropachev requested review from Lorak-mmk and sylwiaszunejko May 7, 2026 06:56

Lorak-mmk reviewed May 7, 2026

connection: drop async keyspace send cleanup

52eed90

dkropachev marked this pull request as ready for review May 8, 2026 02:55

dkropachev requested a review from Lorak-mmk May 8, 2026 02:55

Lorak-mmk force-pushed the master branch from f2a9e87 to 763af09 Compare June 15, 2026 10:57

dkropachev wants to merge 3 commits into scylladb:master from dkropachev:dk/connection-send-failure-cleanup


2 participants
