Skip to content

feat(auth): implement regional access boundary support for standalone JWT and async service accounts#17025

Open
nbayati wants to merge 24 commits into
googleapis:mainfrom
nbayati:rab-support-async-sa-jwt
Open

feat(auth): implement regional access boundary support for standalone JWT and async service accounts#17025
nbayati wants to merge 24 commits into
googleapis:mainfrom
nbayati:rab-support-async-sa-jwt

Conversation

@nbayati
Copy link
Copy Markdown
Contributor

@nbayati nbayati commented May 11, 2026

This PR implements the following changes:

  • Add RAB support to async service account credential type, by providing async manager and fetching methods.
  • Update unit tests to accept both mtls and standard lookup endpoint urls.
  • Refactor before_request to use a _after_refresh hook so we don't have to override the method.
  • Add RAb support for self signed jwt flow through jwt.py
  • some small enhancements for test coverage and backward compatibility

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request implements asynchronous support for Regional Access Boundary (RAB) management, including background refresh tasks and mTLS endpoint support. Key changes include the addition of _AsyncRegionalAccessBoundaryRefreshManager, updates to JWT and service account credentials to handle RAB state during cloning and serialization, and comprehensive test coverage for these new flows. However, a critical issue was identified where the refresh method in google/auth/jwt.py was renamed to _perform_refresh_token, which will break token updates as the base class expects a refresh method. Additionally, a typo was found in a test assertion URL.

Comment thread packages/google-auth/google/auth/jwt.py
Comment thread packages/google-auth/tests/test_external_account_authorized_user.py Outdated
@nbayati nbayati changed the title Add RAB support for async SA and jwt.py feat(auth): implement regional access boundary support for standalone JWT and async service accounts May 11, 2026
@nbayati nbayati marked this pull request as ready for review May 11, 2026 20:40
@nbayati nbayati requested review from a team as code owners May 11, 2026 20:40
@macastelaz macastelaz self-assigned this May 12, 2026
Comment thread packages/google-auth/google/auth/_regional_access_boundary_utils.py Outdated
Comment thread packages/google-auth/google/auth/_regional_access_boundary_utils.py Outdated
Comment thread packages/google-auth/google/oauth2/_client_async.py Outdated
Comment thread packages/google-auth/google/oauth2/_client_async.py Outdated
Comment thread packages/google-auth/google/oauth2/_client_async.py Outdated
Comment thread packages/google-auth/google/auth/credentials.py Outdated
Comment thread packages/google-auth/google/oauth2/_client_async.py Outdated
except ValueError:
response_data = response_body

if response.status == http_client.OK:
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In google.auth.aio.transport.Response, the HTTP status code is exposed via the status_code property, not status. Passing a compliant google.auth.aio transport callable raises AttributeError: 'Response' object has no attribute 'status'. Please update the async lookup and grant methods to check .status_code.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually decided to only apply the fix to the RAB lookup, and instead open a bug to bring the grant methods up to date separately. This was can keep the blast radius of this PR smaller and limit it to only RAB changes without touching any token fetching flows.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created #17139 to track the necessary changes for the token endpoints.

Comment thread packages/google-auth/google/oauth2/_client_async.py
Comment thread packages/google-auth/google/oauth2/_service_account_async.py Outdated
# A refresh is already in progress.
return

async def _worker():
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unlike the synchronous refresh manager which safely deepcopies the transport (copied_request = copy.deepcopy(request)), the async manager passes the exact same request instance directly into the background coroutine task. Because start_refresh is invoked inside before_request, the main application coroutine immediately proceeds to make its actual service API HTTP call using the exact same request transport while the background task is concurrently using it, risking HTTP state corruption or interleaved headers.

Additionally, spawning asyncio.create_task(_worker()) without tracking cancellation hooks upon client session closure can potentially cause dangling tasks that raise RuntimeError: Session is closed when executing against closed client sessions.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both concerns are valid, but safe under the hood:

  1. Deepcopying the Transport is impossible and not needed in async
    The async transport object (e.g., aiohttp_requests.Request) contains an active aiohttp.ClientSession with open TCP sockets. Attempting to copy.deepcopy(request) will instantly raise a TypeError: cannot pickle 'ClientSession' object and crash the application at runtime. Unlike synchronous transports running on separate OS threads, async HTTP clients (like aiohttp or httpx) are natively designed to handle concurrent requests sharing the same connection session. All request-specific states (headers, payloads) are stored in localized coroutine call stack frames, preventing any HTTP state corruption or interleaving.

  2. Session closure and dangling tasks are handled safely
    The background worker is a single-shot asyncio task that executes exactly one lookup request and then immediately terminates and gets garbage-collected.
    If the user's application closes the underlying client session while a background task is still running, the resulting RuntimeError: Session is closed is safely caught by the worker's generic except Exception as e: block. It logs a warning, fails open cleanly, and does not raise an unhandled exception or crash the application.

I think no code changes are required here, as the current design is fully protected.

Comment thread packages/google-auth/google/oauth2/_service_account_async.py Outdated
Comment thread packages/google-auth/google/auth/credentials.py
Comment thread packages/google-auth/google/oauth2/_client_async.py
Comment thread packages/google-auth/google/oauth2/_client_async.py
Comment thread packages/google-auth/tests/test_jwt.py Outdated
expected_url_standard = "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/{}/allowedLocations".format(
self.SERVICE_ACCOUNT_EMAIL
)
expected_url_mtls = "https://iamcredentials.mtls.googleapis.com/v1/projects/-/serviceAccounts/{}/allowedLocations".format(
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you remind me what prompted the mTLS addition?

We should have test(s) that set that signal and makes sure the right endpoint is called on refresh etc.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realized that we only had hardcoded the non-mtls endpoint, while the other existing endpoints switch between mtls and non-mtls dynalically at runtime. I added the same treatment to the RAB URL, however, based on my recent talk with Yao, it seems like it's safe to always just send the request to the mtls one. If that ends up being the case, we can simplify the code and always call the mtls RAB endpoint.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Leaving this comment thread open, so I either come back and add more unit tests as you requested, or simplify the logic and update the existing unit tests.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After further investigation, it turned out that we couldn't just always call mtls endpoint, and the library had to switch dynamically between the two. I did a refactoring to move the RAB endpoints to the RAB file and out of iam. I'm filing an issue to add unit tests to iam, and have them dynamically switch too, because the current static decision that happens in the beginning of the run won't cover the GCE's CUJ for dynamically changing identity to MWLID. I've opened a separate issue (#17282) to migrate the rest of the endpoints.

nbayati and others added 19 commits May 19, 2026 03:10
@nbayati nbayati force-pushed the rab-support-async-sa-jwt branch from be67ab6 to 7e65ee7 Compare May 19, 2026 03:14
@parthea parthea assigned lsirac and nbayati and unassigned macastelaz and lsirac May 22, 2026
@nbayati nbayati requested review from daniel-sanche and lsirac May 27, 2026 00:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants