
Brev 9032/Use Nebius Capacity Advisor to determine GPU instance availability #123

Open
kirtiip20 wants to merge 2 commits into main from BREV-9032/Add-nebius-live-capacity

kirtiip20 commented on Jun 16, 2026

Problem

Nebius GPU instance types showed as available in the Brev UI based on tenant quota alone, even when Nebius had no on-demand capacity in that region. Users could select such a type (e.g. 8× H200, L40s) and the launch would then fail at provisioning time.

Root cause
Availability was computed only from tenant quota allowances, with no check against the provider's actual capacity. A tenant can hold quota in a region where Nebius currently has no capacity, so the type was still marked available and then failed on launch.

Fix
Integrated the Nebius Capacity Advisor (ResourceAdvice) API so that availability reflects both real-time on-demand capacity and tenant quota:

  1. Fetch Capacity Advisor data during each instance-type synchronization and build a region:platform:preset availability map (getResourceAdviceMap, buildResourceAdviceMapFromItems).
  2. Update GPU availability resolution (resolvePresetAvailability) to require both available capacity from Capacity Advisor and remaining tenant quota. DATA_STATE_UNKNOWN and AVAILABILITY_LEVEL_LIMIT_REACHED are treated as unavailable capacity.
  3. If the Capacity Advisor API is fully unavailable, degrade gracefully to quota-only availability (logged as a warning) so the catalog doesn't go blank.
  4. Upgrade github.com/nebius/gosdk to v0.2.22, which includes support for the Capacity Advisor API.
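
The resolution logic in the steps above can be sketched roughly as follows. This is an illustrative sketch only, not the code in this PR: the `advice` struct, the string constants, and the helper names are hypothetical stand-ins for the SDK types, and the region, platform, and preset names are made up. It also assumes that CPU-only presets skip the capacity check and that a key missing from the map counts as no capacity; neither is confirmed by the description.

```go
package main

import "fmt"

// Stand-ins for the SDK enum values named in the PR description.
// The real enum types live in the Nebius SDK and are not reproduced here.
const (
	stateUnknown      = "DATA_STATE_UNKNOWN"
	levelLimitReached = "AVAILABILITY_LEVEL_LIMIT_REACHED"
)

// advice is a hypothetical, simplified view of one Capacity Advisor item.
type advice struct {
	Region, Platform, Preset string
	DataState, Level         string
	Available                uint32
}

// capacityKey builds the region:platform:preset lookup key.
func capacityKey(region, platform, preset string) string {
	return region + ":" + platform + ":" + preset
}

// buildAdviceMap indexes items by key. Items in an unknown or
// limit-reached state are recorded with zero available capacity.
func buildAdviceMap(items []advice) map[string]uint32 {
	m := make(map[string]uint32, len(items))
	for _, it := range items {
		n := it.Available
		if it.DataState == stateUnknown || it.Level == levelLimitReached {
			n = 0
		}
		m[capacityKey(it.Region, it.Platform, it.Preset)] = n
	}
	return m
}

// resolveAvailability requires both capacity and quota for GPU presets.
// A nil map models "Capacity Advisor fully unavailable" and falls back
// to quota only. The CPU-only bypass is an assumption of this sketch.
func resolveAvailability(adviceMap map[string]uint32, isCPUOnly, hasQuota bool, region, platform, preset string) bool {
	if isCPUOnly || adviceMap == nil {
		return hasQuota
	}
	return adviceMap[capacityKey(region, platform, preset)] > 0 && hasQuota
}

func main() {
	m := buildAdviceMap([]advice{
		{Region: "region-a", Platform: "platform-h200", Preset: "preset-8gpu", Available: 3},
		{Region: "region-a", Platform: "platform-l40s", Preset: "preset-1gpu", Level: levelLimitReached, Available: 2},
	})
	fmt.Println(resolveAvailability(m, false, true, "region-a", "platform-h200", "preset-8gpu"))   // capacity and quota
	fmt.Println(resolveAvailability(m, false, true, "region-a", "platform-l40s", "preset-1gpu"))   // limit reached
	fmt.Println(resolveAvailability(nil, false, true, "region-a", "platform-l40s", "preset-1gpu")) // quota-only fallback
}
```

Using a nil map as the "advisor unavailable" signal keeps the fallback decision in one place; a real implementation might carry an explicit flag or error instead.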

kirtiip20 self-assigned this on Jun 17, 2026
kirtiip20 force-pushed the BREV-9032/Add-nebius-live-capacity branch from 769f1d0 to d2a77f4 on June 24, 2026 09:22
kirtiip20 changed the title from "Brev 9032/add nebius live capacity" to "Brev 9032/Use Nebius Capacity Advisor to determine GPU instance availability" on Jun 25, 2026
kirtiip20 marked this pull request as ready for review on June 25, 2026 14:58
kirtiip20 requested a review from a team as a code owner on June 25, 2026 14:58
return key, available, true
}

func buildResourceAdviceMapFromItems(items []*capacityv1.ResourceAdvice) map[string]uint32 {

Contributor

Can we move this to the test file as this is only used there?

return available > 0 && hasQuota
}

func resourceAdviceEntry(item *capacityv1.ResourceAdvice) (key string, available uint32, ok bool) {

Contributor

Let's err on the side of no named returns (instead of (key string, available uint32, ok bool), just use (string, uint32, bool))
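
For illustration, the suggested signature might look like the sketch below. The `adviceItem` struct is a hypothetical stand-in for `*capacityv1.ResourceAdvice`, and the validation rule (reject nil items or items with an empty key component) is an assumption, since the function body is not shown in this thread. With unnamed results, the doc comment carries the meaning of each return value.

```go
package main

import "fmt"

// adviceItem is a hypothetical stand-in for *capacityv1.ResourceAdvice.
type adviceItem struct {
	Region, Platform, Preset string
	Available                uint32
}

// resourceAdviceEntry returns the lookup key, the available count, and
// whether the item was usable. Results are unnamed, as suggested; the
// validation below is an assumption of this sketch.
func resourceAdviceEntry(item *adviceItem) (string, uint32, bool) {
	if item == nil || item.Region == "" || item.Platform == "" || item.Preset == "" {
		return "", 0, false
	}
	key := item.Region + ":" + item.Platform + ":" + item.Preset
	return key, item.Available, true
}

func main() {
	key, available, ok := resourceAdviceEntry(&adviceItem{
		Region: "region-a", Platform: "platform-h200", Preset: "preset-8gpu", Available: 4,
	})
	fmt.Println(key, available, ok)
}
```

Callers still bind the results to local names of their choice, so readability at the call site is unchanged.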

}
isAvailable := c.resolvePresetAvailability(
ctx, isCPUOnly, hasQuota,
location.Name, platform.Metadata.Name, preset.Name,

Contributor

Minor, but it might be nice to just hand the capacity lookup key here directly, rather than the individual components that are only used to build the key.
