Skip to content

add defaultTrafficPolicy to MeshConfig#3720

Open
PetrMc wants to merge 1 commit into
istio:masterfrom
PetrMc:petrmc/default-traffic-policy
Open

add defaultTrafficPolicy to MeshConfig#3720
PetrMc wants to merge 1 commit into
istio:masterfrom
PetrMc:petrmc/default-traffic-policy

Conversation

@PetrMc

@PetrMc PetrMc commented Jun 2, 2026

Copy link
Copy Markdown
Member

adds a mesh-wide baseline traffic policy that DestinationRules inherit and override per-field. today a DestinationRule traffic policy replaces the policy wholesale, so any field it leaves unset falls back to istio's hardcoded defaults (max-uint32 connection limits, no outlier detection) rather than an operator baseline. the field reuses the full networking.v1alpha3.TrafficPolicy type; the planned istio implementation will initially honor only connectionPool and outlierDetection, with the remaining sub-policies warned on and support added later without an API change. mirrors the existing defaultHttpRetryPolicy pattern.

@PetrMc PetrMc requested a review from a team as a code owner June 2, 2026 21:29
@istio-testing istio-testing added the size/S Denotes a PR that changes 10-29 lines, ignoring generated files. label Jun 2, 2026
@hzxuzhonghu

Copy link
Copy Markdown
Member

I am afraid this will break existing default traffic policy settings, previoulsy if the dr.trafficpolicy not set, we have many different ways to get some timeouts(i donot have a thorough look)

@PetrMc

PetrMc commented Jun 3, 2026

Copy link
Copy Markdown
Member Author

I am afraid this will break existing default traffic policy settings, previoulsy if the dr.trafficpolicy not set, we have many different ways to get some timeouts(i donot have a thorough look)

thanks @hzxuzhonghu, few points:

  • it's opt-in and nil by default. if defaultTrafficPolicy isn't set, nothing changes. the existing default paths (mesh connectTimeout, tcpKeepalive, the hardcoded CB thresholds) all behave exactly as today. no existing deployment is affected unless the field is explicitly sets

  • when it is set, it slots into a precedence chain/ladder that already exists for these fields. today cluster.ConnectTimeout is seeded from Mesh.ConnectTimeout and then overridden by the DR's connectionPool.tcp.connectTimeout (cluster_builder.go here + cluster_traffic_policy.go - 1 2). so there's already a mesh-baseline -> DR-override model. defaultTrafficPolicy just makes that baseline configurable for connectionPool + outlierDetection, sitting between the legacy mesh fields and any DR:

hardcoded default -> MeshConfig.connectTimeout/tcpKeepalive (legacy) -> defaultTrafficPolicy -> DR trafficPolicy -> DR portLevelSettings
  • each layer only overrides the fields it actually sets. legacy MeshConfig.connectTimeout keeps applying whenever defaultTrafficPolicy doesn't set tcp.connectTimeout, so nothing existing breaks.

  • this scope is intentionally just connectionPool + outlierDetection (planned impl warns on and ignores the rest). those are exactly the fields that already follow the mesh-baseline -> DR-override pattern, so this isn't a new merge model, just extending an existing one. same shape as defaultHttpRetryPolicy.

the actual merge, precedence, and backward-compat tests will land in the istio/istio impl PR, not here.

please let me know if it helps or if there a specific default path you're most worried about (timeouts, lb, outlier) so i can confirm it's covered?

@hzxuzhonghu

Copy link
Copy Markdown
Member

Thank for the thorough analysis.

hardcoded default -> MeshConfig.connectTimeout/tcpKeepalive (legacy) -> defaultTrafficPolicy -> DR trafficPolicy -> DR portLevelSettings

The tier is super long, it would be tricky to debug for end user.

The TrafficPolicy is such a big struct, covers a lot of attributes, some may not be supported,
some have default values, how would we handle the default values like below?
Though it provide the flexibility, i think it also brings complexity.

// Settings common to both HTTP and TCP upstream connections.
type ConnectionPoolSettings_TCPSettings struct {
	state protoimpl.MessageState `protogen:"open.v1"`
	// Maximum number of HTTP1 /TCP connections to a destination host. Default 2^32-1.
	MaxConnections int32 `protobuf:"varint,1,opt,name=max_connections,json=maxConnections,proto3" json:"max_connections,omitempty"`
	// TCP connection timeout. format:
	// 1h/1m/1s/1ms. MUST be >=1ms. Default is 10s.
	ConnectTimeout *duration.Duration `protobuf:"bytes,2,opt,name=connect_timeout,json=connectTimeout,proto3" json:"connect_timeout,omitempty"`

@PetrMc

PetrMc commented Jun 4, 2026

Copy link
Copy Markdown
Member Author

thanks @hzxuzhonghu

both come down to merge granularity, and we reuse the existing mergeTrafficPolicy (https://github.com/istio/istio/blob/e9c2753b9f66749d6c28faddb641f57f2b6592f2/pilot/pkg/networking/util/util.go#L859) instead of inventing one. it merges at the block level: connectionPool and outlierDetection taken as whole units, not field-by-field inside them. so the MaxConnections 0-vs-unset problem doesn't arise, and the documented defaults (10s, 2^32-1) are unchanged, they stay the fallback when nothing in the chain sets the block. set nothing, behaves exactly as today.

scope is just connectionPool + outlierDetection, the rest is warned on and ignored.

on the long ladder: the resolved per-cluster policy is already visible via istioctl proxy-config cluster, so debugging is unchanged. proposal to keep it simple: block-level merge only.

does that work, or were you expecting field-level merge inside the blocks?

@PetrMc

PetrMc commented Jun 5, 2026

Copy link
Copy Markdown
Member Author

@hzxuzhonghu - would you be able to have another look at it? Thank you!

@hzxuzhonghu

Copy link
Copy Markdown
Member

Instead of reusing TrafficPolicy, i would suggest design a separate struct, eliminating the confusion about whether a field make sense.

Another concern is reusing the existing merge semantic could not 100% match.

hardcoded default -> MeshConfig.connectTimeout/tcpKeepalive (legacy) -> defaultTrafficPolicy -> DR trafficPolicy -> DR portLevelSettings

I think currently we donot actually override the ConnectTimeout if only the Tcp.ConnectTimeout = nil.

@PetrMc PetrMc force-pushed the petrmc/default-traffic-policy branch from 81a985d to c5c5fe5 Compare June 8, 2026 16:51
@istio-testing istio-testing added the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 8, 2026
… policy

Signed-off-by: Petr McAllister <petr.mcallister@gmail.com>
@PetrMc PetrMc force-pushed the petrmc/default-traffic-policy branch from c5c5fe5 to b738b7e Compare June 8, 2026 18:52
@istio-testing istio-testing removed the needs-rebase Indicates a PR needs to be rebased before being merged label Jun 8, 2026
@PetrMc

PetrMc commented Jun 8, 2026

Copy link
Copy Markdown
Member Author

thanks @hzxuzhonghu - great point - switched to a dedicated MeshConfig.DefaultTrafficPolicy instead of reusing the full TrafficPolicy, so the API only exposes the two fields we actually honor, no warn-and-ignore surface.

  • both fields reuse the existing DR sub-types - merging against a DestinationRule stays well defined.
  • merge is per block: a DR that sets connectionPool overrides the whole baseline connectionPool, otherwise it inherits the baseline block (same for outlierDetection).
  • per-block keeps it deterministic and avoids the scalar presence problem you flagged on maxConnections.

on connectTimeout with per-block merge the resolved block goes through the same apply path as today, so behavior is unchanged from a DR-only connectionPool. baseline connectTimeout applies when the host has no DR connectionPool
if a DR sets connectionPool, the baseline block is replaced and connectTimeout follows its existing fallback to Mesh.ConnectTimeout. Will document that in the istio/istio impl + docs.

Please let me know if the shape looks right to you!

@PetrMc

PetrMc commented Jun 12, 2026

Copy link
Copy Markdown
Member Author

@hzxuzhonghu how do you feel about the adjusted approach here? Thank you!

@hzxuzhonghu hzxuzhonghu left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

@PetrMc PetrMc requested a review from hzxuzhonghu June 15, 2026 19:16
@PetrMc

PetrMc commented Jun 15, 2026

Copy link
Copy Markdown
Member Author

@hzxuzhonghu /lgtm doesn't seem to trigger the approval. if it's ok with you - can you, please, hit the approve button. Thank you!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size/S Denotes a PR that changes 10-29 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants