Microsoft.Extensions.Resilience and Microsoft.Extensions.Http.Resilience
provide resilience mechanisms against transient failures. These two packages are
built on top of the open-source Polly resilience library.
Build a Resilience Pipeline
Given a ServiceCollection services, configure a keyed resilience pipeline as
follows:
    const string key = "Retry-Timeout";

    services.AddResiliencePipeline(key, static builder =>
    {
        builder.AddRetry(new RetryStrategyOptions
        {
            ShouldHandle = new PredicateBuilder().Handle<TimeoutRejectedException>()
        });

        builder.AddTimeout(TimeSpan.FromSeconds(1.5));
    });
Other Add* extension methods include AddCircuitBreaker, AddRateLimiter,
AddConcurrencyLimiter, AddFallback, and AddHedging.
Using AddResiliencePipeline separates the pipeline’s definition from the places
where it’s injected and used. This makes unit testing convenient: a test can
supply ResiliencePipeline.Empty or ResiliencePipeline<T>.Empty, which makes
tests faster and less complicated.
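As a rough sketch of that testing pattern, the hypothetical WeatherClient below takes a ResiliencePipeline through its constructor, so a test can pass ResiliencePipeline.Empty and skip retries and timeouts entirely. The class, the StubHandler, and the URL are invented for illustration; only ResiliencePipeline.Empty and ExecuteAsync come from Polly.

```csharp
using System;
using System.Net;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Polly;

// In a test, the no-op pipeline stands in for the real retry/timeout pipeline.
var client = new WeatherClient(new HttpClient(new StubHandler()), ResiliencePipeline.Empty);
Console.WriteLine(await client.GetForecastAsync("https://example.test/forecast"));

// Hypothetical service: the pipeline is injected rather than built inline.
public sealed class WeatherClient
{
    private readonly HttpClient _httpClient;
    private readonly ResiliencePipeline _pipeline;

    public WeatherClient(HttpClient httpClient, ResiliencePipeline pipeline)
    {
        _httpClient = httpClient;
        _pipeline = pipeline;
    }

    public async Task<string> GetForecastAsync(string url, CancellationToken ct = default) =>
        await _pipeline.ExecuteAsync(
            async token => await _httpClient.GetStringAsync(url, token),
            ct);
}

// Hypothetical stub that answers every request with a canned body.
public sealed class StubHandler : HttpMessageHandler
{
    protected override Task<HttpResponseMessage> SendAsync(
        HttpRequestMessage request, CancellationToken cancellationToken) =>
        Task.FromResult(new HttpResponseMessage(HttpStatusCode.OK)
        {
            Content = new StringContent("sunny")
        });
}
```

In production code, the same constructor would receive the pipeline resolved from the DI container instead of the empty one.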
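When adding resilience, add only one resilience handler and avoid stacking handlers; if several strategies are needed, combine them inside a single pipeline. Within a pipeline, strategies are stacked atop one another, and the order in which you add them is significant: the first strategy added is the outermost.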
Consider this ResiliencePipeline with an outer Timeout and an inner Retry:
    ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
        .AddTimeout(TimeSpan.FromSeconds(10)) // outer
        .AddRetry(new()) // inner
        .Build();
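Suppose the first and second attempts fail. If the overarching 10-second timeout elapses while the third attempt is in flight, that attempt is not awaited: the outer Timeout cancels it and the pipeline fails with a TimeoutRejectedException.
[Sequence diagram: outer Timeout wrapping inner Retry]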
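For contrast, the order can be reversed so that Retry is the outer strategy and Timeout the inner one. In this sketch, which follows the pattern used in the Polly documentation (the 1-second value is arbitrary), the timeout applies to each attempt individually and the retry strategy re-runs attempts that time out:

```csharp
using System;
using Polly;
using Polly.Retry;
using Polly.Timeout;

ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions
    {
        // Retry only when the inner timeout rejects an attempt.
        ShouldHandle = new PredicateBuilder().Handle<TimeoutRejectedException>()
    }) // outer
    .AddTimeout(TimeSpan.FromSeconds(1)) // inner: applies to each attempt
    .Build();
```

With the default of three retries, an operation that always times out is attempted four times in total before the TimeoutRejectedException surfaces to the caller.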
Build an HTTP Resilience Pipeline
Given an IHttpClientBuilder httpClientBuilder returned from any of the
AddHttpClient methods, call httpClientBuilder.AddStandardResilienceHandler(),
which comes with reasonable defaults, or configure handlers manually, for
example:
    // Apply the standard handler to every HttpClient by default.
    services.ConfigureHttpClientDefaults(
        builder => builder.AddStandardResilienceHandler());

    // Replace the default handler with the standard hedging handler.
    services.AddHttpClient("hedgingOnly")
        .RemoveAllResilienceHandlers()
        .AddStandardHedgingHandler();

    // Compose a custom pipeline from individual strategies.
    services.AddHttpClient("custom")
        .AddResilienceHandler("CustomPipeline", static builder =>
        {
            builder.AddRetry(new HttpRetryStrategyOptions
            {
                BackoffType = DelayBackoffType.Linear,
            });

            builder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
            {
                ShouldHandle = static args =>
                {
                    return ValueTask.FromResult(args is
                    {
                        Outcome.Result.StatusCode: HttpStatusCode.RequestTimeout
                    });
                }
            });

            builder.AddTimeout(TimeSpan.FromSeconds(5));
        });

    services.AddHttpClient("custom-with-reloads")
        .AddResilienceHandler(
            "AdvancedPipeline",
            static (ResiliencePipelineBuilder<HttpResponseMessage> builder,
                ResilienceHandlerContext context) =>
            {
                // Reload the pipeline whenever the named "RetryOptions" options
                // change, e.g., when they are bound to a section in appsettings.json.
                context.EnableReloads<HttpRetryStrategyOptions>("RetryOptions");

                builder.AddRetry(context.GetOptions<HttpRetryStrategyOptions>("RetryOptions"));
            });
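The named options that EnableReloads and GetOptions refer to have to be registered somewhere. One plausible way, sketched here under the assumption that the values live in a configuration section also called "RetryOptions", is to bind that section to the named options. The in-memory provider and the values below are placeholders; for reloads to actually take effect, a reloadable provider such as a JSON file with reloadOnChange would be needed.

```csharp
using System.Collections.Generic;
using Microsoft.Extensions.Configuration;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;

// Placeholder configuration; stands in for a section in appsettings.json.
IConfiguration configuration = new ConfigurationBuilder()
    .AddInMemoryCollection(new Dictionary<string, string?>
    {
        ["RetryOptions:MaxRetryAttempts"] = "5",
        ["RetryOptions:Delay"] = "00:00:01",
        ["RetryOptions:BackoffType"] = "Linear"
    })
    .Build();

var services = new ServiceCollection();

// Bind the section to the named options used by the resilience handler.
services.Configure<HttpRetryStrategyOptions>(
    "RetryOptions",
    configuration.GetSection("RetryOptions"));
```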
Configuring HTTP Resilience Strategies
The rate limiter pipeline limits the maximum number of concurrent requests
being sent to the dependency. Defaults to Queue: 0 and Permit: 1,000.
The total request timeout pipeline ensures that the request, including
retry (or hedging) attempts, doesn’t exceed the configured limit. Defaults to
TotalTimeout: 30s.
The retry pipeline retries the request in case the dependency is slow or
returns an error, e.g., 5XX, 408 (request timeout), 429 (too many requests),
HttpRequestException, TimeoutRejectedException. Defaults to MaxRetries: 3,
Backoff: Exponential, UseJitter: true, and Delay: 2s. By default, retries apply
to all HTTP methods, but this can be customized, e.g.,
Retry.DisableFor(HttpMethod.Post) or Retry.DisableForUnsafeHttpMethods(),
which covers POST, PATCH, PUT, DELETE, and CONNECT.
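A minimal sketch of that customization, following the pattern shown in the Microsoft documentation. The client name is hypothetical, and the availability of these methods depends on the Microsoft.Extensions.Http.Resilience version in use.

```csharp
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;

var services = new ServiceCollection();

services.AddHttpClient("orders") // hypothetical client name
    .AddStandardResilienceHandler(options =>
    {
        // Skip retries for POST, PATCH, PUT, DELETE, and CONNECT.
        options.Retry.DisableForUnsafeHttpMethods();

        // Alternatively, name the methods explicitly:
        // options.Retry.DisableFor(HttpMethod.Post, HttpMethod.Delete);
    });
```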
The hedging strategy executes the requests against multiple endpoints in
case the dependency is slow or returns a transient error. By default, it hedges
the URL provided by the original HttpRequestMessage. Defaults to MinAttempts: 1, MaxAttempts: 10, and Delay: 2s.
The circuit breaker pipeline blocks execution if too many direct failures or
timeouts are detected, e.g., 5XX, 408 (request timeout), 429 (too many
requests), HttpRequestException, TimeoutRejectedException. Defaults to
FailureRatio: 10%, MinThroughput: 100, SamplingDuration: 30s, and
BreakDuration: 5s.
The attempt timeout pipeline limits the duration of each request attempt and
throws if it’s exceeded. Defaults to AttemptTimeout: 10s.
AddStandardResilienceHandler chains 5 resilience strategies in the following
order: rate limiter, total request timeout, retry, circuit breaker, and attempt
timeout.
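To make the chain more concrete, the sketch below restates the defaults listed above through the options object passed to AddStandardResilienceHandler. The client name is hypothetical, and the property names reflect my reading of HttpStandardResilienceOptions; the values simply repeat the defaults and are meant as a starting point for adjustment. The rate limiter is left at its default.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;
using Polly;

var services = new ServiceCollection();

services.AddHttpClient("catalog") // hypothetical client name
    .AddStandardResilienceHandler(options =>
    {
        // Total request timeout: covers all attempts, including retries.
        options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(30);

        // Retry: exponential backoff with jitter.
        options.Retry.MaxRetryAttempts = 3;
        options.Retry.BackoffType = DelayBackoffType.Exponential;
        options.Retry.UseJitter = true;
        options.Retry.Delay = TimeSpan.FromSeconds(2);

        // Circuit breaker: opens when 10% of at least 100 calls fail within 30s.
        options.CircuitBreaker.FailureRatio = 0.1;
        options.CircuitBreaker.MinimumThroughput = 100;
        options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
        options.CircuitBreaker.BreakDuration = TimeSpan.FromSeconds(5);

        // Attempt timeout: applies to each individual attempt.
        options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(10);
    });
```

When changing these values, keep the timeouts consistent with each other; for example, a total request timeout shorter than the attempt timeout would make little sense.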
AddStandardHedgingHandler chains 5 resilience strategies in the following
order: total request timeout, hedging, rate limiter (per endpoint), circuit
breaker (per endpoint), and attempt timeout (per endpoint). Route selection is
customizable, e.g.,
    httpClientBuilder.AddStandardHedgingHandler(static (IRoutingStrategyBuilder builder) =>
    {
        builder.ConfigureOrderedGroups(static options =>
        {
            options.Groups.Add(new UriEndpointGroup()
            {
                Endpoints =
                {
                    new() { Uri = new("https://example.net/api/v1"), Weight = 97 },
                    new() { Uri = new("https://example.net/api/v2"), Weight = 3 }
                }
            });
        });
    });
… with other configs like ConfigureWeightedGroups also available.
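For comparison, a weighted-groups configuration might look like the following sketch, adapted from the Microsoft documentation. The client name, URLs, and weights are placeholders.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;

var services = new ServiceCollection();
IHttpClientBuilder httpClientBuilder = services.AddHttpClient("ab-test"); // hypothetical name

httpClientBuilder.AddStandardHedgingHandler(static (IRoutingStrategyBuilder builder) =>
{
    builder.ConfigureWeightedGroups(static options =>
    {
        // Select a group on every attempt rather than only on the first one.
        options.SelectionMode = WeightedGroupSelectionMode.EveryAttempt;

        options.Groups.Add(new WeightedUriEndpointGroup
        {
            Endpoints =
            {
                new() { Uri = new("https://example.net/api/a"), Weight = 33 },
                new() { Uri = new("https://example.net/api/b"), Weight = 33 },
                new() { Uri = new("https://example.net/api/c"), Weight = 33 }
            }
        });
    });
});
```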
Using the Resilience Pipeline
Given a ServiceProvider provider and a string key:
    ResiliencePipelineProvider<string> pipelineProvider =
        provider.GetRequiredService<ResiliencePipelineProvider<string>>();
    ResiliencePipeline pipeline = pipelineProvider.GetPipeline(key);

    await pipeline.ExecuteAsync(
        async token => await httpClient.GetAsync(endpoint, token),
        cancellationToken);

    // A static lambda with explicit state avoids allocating a closure.
    await pipeline.ExecuteAsync(
        static async (state, token) =>
            await state.httpClient.GetAsync(state.endpoint, token),
        (httpClient, endpoint), // State provided here
        cancellationToken);
ResiliencePipeline.ExecuteOutcomeAsync never throws exceptions; instead, it
stores either the result or the exception in an Outcome<T> struct. This is
useful in high-performance scenarios where you wish to avoid re-throwing
exceptions.
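A sketch of how ExecuteOutcomeAsync might be used, modeled on the Polly documentation. The OutcomeExample class and TryGetAsync method are hypothetical. Note that the callback is expected to capture its own failures and return them as an Outcome rather than throw.

```csharp
using System;
using System.Net.Http;
using System.Threading;
using System.Threading.Tasks;
using Polly;

public static class OutcomeExample
{
    public static async Task<bool> TryGetAsync(
        ResiliencePipeline pipeline,
        HttpClient httpClient,
        string endpoint,
        CancellationToken cancellationToken)
    {
        // ExecuteOutcomeAsync needs an explicit context; rent one from the pool.
        ResilienceContext context = ResilienceContextPool.Shared.Get(cancellationToken);
        try
        {
            Outcome<HttpResponseMessage> outcome = await pipeline.ExecuteOutcomeAsync(
                static async (ctx, state) =>
                {
                    // The callback must not throw: wrap failures in an Outcome.
                    try
                    {
                        HttpResponseMessage response = await state.httpClient.GetAsync(
                            state.endpoint, ctx.CancellationToken);
                        return Outcome.FromResult(response);
                    }
                    catch (Exception e)
                    {
                        return Outcome.FromException<HttpResponseMessage>(e);
                    }
                },
                context,
                (httpClient, endpoint));

            // Inspect the outcome instead of catching an exception.
            return outcome.Exception is null && outcome.Result!.IsSuccessStatusCode;
        }
        finally
        {
            // Always hand the context back to the pool.
            ResilienceContextPool.Shared.Return(context);
        }
    }
}
```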
Metrics Enrichment
Enrichment adds Cluster name, Process name, Region, Tenant ID, and more to a log record as it’s sent to the telemetry backend; the app code is not involved in this. Metrics enrichment ensures that metric records contain the information needed to pinpoint failures (e.g., a problematic data center) in distributed systems.
The AddResilienceEnricher extension method on IServiceCollection adds metric
dimensions based on IExceptionSummarizer and RequestMetadata.
Metric tags typically support a limited number of distinct values, compared to
the highly variable output of Exception.ToString().
IExceptionSummarizer.Summarize outputs an ExceptionSummary containing 3
strings:
- ExceptionType: Not guaranteed to be a type name. For inner exceptions, also contains the type name of the outer exception.
- Description: A low-cardinality string suitable for use as a metric dimension.
- AdditionalDetails: A high-cardinality string intended for low-level diagnostics.
RequestMetadata contains 4 string properties:
- DependencyName: The dependency to which the outgoing request is being made.
- MethodType: The HTTP method, e.g., GET, POST, PUT, PATCH, DELETE, etc.
- RequestName: Display name for the activity. Defaults to RequestRoute.
- RequestRoute: Supports redaction, e.g., /v1/users/{userId}/chats/{chatId}/messages, where userId and chatId are redacted for being sensitive data.
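To see what a summary looks like in practice, the sketch below registers the exception summarizer with its built-in HTTP provider alongside AddResilienceEnricher, then summarizes a sample exception by hand. The exception message is a placeholder, and the exact strings produced depend on the registered providers.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.ExceptionSummarization;

var services = new ServiceCollection();

// Register the summarizer with the built-in HTTP provider,
// then let resilience telemetry use it for metric dimensions.
services.AddExceptionSummarizer(static builder => builder.AddHttpProvider());
services.AddResilienceEnricher();

using ServiceProvider provider = services.BuildServiceProvider();
IExceptionSummarizer summarizer = provider.GetRequiredService<IExceptionSummarizer>();

// Summarize a sample exception and print the three strings.
ExceptionSummary summary = summarizer.Summarize(
    new HttpRequestException("Connection refused"));

Console.WriteLine(
    $"{summary.ExceptionType} | {summary.Description} | {summary.AdditionalDetails}");
```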
References
- Introduction to resilient app development - .NET | Microsoft Learn. learn.microsoft.com. Accessed Jul 19, 2025.
- Meet Polly: The .NET resilience library | Polly. www.pollydocs.org. Accessed Jul 19, 2025.
- Build resilient HTTP apps: Key development patterns - .NET | Microsoft Learn. learn.microsoft.com. Accessed Jul 19, 2025.
- Exception summarization in C# - .NET | Microsoft Learn. learn.microsoft.com. Accessed Jul 19, 2025.
- ExceptionSummary Struct (Microsoft.Extensions.Diagnostics.ExceptionSummarization) | Microsoft Learn. learn.microsoft.com. Accessed Jul 19, 2025.
- RequestMetadata Class (Microsoft.Extensions.Http.Diagnostics) | Microsoft Learn. learn.microsoft.com. Accessed Jul 19, 2025.
- Resilience pipelines | Polly. www.pollydocs.org. Accessed Jul 19, 2025.
Why should a 5XX server error be considered transient? I assumed that in general, server errors are deterministic, e.g., input X triggers a code path that tries to dereference null; retrying such a case won’t help.