Microsoft.Extensions.Resilience and Microsoft.Extensions.Http.Resilience provide resilience mechanisms against transient failures. These two packages are built on top of the open-source Polly resilience library.
Build a Resilience Pipeline
Given a ServiceCollection services, configure a keyed resilience pipeline as follows:
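const string key = "Retry-Timeout";

services.AddResiliencePipeline(key, static builder =>
{
    // Added first, so this is the outer strategy: retry when the timeout below rejects an attempt.
    builder.AddRetry(new RetryStrategyOptions
    {
        ShouldHandle = new PredicateBuilder().Handle<TimeoutRejectedException>()
    });

    // Added second, so this is the inner strategy: each attempt is cancelled after 1.5 seconds.
    builder.AddTimeout(TimeSpan.FromSeconds(1.5));
});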
Other Add* extension methods include AddCircuitBreaker, AddRateLimiter, AddConcurrencyLimiter, AddFallback, and AddHedging.
Using AddResiliencePipeline separates the pipeline’s definition from the usage points where it’s injected. This allows for convenient unit testing, e.g., supplying ResiliencePipeline<T>.Empty for faster and less complicated tests.
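As a rough illustration of the testing point, the sketch below runs a callback through ResiliencePipeline<int>.Empty, the no-op pipeline that ships with Polly. It assumes the Polly.Core package is referenced; the callback and the calls counter are made up for the example.

```csharp
using System.Threading;
using System.Threading.Tasks;
using Polly;

// The empty pipeline contains no strategies, so the callback runs once, untouched.
ResiliencePipeline<int> pipeline = ResiliencePipeline<int>.Empty;

int calls = 0; // hypothetical counter standing in for a mocked dependency

int result = await pipeline.ExecuteAsync(
    token =>
    {
        calls++;
        return ValueTask.FromResult(42);
    },
    CancellationToken.None);
```

In a unit test, a class that receives a ResiliencePipeline<T> through its constructor could be handed this instance instead of the registered one, so no retries or delays slow the test down.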
When adding resilience, you should add only one resilience handler rather than stacking several. Within a single pipeline, strategies are stacked atop one another: the first one added is the outermost, so the order in which you add them is significant.
Consider this ResiliencePipeline with an outer Timeout and an inner Retry:
ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddTimeout(TimeSpan.FromSeconds(10)) // outer
    .AddRetry(new()) // inner
    .Build();
Suppose the first and second attempts fail. The third attempt is not awaited because the overarching timeout elapses first.
[Figure: sequence diagram of the outer timeout cancelling the third retry attempt]
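The same ordering can be reproduced in a small console program. This is a sketch under assumptions, not the article's own code: the 500 ms timeout, the 300 ms constant retry delay, and the always-failing failingCall delegate are invented so that the outer timeout should elapse while the retry strategy is still waiting to start a third attempt.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;
using Polly.Retry;
using Polly.Timeout;

ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddTimeout(TimeSpan.FromMilliseconds(500)) // outer: one budget shared by all attempts
    .AddRetry(new RetryStrategyOptions          // inner: handles any exception by default
    {
        MaxRetryAttempts = 5,
        Delay = TimeSpan.FromMilliseconds(300),
        BackoffType = DelayBackoffType.Constant,
    })
    .Build();

int attempts = 0;
bool timedOut = false;

// Hypothetical dependency call that fails immediately on every attempt.
Func<CancellationToken, ValueTask> failingCall = _ =>
{
    attempts++;
    return ValueTask.FromException(new InvalidOperationException("simulated failure"));
};

try
{
    await pipeline.ExecuteAsync(failingCall, CancellationToken.None);
}
catch (TimeoutRejectedException)
{
    // The outer timeout fired before the retry strategy could start another attempt.
    timedOut = true;
}

Console.WriteLine($"attempts={attempts}, timedOut={timedOut}");
```

With these timings the attempts start at roughly 0 ms and 300 ms, and the 500 ms budget runs out during the second retry delay, so a typical run should report two attempts and a timeout. Swapping the two Add* calls would instead give every attempt its own timeout.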
Build an HTTP Resilience Pipeline
Given an IHttpClientBuilder httpClientBuilder returned from any of the AddHttpClient methods, one can call httpClientBuilder.AddStandardResilienceHandler(), which comes with reasonable defaults, or manually do something like:
services.ConfigureHttpClientDefaults(
    builder => builder.AddStandardResilienceHandler());

services.AddHttpClient("hedgingOnly")
    .RemoveAllResilienceHandlers()
    .AddStandardHedgingHandler();

services.AddHttpClient("custom")
    .AddResilienceHandler("CustomPipeline", static builder =>
    {
        builder.AddRetry(new HttpRetryStrategyOptions
        {
            BackoffType = DelayBackoffType.Linear,
        });

        builder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
        {
            ShouldHandle = static args =>
            {
                return ValueTask.FromResult(args is
                {
                    Outcome.Result.StatusCode: HttpStatusCode.RequestTimeout
                });
            }
        });

        builder.AddTimeout(TimeSpan.FromSeconds(5));
    });

services.AddHttpClient("custom-with-reloads")
    .AddResilienceHandler(
        "AdvancedPipeline",
        static (ResiliencePipelineBuilder<HttpResponseMessage> builder, ResilienceHandlerContext context) =>
        {
            // Enable reloads from the "RetryOptions" section in an appsettings.json.
            context.EnableReloads<HttpRetryStrategyOptions>("RetryOptions");
            builder.AddRetry(context.GetOptions<HttpRetryStrategyOptions>("RetryOptions"));
        });
Configuring HTTP Resilience Strategies
The rate limiter pipeline limits the maximum number of concurrent requests being sent to the dependency. Defaults to Queue: 0 and Permit: 1,000.
The total request timeout pipeline ensures that the request, including retry (or hedging) attempts, doesn’t exceed the configured limit. Defaults to TotalTimeout: 30s.
The retry pipeline retries the request in case the dependency is slow or returns an error, e.g., 5XX, 408 (request timeout), 429 (too many requests), HttpRequestException, TimeoutRejectedException. Defaults to MaxRetries: 3, Backoff: Exponential, UseJitter: true, and Delay: 2s. By default, it retries on all HTTP methods, but this can be customized, e.g., Retry.DisableFor(HttpMethod.Post) or Retry.DisableForUnsafeHttpMethods(), which covers POST, PATCH, PUT, DELETE, and CONNECT.
The hedging strategy executes the requests against multiple endpoints in case the dependency is slow or returns a transient error. By default, it hedges the URL provided by the original HttpRequestMessage. Defaults to MinAttempts: 1, MaxAttempts: 10, and Delay: 2s.
The circuit breaker pipeline blocks execution if too many direct failures or timeouts are detected, e.g., 5XX, 408 (request timeout), 429 (too many requests), HttpRequestException, TimeoutRejectedException. Defaults to FailureRatio: 10%, MinThroughput: 100, SamplingDuration: 30s, and BreakDuration: 5s.
The attempt timeout pipeline limits each request attempt duration and throws if it’s exceeded. Defaults to AttemptTimeout: 10s.
AddStandardResilienceHandler chains 5 resilience strategies in the following order: rate limiter, total request timeout, retry, circuit breaker, and attempt timeout.
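The defaults listed above can be overridden per client through the options callback of AddStandardResilienceHandler. The following is a minimal sketch, assuming the Microsoft.Extensions.Http.Resilience package is referenced; the client name "tuned" and the chosen values are illustrative only, picked so that the attempt timeout stays well below the total request timeout.

```csharp
using System;
using System.Net.Http;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;

var services = new ServiceCollection();

services.AddHttpClient("tuned") // hypothetical client name
    .AddStandardResilienceHandler(options =>
    {
        options.Retry.MaxRetryAttempts = 2;                             // default: 3
        options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(5);       // default: 10s
        options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(60); // default: 30s
    });

using ServiceProvider provider = services.BuildServiceProvider();

// Requests sent through this client go through the five chained strategies.
HttpClient client = provider.GetRequiredService<IHttpClientFactory>().CreateClient("tuned");
```

The options object also exposes RateLimiter and CircuitBreaker, so each of the five strategies can be adjusted in the same callback.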
AddStandardHedgingHandler chains 5 resilience strategies in the following order: total request timeout, hedging, rate limiter (per endpoint), circuit breaker (per endpoint), and attempt timeout (per endpoint). Route selection is customizable, e.g.,
httpClientBuilder.AddStandardHedgingHandler(static (IRoutingStrategyBuilder builder) =>
{
    builder.ConfigureOrderedGroups(static options =>
    {
        options.Groups.Add(new UriEndpointGroup()
        {
            Endpoints =
            {
                new() { Uri = new("https://example.net/api/v1"), Weight = 97 },
                new() { Uri = new("https://example.net/api/v2"), Weight = 3 }
            }
        });
    });
});
… with other configs like ConfigureWeightedGroups also available.
Using the Resilience Pipeline
Given a ServiceProvider provider and a string key:
ResiliencePipelineProvider<string> pipelineProvider =
    provider.GetRequiredService<ResiliencePipelineProvider<string>>();
ResiliencePipeline pipeline = pipelineProvider.GetPipeline(key);

await pipeline.ExecuteAsync(
    async token => await httpClient.GetAsync(endpoint, token),
    cancellationToken);

// Can also execute an arbitrary callback without allocating a lambda...
await pipeline.ExecuteAsync(
    static async (state, token) => await state.httpClient.GetAsync(state.endpoint, token),
    (httpClient, endpoint), // State provided here
    cancellationToken);
ResiliencePipeline.ExecuteOutcomeAsync never throws exceptions; instead, it stores either the result or the exception in an Outcome<T> struct. This is useful in high-performance scenarios where you wish to avoid re-throwing exceptions.
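A minimal sketch of the Outcome-based call is shown below, assuming Polly.Core is referenced. ResiliencePipeline.Empty stands in for a real pipeline, and the division with a zero divisor is a made-up way to force the failure path; the callback itself wraps its exception in an Outcome<int> rather than letting it escape.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;

ResiliencePipeline pipeline = ResiliencePipeline.Empty; // stand-in for a real pipeline

// Contexts are rented from a pool and should be returned after use.
ResilienceContext context = ResilienceContextPool.Shared.Get(CancellationToken.None);

Outcome<int> outcome = await pipeline.ExecuteOutcomeAsync(
    static (ctx, divisor) =>
    {
        // Report failures as an Outcome instead of throwing.
        try
        {
            return ValueTask.FromResult(Outcome.FromResult(100 / divisor));
        }
        catch (DivideByZeroException e)
        {
            return ValueTask.FromResult(Outcome.FromException<int>(e));
        }
    },
    context,
    0); // state: a hypothetical divisor that triggers the failure path

ResilienceContextPool.Shared.Return(context);

bool failed = outcome.Exception is DivideByZeroException;
Console.WriteLine(failed
    ? "failed without an exception reaching the caller"
    : $"result: {outcome.Result}");
```

The caller inspects outcome.Exception or outcome.Result directly, so no exception has to be thrown and caught again on the calling side.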
Metrics Enrichment
Enrichment adds the cluster name, process name, region, tenant ID, and more to the log as it’s being sent to the telemetry backend; the app code is not involved in this. Metrics enrichment ensures that metric records contain the information needed to pinpoint failures (e.g., a problematic data center) in distributed systems.
The AddResilienceEnricher extension method on IServiceCollection adds metric dimensions based on IExceptionSummarizer and RequestMetadata.
Metric tags typically support a limited number of distinct values, compared to the highly variable output of Exception.ToString().
IExceptionSummarizer.Summarize outputs an ExceptionSummary containing 3 strings:
ExceptionType: Not guaranteed to be a type name. For inner exceptions, also contains the type name of the outer exception.
Description: A low-cardinality string suitable for use as a metric dimension.
AdditionalDetails: A high-cardinality string intended for low-level diagnostics.
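To see the three strings for a concrete exception, one could resolve the summarizer from a container and call Summarize directly. The sketch below assumes the Microsoft.Extensions.Resilience and Microsoft.Extensions.Diagnostics.ExceptionSummarization packages are referenced and registers the built-in HTTP provider; the SocketException is only a sample input, and the exact strings produced depend on the registered providers.

```csharp
using System;
using System.Net.Sockets;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.ExceptionSummarization;

var services = new ServiceCollection();

// Adds resilience metric dimensions based on IExceptionSummarizer and RequestMetadata.
services.AddResilienceEnricher();

// Registers IExceptionSummarizer together with the HTTP summary provider.
services.AddExceptionSummarizer(static builder => builder.AddHttpProvider());

using ServiceProvider provider = services.BuildServiceProvider();
IExceptionSummarizer summarizer = provider.GetRequiredService<IExceptionSummarizer>();

// Sample input only; any exception can be passed in.
ExceptionSummary summary = summarizer.Summarize(
    new SocketException((int)SocketError.ConnectionRefused));

Console.WriteLine($"ExceptionType:     {summary.ExceptionType}");
Console.WriteLine($"Description:       {summary.Description}");
Console.WriteLine($"AdditionalDetails: {summary.AdditionalDetails}");
```

Printing the summary for a few representative exceptions is a quick way to check which values would end up as metric dimensions.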
RequestMetadata contains 4 string properties:
DependencyName: The dependency to which the outgoing request is being made.
MethodType: The HTTP method, e.g., GET, POST, PUT, PATCH, DELETE, etc.
RequestName: Display name for the activity. Defaults to RequestRoute.
RequestRoute: Supports redaction, e.g., /v1/users/{userId}/chats/{chatId}/messages, where userId and chatId are redacted for being sensitive data.
Why should a 5XX server error be considered transient? I assumed that in general, server errors are deterministic, e.g., input X triggers a code path that tries to dereference null; retrying such a case won’t help.