Resilient App Development in .NET

Dated Jul 19, 2025; last modified on Sat, 19 Jul 2025

Microsoft.Extensions.Resilience and Microsoft.Extensions.Http.Resilience provide resilience mechanisms against transient failures. These two packages are built on top of the open-source Polly resilience library.

Build a Resilience Pipeline

Given a ServiceCollection services, configure a keyed resilience pipeline as follows:

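using Microsoft.Extensions.DependencyInjection;
using Polly;
using Polly.Retry;
using Polly.Timeout;

const string key = "Retry-Timeout";

services.AddResiliencePipeline(key, static builder =>
{
  // Outer strategy: retry whenever the inner timeout rejects an attempt.
  builder.AddRetry(new RetryStrategyOptions
  {
    ShouldHandle = new PredicateBuilder().Handle<TimeoutRejectedException>()
  });
  // Inner strategy: each attempt gets at most 1.5 seconds.
  builder.AddTimeout(TimeSpan.FromSeconds(1.5));
});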

Other Add* extension methods include AddCircuitBreaker, AddRateLimiter, AddConcurrencyLimiter, AddFallback, and AddHedging.

AddResiliencePipeline separates the pipeline’s definition from the places where it is injected and used. This makes unit testing convenient: a test can supply ResiliencePipeline.Empty (or ResiliencePipeline<T>.Empty) to get faster and less complicated tests.
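As a minimal sketch of that testing benefit, assuming the Polly.Core package is referenced: GetGreetingAsync below is a hypothetical helper (not part of any library) that receives its pipeline from the caller, so a test can hand it ResiliencePipeline.Empty and skip retries and timeouts entirely.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;

// Hypothetical code under test: the pipeline is supplied by the caller
// (in production it would come from dependency injection).
static ValueTask<string> GetGreetingAsync(
    ResiliencePipeline pipeline, string name, CancellationToken ct = default) =>
    pipeline.ExecuteAsync(
        static (state, _) => ValueTask.FromResult($"Hello, {state}!"),
        name,
        ct);

// In a unit test: the empty pipeline just invokes the callback once.
string greeting = await GetGreetingAsync(ResiliencePipeline.Empty, "Ada");
Console.WriteLine(greeting);
```

The production registration stays untouched; only the test swaps in the empty pipeline.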

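When adding resilience to an HttpClient, add only one resilience handler rather than stacking several handlers. Within a pipeline, resilience strategies are stacked atop one another, and the order in which you add them is significant: the first strategy added is the outermost.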

Consider this ResiliencePipeline with an outer Timeout and an inner Retry:

ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
  .AddTimeout(TimeSpan.FromSeconds(10)) // outer
  .AddRetry(new()) // inner
  .Build();

Suppose the first and second attempts fail. The third attempt is still in flight when the overarching timeout elapses, so it is cancelled instead of being awaited to completion. The sequence is:

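Sequence diagram (participants: Caller, Pipeline, Timeout, Retry, DecoratedUserCallback):

  1. Caller calls ExecuteAsync on Pipeline.
  2. Pipeline calls ExecuteCore on Timeout. (Wait start)
  3. Timeout calls ExecuteCore on Retry.
  4. Initial attempt: Retry invokes DecoratedUserCallback.
  5. DecoratedUserCallback performs the operation.
  6. The operation fails.
  7. Retry sleeps.
  8. First retry attempt: Retry invokes DecoratedUserCallback.
  9. DecoratedUserCallback performs the operation.
  10. The operation fails.
  11. Retry sleeps.
  12. Second retry attempt: Retry invokes DecoratedUserCallback.
  13. DecoratedUserCallback performs the operation.
  14. Timeout times out.
  15. Timeout propagates cancellation to Retry.
  16. Retry propagates cancellation to DecoratedUserCallback.
  17. The callback is cancelled.
  18. Cancellation finishes. (Wait end)
  19. Timeout throws TimeoutRejectedException.
  20. Pipeline propagates the exception to Caller.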
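For contrast, here is a small sketch of the reverse order, with Retry as the outer strategy and Timeout as the inner one, so that every attempt gets its own time budget. The 200 ms timeout, the zero retry delay, and the simulated slow first call are illustrative assumptions, chosen only to keep the demo quick.

```csharp
using System;
using System.Threading.Tasks;
using Polly;
using Polly.Retry;
using Polly.Timeout;

ResiliencePipeline pipeline = new ResiliencePipelineBuilder()
    .AddRetry(new RetryStrategyOptions // outer
    {
        ShouldHandle = new PredicateBuilder().Handle<TimeoutRejectedException>(),
        MaxRetryAttempts = 2,
        Delay = TimeSpan.Zero, // no pause between attempts, to keep the demo fast
    })
    .AddTimeout(TimeSpan.FromMilliseconds(200)) // inner: applies to each attempt
    .Build();

int attempts = 0;
int result = await pipeline.ExecuteAsync(async token =>
{
    attempts++;
    if (attempts == 1)
    {
        // Simulated slow call; it honors the token, so the timeout can cancel it.
        await Task.Delay(TimeSpan.FromSeconds(5), token);
    }
    return attempts;
});

Console.WriteLine($"Succeeded on attempt {result}");
```

Here the first attempt is expected to be rejected by the inner timeout, which the outer retry handles by starting a fresh attempt. With the outer-timeout ordering shown earlier, the same slow call would consume the shared budget instead.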

Build an HTTP Resilience Pipeline

Given an IHttpClientBuilder httpClientBuilder returned from any of the AddHttpClient methods, call httpClientBuilder.AddStandardResilienceHandler() to get reasonable defaults, or configure the handlers manually, for example:

services.ConfigureHttpClientDefaults(
  builder => builder.AddStandardResilienceHandler());

services.AddHttpClient("hedgingOnly")
  .RemoveAllResilienceHandlers()
  .AddStandardHedgingHandler();

services.AddHttpClient("custom")
  .AddResilienceHandler("CustomPipeline", static builder =>
  {
    builder.AddRetry(new HttpRetryStrategyOptions
    {
      BackoffType = DelayBackoffType.Linear,
    });

    builder.AddCircuitBreaker(new HttpCircuitBreakerStrategyOptions
    {
      ShouldHandle = static args =>
      {
        return ValueTask.FromResult(args is
        {
          Outcome.Result.StatusCode: HttpStatusCode.RequestTimeout
        });
      }
    });

    builder.AddTimeout(TimeSpan.FromSeconds(5));
  });

services.AddHttpClient("custom-with-reloads")
  .AddResilienceHandler(
    "AdvancedPipeline",
    static (ResiliencePipelineBuilder<HttpResponseMessage> builder, ResilienceHandlerContext context) =>
    {
      // Enable reloads from the "RetryOptions" section in an appsettings.json.
      context.EnableReloads<HttpRetryStrategyOptions>("RetryOptions");
      builder.AddRetry(context.GetOptions<HttpRetryStrategyOptions>("RetryOptions"));
    });

Configuring HTTP Resilience Strategies

The rate limiter pipeline limits the maximum number of concurrent requests being sent to the dependency. Defaults to Queue: 0 and Permit: 1,000.

The total request timeout pipeline ensures that the request, including retry (or hedging) attempts, doesn’t exceed the configured limit. Defaults to TotalTimeout: 30s.

The retry pipeline retries the request when the dependency is slow or returns a transient error, e.g., 5XX, 408 (request timeout), 429 (too many requests), HttpRequestException, or TimeoutRejectedException. Defaults to MaxRetries: 3, Backoff: Exponential, UseJitter: true, and Delay: 2s. By default, retries apply to all HTTP methods, but this can be customized, e.g., Retry.DisableFor(HttpMethod.Post) or Retry.DisableForUnsafeHttpMethods(), which covers POST, PATCH, PUT, DELETE, and CONNECT.
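A sketch of that customization, assuming a version of Microsoft.Extensions.Http.Resilience recent enough to ship DisableForUnsafeHttpMethods; the client name "orders" and the lowered retry count are hypothetical.

```csharp
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;

var services = new ServiceCollection();

services.AddHttpClient("orders") // hypothetical client name
    .AddStandardResilienceHandler(static options =>
    {
        // Skip retries for POST, PATCH, PUT, DELETE, and CONNECT,
        // which may not be safe to repeat.
        options.Retry.DisableForUnsafeHttpMethods();

        // Illustrative value; the default is 3.
        options.Retry.MaxRetryAttempts = 2;
    });
```

The other four strategies of the standard handler keep their defaults.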

Why should a 5XX server error be considered transient? I assumed that in general, server errors are deterministic, e.g., input X triggers a code path that tries to dereference null; retrying such a case won’t help.

The hedging strategy executes the requests against multiple endpoints in case the dependency is slow or returns a transient error. By default, it hedges the URL provided by the original HttpRequestMessage. Defaults to MinAttempts: 1, MaxAttempts: 10, and Delay: 2s.

The circuit breaker pipeline blocks execution if too many direct failures or timeouts are detected, e.g., 5XX, 408 (request timeout), 429 (too many requests), HttpRequestException, or TimeoutRejectedException. Defaults to FailureRatio: 10%, MinimumThroughput: 100, SamplingDuration: 30s, and BreakDuration: 5s.

The attempt timeout pipeline limits the duration of each request attempt and throws if it’s exceeded. Defaults to AttemptTimeout: 10s.

AddStandardResilienceHandler chains 5 resilience strategies in the following order: rate limiter, total request timeout, retry, circuit breaker, and attempt timeout.
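The defaults of the individual strategies can be adjusted through the options callback. The sketch below uses a hypothetical "catalog" client and illustrative values. It also assumes, from my reading of the options validation, that the total timeout must exceed the attempt timeout and that the circuit breaker's sampling duration should be at least twice the attempt timeout; treat both as assumptions to verify.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Http.Resilience;

var services = new ServiceCollection();

services.AddHttpClient("catalog") // hypothetical client name
    .AddStandardResilienceHandler(static options =>
    {
        // Budget for the whole request, including all retries.
        options.TotalRequestTimeout.Timeout = TimeSpan.FromSeconds(15);

        // Budget for each individual attempt.
        options.AttemptTimeout.Timeout = TimeSpan.FromSeconds(3);

        // Kept comfortably above twice the attempt timeout (assumed constraint).
        options.CircuitBreaker.SamplingDuration = TimeSpan.FromSeconds(30);
    });
```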

AddStandardHedgingHandler chains 5 resilience strategies in the following order: total request timeout, hedging, rate limiter (per endpoint), circuit breaker (per endpoint), and attempt timeout (per endpoint). Route selection is customizable, e.g.,

httpClientBuilder.AddStandardHedgingHandler(static (IRoutingStrategyBuilder builder) =>
{
  builder.ConfigureOrderedGroups(static options => {
    options.Groups.Add(new UriEndpointGroup()
    {
      Endpoints =
      {
        new() { Uri = new("https://example.net/api/v1"), Weight = 97 },
        new() { Uri = new("https://example.net/api/v2"), Weight = 3 }
      }
    });
  });
});

… with other configs like ConfigureWeightedGroups also available.

The hedging example doesn’t make sense to me. Instead of putting the branch in code, isn’t it conventional to define a feature flag and then control its enablement from outside the app, e.g., from an experimentation platform?

Using the Resilience Pipeline

Given a ServiceProvider provider and a string key:

ResiliencePipelineProvider<string> pipelineProvider =
  provider.GetRequiredService<ResiliencePipelineProvider<string>>();
ResiliencePipeline pipeline = pipelineProvider.GetPipeline(key);

await pipeline.ExecuteAsync(
  async token => await httpClient.GetAsync(endpoint, token),
  cancellationToken);

// Passing the state explicitly lets the lambda be static, which avoids a closure allocation...
await pipeline.ExecuteAsync(
  static async (state, token) => await state.httpClient.GetAsync(state.endpoint, token),
  (httpClient, endpoint), // State provided here
  cancellationToken);

ResiliencePipeline.ExecuteOutcomeAsync reports failures without throwing: it returns an Outcome<T> struct that holds either the result or the exception. The callback is likewise expected to return an Outcome<T> rather than throw. This is useful in high-performance scenarios where you wish to avoid re-throwing exceptions.
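A minimal sketch of the pattern, assuming Polly.Core is referenced. The integer division is a stand-in for real work, and ResiliencePipeline.Empty is used only to keep the example self-contained; a pipeline obtained from the provider would be called the same way.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;
using Polly;

ResiliencePipeline pipeline = ResiliencePipeline.Empty;

// ExecuteOutcomeAsync works with a pooled ResilienceContext instead of a bare token.
ResilienceContext context = ResilienceContextPool.Shared.Get(CancellationToken.None);
Outcome<int> outcome;
try
{
    outcome = await pipeline.ExecuteOutcomeAsync(
        static (ctx, divisor) =>
        {
            // The callback captures its own exceptions into an Outcome.
            try
            {
                return ValueTask.FromResult(Outcome.FromResult(100 / divisor));
            }
            catch (Exception e)
            {
                return ValueTask.FromResult(Outcome.FromException<int>(e));
            }
        },
        context,
        0); // state: a divisor of zero forces the failure path
}
finally
{
    // Always hand the context back to the pool.
    ResilienceContextPool.Shared.Return(context);
}

Console.WriteLine(outcome.Exception is null
    ? $"Result: {outcome.Result}"
    : $"Failed: {outcome.Exception.GetType().Name}");
```

The caller inspects outcome.Exception instead of wrapping the call in a try/catch.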

Metrics Enrichment

Enrichment adds Cluster name, Process name, Region, Tenant ID, and more to the log as it’s being sent to the telemetry backend, without involving the app code. Metrics enrichment ensures that metric records contain the information needed to pinpoint failures (e.g., a problematic data center) in distributed systems.

The AddResilienceEnricher extension method on IServiceCollection adds metric dimensions based on IExceptionSummarizer and RequestMetadata.

Metric tags typically support a limited number of distinct values, compared to the highly variable output of Exception.ToString(). IExceptionSummarizer.Summarize outputs an ExceptionSummary containing 3 strings:

  • ExceptionType: Not guaranteed to be a type name. For inner exceptions, also contains the type name of the outer exception.
  • Description: A low-cardinality string suitable for use as a metric dimension.
  • AdditionalDetails: A high-cardinality string intended for low-level diagnostics.
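To see the three strings for a concrete exception, the summarizer can be resolved directly from a container. This sketch assumes the Microsoft.Extensions.Diagnostics.ExceptionSummarization package and its HTTP provider; the SocketException is just a sample input, and the exact strings produced depend on the registered providers.

```csharp
using System;
using System.Net.Sockets;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Diagnostics.ExceptionSummarization;

var services = new ServiceCollection();

// Register the summarizer together with the provider for HTTP-related exceptions.
services.AddExceptionSummarizer(static builder => builder.AddHttpProvider());

using ServiceProvider provider = services.BuildServiceProvider();
IExceptionSummarizer summarizer = provider.GetRequiredService<IExceptionSummarizer>();

// Sample input: a connection-refused socket error.
ExceptionSummary summary = summarizer.Summarize(
    new SocketException((int)SocketError.ConnectionRefused));

Console.WriteLine($"Type:        {summary.ExceptionType}");
Console.WriteLine($"Description: {summary.Description}");
Console.WriteLine($"Details:     {summary.AdditionalDetails}");
```

Only the low-cardinality ExceptionType and Description are good candidates for metric tags; AdditionalDetails belongs in logs.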

RequestMetadata contains 4 string properties:

  • DependencyName: The dependency to which the outgoing request is being made.
  • MethodType: The HTTP method, e.g., GET, POST, PUT, PATCH, DELETE, etc.
  • RequestName: Display name for the activity. Defaults to RequestRoute.
  • RequestRoute: Supports redaction, e.g., /v1/users/{userId}/chats/{chatId}/messages, where userId and chatId are redacted for being sensitive data.

References

  1. Introduction to resilient app development - .NET | Microsoft Learn. learn.microsoft.com. Accessed Jul 19, 2025.
  2. Meet Polly: The .NET resilience library | Polly. www.pollydocs.org. Accessed Jul 19, 2025.
  3. Build resilient HTTP apps: Key development patterns - .NET | Microsoft Learn. learn.microsoft.com. Accessed Jul 19, 2025.
  4. Exception summarization in C# - .NET | Microsoft Learn. learn.microsoft.com. Accessed Jul 19, 2025.
  5. ExceptionSummary Struct (Microsoft.Extensions.Diagnostics.ExceptionSummarization) | Microsoft Learn. learn.microsoft.com. Accessed Jul 19, 2025.
  6. RequestMetadata Class (Microsoft.Extensions.Http.Diagnostics) | Microsoft Learn. learn.microsoft.com. Accessed Jul 19, 2025.
  7. Resilience pipelines | Polly. www.pollydocs.org. Accessed Jul 19, 2025.