From Scattered Text to Correlated Observability: Reshaping the .NET Debugging Experience with Serilog and OpenTelemetry

Author: WeChat Official Account 【架构师老卢】
9-23 14:43

Serilog and OpenTelemetry architecture for modern .NET applications

When your .NET application throws an inscrutable error in production at 3 a.m., the last thing you want to do is comb through thousands of unstructured log files trying to piece together what went wrong. Traditional logging feels like searching for a needle in a haystack, except that the haystack may be on fire and the needle may not even exist.

Enter Serilog and OpenTelemetry: this power duo turns logging from a necessary nuisance into a secret weapon for understanding distributed systems.


Traditional logging vs. structured logging

The problem with traditional logging
Picture this: your microservice architecture spans 15 different services, each emitting logs like this:

2025-09-10 14:32:17 INFO: Processing request for user John
2025-09-10 14:32:18 ERROR: Database timeout occurred
2025-09-10 14:32:19 INFO: Retrying operation

Now try to answer these questions:

  • Which user triggered the error?
  • What was the original request?
  • Which service actually failed?
  • How long did the entire request take?

With traditional logging, you are doing detective work with incomplete evidence.

Why Serilog + OpenTelemetry is a game changer

Structured logging with Serilog
Instead of dumping text, Serilog creates structured data that machines can understand:

// Traditional approach (bad)
_logger.LogInformation($"User {userId} ordered {itemCount} items for ${totalAmount}");

// Serilog structured approach (good)
_logger.LogInformation("User {UserId} completed order {OrderId} with {ItemCount} items for {TotalAmount:C}", 
    userId, orderId, itemCount, totalAmount);

This produces JSON like the following:

{
  "timestamp": "2025-09-10T14:32:17.123Z",
  "level": "Information",
  "messageTemplate": "User {UserId} completed order {OrderId} with {ItemCount} items for {TotalAmount:C}",
  "message": "User john.doe completed order ORD-12345 with 3 items for $299.99",
  "properties": {
    "UserId": "john.doe",
    "OrderId": "ORD-12345", 
    "ItemCount": 3,
    "TotalAmount": 299.99
  }
}

Now you can run queries like "show me all orders over $200" or "find every error for user john.doe".
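Because each event is a single JSON object, such queries can be run with any JSON-aware tool. A quick illustration with `jq` on a hypothetical event file (file name is illustrative; property names follow the event shape above):

```shell
# Two sample events, one per line
cat <<'EOF' > events.jsonl
{"properties":{"UserId":"john.doe","TotalAmount":299.99}}
{"properties":{"UserId":"jane.roe","TotalAmount":120.00}}
EOF

# "Show all orders over $200":
jq -c 'select(.properties.TotalAmount > 200)' events.jsonl
```

The same filter expressed against grep-able text logs would require fragile regular expressions; against structured events it is a one-liner.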

OpenTelemetry: the missing link
OpenTelemetry adds the correlation layer that connects logs across your entire distributed system. Every log event is automatically enriched with:

  • TraceId: follows a single user request across all services
  • SpanId: identifies the specific operation within that request
  • Service context: which service, version, and environment
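Both IDs come from .NET's built-in `System.Diagnostics.Activity`, which OpenTelemetry uses as its trace context. A minimal sketch of reading them (for illustration only):

```csharp
using System.Diagnostics;

// Inside any instrumented request, the ambient Activity carries the
// correlation IDs that get stamped onto every log event:
var activity = Activity.Current;
if (activity != null)
{
    Console.WriteLine($"TraceId: {activity.TraceId}"); // same across all services in the request
    Console.WriteLine($"SpanId:  {activity.SpanId}");  // unique to the current operation
}
```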

Setting up the power duo

Step 1: Install the required NuGet packages

dotnet add package Serilog.AspNetCore
dotnet add package Serilog.Sinks.OpenTelemetry
dotnet add package OpenTelemetry.Extensions.Hosting
dotnet add package OpenTelemetry.Instrumentation.AspNetCore
dotnet add package OpenTelemetry.Exporter.OpenTelemetryProtocol

Step 2: Configure your Program.cs
Here is a complete setup that delivers structured logging with full observability:

using Serilog;
using Serilog.Events;
using Serilog.Formatting.Json;
using Serilog.Sinks.OpenTelemetry;
using OpenTelemetry.Metrics;
using OpenTelemetry.Trace;

// Configure Serilog first
Log.Logger = new LoggerConfiguration()
    .MinimumLevel.Information()
    .MinimumLevel.Override("Microsoft.AspNetCore", LogEventLevel.Warning)
    .Enrich.FromLogContext()
    .Enrich.WithProperty("Application", "YourAppName")
    .Enrich.WithProperty("Environment", Environment.GetEnvironmentVariable("ASPNETCORE_ENVIRONMENT"))
    .WriteTo.Console(new JsonFormatter()) // structured console output
    .WriteTo.OpenTelemetry(options =>
    {
        options.Endpoint = "http://localhost:4317"; // OTLP endpoint
        options.Protocol = OtlpProtocol.Grpc;
        options.ResourceAttributes = new Dictionary<string, object>
        {
            ["service.name"] = "your-service-name",
            ["service.version"] = "1.0.0"
        };
    })
    .CreateLogger();

var builder = WebApplication.CreateBuilder(args);

// Use Serilog as the logging provider
builder.Host.UseSerilog();

// Configure OpenTelemetry
builder.Services.AddOpenTelemetry()
    .WithTracing(tracing => tracing
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddEntityFrameworkCoreInstrumentation() // if using EF Core (needs the OpenTelemetry.Instrumentation.EntityFrameworkCore package)
        .AddOtlpExporter(options =>
        {
            options.Endpoint = new Uri("http://localhost:4317");
        }))
    .WithMetrics(metrics => metrics
        .AddAspNetCoreInstrumentation()
        .AddHttpClientInstrumentation()
        .AddOtlpExporter(options =>
        {
            options.Endpoint = new Uri("http://localhost:4317");
        }));

var app = builder.Build();

// Add the request logging middleware
app.UseSerilogRequestLogging(options =>
{
    options.MessageTemplate = "HTTP {RequestMethod} {RequestPath} responded {StatusCode} in {Elapsed:0.0000} ms";
    options.EnrichDiagnosticContext = (diagnosticContext, httpContext) =>
    {
        diagnosticContext.Set("RequestHost", httpContext.Request.Host.Value);
        diagnosticContext.Set("RequestScheme", httpContext.Request.Scheme);
        diagnosticContext.Set("UserAgent", httpContext.Request.Headers["User-Agent"].FirstOrDefault());
        // Add custom business context
        if (httpContext.User.Identity?.IsAuthenticated == true)
        {
            diagnosticContext.Set("UserId", httpContext.User.FindFirst("sub")?.Value);
        }
    };
});

app.Run();

Step 3: Set up the OpenTelemetry Collector
Create a docker-compose.yml to run a local observability stack:

version: '3.8'
services:
  # OpenTelemetry Collector
  otel-collector:
    image: otel/opentelemetry-collector-contrib:latest
    container_name: otel-collector
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317"   # OTLP gRPC receiver
      - "4318:4318"   # OTLP HTTP receiver
      - "8889:8889"   # Prometheus metrics
    depends_on:
      - jaeger
      - prometheus

  # Jaeger for traces
  jaeger:
    image: jaegertracing/all-in-one:latest
    container_name: jaeger
    ports:
      - "16686:16686"
      - "14250:14250"
    environment:
      - COLLECTOR_OTLP_ENABLED=true

  # Prometheus for metrics
  prometheus:
    image: prom/prometheus:latest
    container_name: prometheus
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml

  # Grafana for visualization
  grafana:
    image: grafana/grafana:latest
    container_name: grafana
    ports:
      - "3000:3000"
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=admin
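The compose file above mounts a ./prometheus.yml that is not shown. A minimal sketch (job name and scrape interval are assumptions) that scrapes the collector's Prometheus endpoint on port 8889:

```yaml
# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']
```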

Create otel-collector-config.yaml:

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  resource:
    attributes:
      - key: environment
        value: development
        action: upsert

exporters:
  # Export traces to Jaeger over OTLP (recent collector-contrib builds
  # removed the legacy `jaeger` exporter; Jaeger accepts OTLP directly)
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  # Export metrics to Prometheus
  prometheus:
    endpoint: "0.0.0.0:8889"

  # Export logs to the console (you could add Loki here)
  debug:
    verbosity: detailed

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [otlp/jaeger]
    metrics:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [prometheus]
    logs:
      receivers: [otlp]
      processors: [batch, resource]
      exporters: [debug]

Start the stack:

docker-compose up -d

Advanced logging patterns

1. Contextual logging with scopes
Add business context that automatically applies to every log event within a scope:

public class OrderService
{
    private readonly ILogger<OrderService> _logger;

    public OrderService(ILogger<OrderService> logger)
    {
        _logger = logger;
    }

    public async Task ProcessOrderAsync(int orderId, string userId)
    {
        // Create a logging scope that carries context
        using var scope = _logger.BeginScope(new Dictionary<string, object>
        {
            ["OrderId"] = orderId,
            ["UserId"] = userId,
            ["Operation"] = "ProcessOrder"
        });

        _logger.LogInformation("Starting order processing");

        try
        {
            await ValidateOrderAsync(orderId);
            await ChargePaymentAsync(orderId);
            await FulfillOrderAsync(orderId);
            _logger.LogInformation("Order processing completed successfully");
        }
        catch (Exception ex)
        {
            _logger.LogError(ex, "Order processing failed");
            throw;
        }
    }
}

Every log event inside this scope automatically includes OrderId, UserId, and Operation.
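With the JSON console formatter, an event emitted inside that scope would look roughly like this (a hypothetical rendering; the exact shape depends on the formatter and provider configuration):

```json
{
  "timestamp": "2025-09-10T14:32:17.123Z",
  "level": "Information",
  "message": "Starting order processing",
  "properties": {
    "OrderId": 12345,
    "UserId": "john.doe",
    "Operation": "ProcessOrder"
  }
}
```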

2. Custom enrichers for business context
Create enrichers that add consistent business context:

public class TenantEnricher : ILogEventEnricher
{
    private readonly IHttpContextAccessor _contextAccessor;

    public TenantEnricher(IHttpContextAccessor contextAccessor)
    {
        _contextAccessor = contextAccessor;
    }

    public void Enrich(LogEvent logEvent, ILogEventPropertyFactory propertyFactory)
    {
        var context = _contextAccessor.HttpContext;
        if (context?.User?.Identity?.IsAuthenticated == true)
        {
            var tenantId = context.User.FindFirst("tenant_id")?.Value;
            if (!string.IsNullOrEmpty(tenantId))
            {
                logEvent.AddOrUpdateProperty(propertyFactory.CreateProperty("TenantId", tenantId));
            }
        }
    }
}

// Register in Program.cs
builder.Services.AddSingleton<IHttpContextAccessor, HttpContextAccessor>();
Log.Logger = new LoggerConfiguration()
    // Enrich.With<T>() requires a parameterless constructor,
    // so pass an instance when the enricher has dependencies
    .Enrich.With(new TenantEnricher(new HttpContextAccessor()))
    // ... other configuration
    .CreateLogger();

3. Performance-critical logging
For high-throughput scenarios, use source-generated logging:

public partial class OrderService
{
    private readonly ILogger<OrderService> _logger;

    [LoggerMessage(
        EventId = 1001,
        Level = LogLevel.Information,
        Message = "Processing order {OrderId} for user {UserId} with {ItemCount} items totaling {TotalAmount:C}")]
    public static partial void LogOrderProcessing(ILogger logger, int orderId, string userId, int itemCount, decimal totalAmount);

    [LoggerMessage(
        EventId = 1002,
        Level = LogLevel.Error,
        Message = "Failed to process order {OrderId}: {ErrorReason}")]
    public static partial void LogOrderProcessingError(ILogger logger, Exception exception, int orderId, string errorReason);

    public async Task ProcessOrderAsync(Order order)
    {
        LogOrderProcessing(_logger, order.Id, order.UserId, order.Items.Count, order.TotalAmount);

        try
        {
            // Process the order...
        }
        catch (Exception ex)
        {
            LogOrderProcessingError(_logger, ex, order.Id, ex.Message);
            throw;
        }
    }
}

This generates zero-allocation logging code for maximum performance.

Production best practices

1. Security and sensitive data
Never log sensitive information. Use Serilog destructuring policies to sanitize data:

public class SensitiveDataPolicy : IDestructuringPolicy
{
    public bool TryDestructure(object value, ILogEventPropertyValueFactory propertyValueFactory, out LogEventPropertyValue result)
    {
        result = null;

        if (value is CreditCard card)
        {
            result = propertyValueFactory.CreatePropertyValue(new 
            {
                Last4Digits = card.Number?.Substring(card.Number.Length - 4),
                ExpiryMonth = card.ExpiryMonth,
                ExpiryYear = card.ExpiryYear
                // Never log the full number or the CVV
            });
            return true;
        }

        return false;
    }
}

Log.Logger = new LoggerConfiguration()
    .Destructure.With<SensitiveDataPolicy>()
    // ... other configuration
    .CreateLogger();

2. Environment-specific configuration
Use a different logging configuration for each environment:

public static void ConfigureLogging(WebApplicationBuilder builder)
{
    var environment = builder.Environment.EnvironmentName;

    var loggerConfig = new LoggerConfiguration()
        .ReadFrom.Configuration(builder.Configuration);

    if (environment == "Development")
    {
        loggerConfig
            .MinimumLevel.Debug()
            .WriteTo.Console(new JsonFormatter());
    }
    else if (environment == "Production")
    {
        loggerConfig
            .MinimumLevel.Information()
            .MinimumLevel.Override("Microsoft", LogEventLevel.Warning)
            .WriteTo.OpenTelemetry(options =>
            {
                options.Endpoint = builder.Configuration["OpenTelemetry:Endpoint"];
                options.Headers = GetAuthHeaders(builder.Configuration);
            });
    }

    Log.Logger = loggerConfig.CreateLogger();
}
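The `ReadFrom.Configuration(...)` call above pulls levels and sinks from appsettings.json, which requires the Serilog.Settings.Configuration package. A minimal sketch (values are illustrative):

```json
{
  "Serilog": {
    "MinimumLevel": {
      "Default": "Information",
      "Override": {
        "Microsoft.AspNetCore": "Warning"
      }
    }
  }
}
```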

3. Performance monitoring
Monitor your logging pipeline so it does not degrade application performance:

// Metrics for monitoring logging performance
public class LoggingMetrics
{
    private readonly Counter<long> _logEventsCounter;
    private readonly Histogram<double> _logProcessingDuration;

    public LoggingMetrics(IMeterFactory meterFactory)
    {
        var meter = meterFactory.Create("MyApp.Logging");
        _logEventsCounter = meter.CreateCounter<long>("log_events_total");
        _logProcessingDuration = meter.CreateHistogram<double>("log_processing_duration_ms");
    }

    public void RecordLogEvent(LogEventLevel level)
    {
        _logEventsCounter.Add(1, new KeyValuePair<string, object>("level", level.ToString()));
    }
}
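To make these counters flow through the same pipeline, register the class in DI and tell OpenTelemetry to export the meter; a minimal sketch (the meter name must match the one passed to `meterFactory.Create`):

```csharp
// In Program.cs (assumes the LoggingMetrics class above)
builder.Services.AddSingleton<LoggingMetrics>();

builder.Services.AddOpenTelemetry()
    .WithMetrics(metrics => metrics
        .AddMeter("MyApp.Logging")); // export the custom logging counters alongside the built-ins
```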

Common pitfalls and how to avoid them

1. Over-logging
Problem: logging everything creates noise and drives up cost.
Solution: use appropriate log levels and configure minimum levels per namespace:

.MinimumLevel.Information()
.MinimumLevel.Override("Microsoft.AspNetCore", LogEventLevel.Warning)
.MinimumLevel.Override("Microsoft.EntityFrameworkCore", LogEventLevel.Error)

2. Blocking application threads
Problem: synchronous logging slows the application down.
Solution: use asynchronous sinks and batching:

.WriteTo.Async(a => a.OpenTelemetry(options =>
{
    options.Endpoint = "http://localhost:4317";
    options.BatchingOptions = new BatchingOptions
    {
        BatchSizeLimit = 1000,
        Period = TimeSpan.FromSeconds(2)
    };
}))

3. Missing correlation context
Problem: logs across service boundaries are not properly correlated.
Solution: ensure the TraceId is propagated on outgoing HTTP calls:

builder.Services.AddHttpClient<ExternalApiClient>(client =>
{
    client.BaseAddress = new Uri("https://api.external.com");
})
.AddHttpMessageHandler<CorrelationIdHandler>();

public class CorrelationIdHandler : DelegatingHandler
{
    protected override async Task<HttpResponseMessage> SendAsync(HttpRequestMessage request, CancellationToken cancellationToken)
    {
        var activity = Activity.Current;
        if (activity != null)
        {
            request.Headers.Add("X-Correlation-ID", activity.TraceId.ToString());
        }
        return await base.SendAsync(request, cancellationToken);
    }
}

Monitoring and alerting
Set up alerts on your structured logs:

# Example alert rules for Prometheus
groups:
  - name: application.alerts
    rules:
      - alert: HighErrorRate
        expr: rate(log_events_total{level="Error"}[5m]) > 0.1
        for: 2m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} errors per second"
      - alert: DatabaseErrors
        expr: increase(log_events_total{level="Error",logger=~".*Repository.*"}[1m]) > 5
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Database error spike detected"

Results: before and after

| Aspect | Before (traditional) | After (Serilog + OpenTelemetry) |
| :--- | :--- | :--- |
| Debugging time | Hours of log searching | Minutes of structured queries |
| Cross-service tracing | Manual correlation | Automatic via TraceId |
| Query capability | Text search / grep | Rich structured queries |
| Alerting | Log-volume thresholds | Business-logic alerts |
| Performance impact | Variable | Predictable and optimized |
| Team efficiency | Solo detective work | Collaborative observability |

Getting-started checklist

  • [ ] Install the Serilog and OpenTelemetry packages
  • [ ] Configure structured logging with JSON output
  • [ ] Set up the OpenTelemetry Collector with Docker
  • [ ] Add contextual enrichers for your business domain
  • [ ] Configure different log levels per environment
  • [ ] Implement sensitive-data filtering
  • [ ] Set up basic alert rules
  • [ ] Train the team on structured querying

Key takeaway
Serilog + OpenTelemetry is more than better logging: it is an observability approach that changes how you understand and debug your .NET applications.

When that 3 a.m. alert fires, you will have:

  • Structured data you can query instantly
  • Full correlation across all your services
  • Rich context that tells the whole story
  • Performance metrics alongside your logs