OpenTelemetry: ActivitySource.StartActivity returns null activity when there are listeners hooked

Question

I use:

OpenTelemetry 1.4.0
OpenTelemetry.Extensions.Hosting 1.4.0
OpenTelemetry.Instrumentation.AspNetCore 1.0.0-rc9.14
Runtime: .NET 6.0 ASP.NET Web API, using the Docker base image 6.0-alpine3.17

I have two services, A and B. Each of them exposes a REST API endpoint.

OpenTelemetry is registered in service A like this:

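using System.Diagnostics;
using System.Reflection;
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry.Resources;
using OpenTelemetry.Trace;

public static class OTRegistration
{
    // Activity source named after the executing assembly.
    private static readonly ActivitySource _activitySource = new ActivitySource(Assembly.GetExecutingAssembly().GetName().Name!, "1.0.0");

    public static void AddOT(this IServiceCollection services)
    {
        // No sampler is configured here, so the SDK default applies.
        services.AddOpenTelemetry()
            .WithTracing(tracerProviderBuilder =>
                tracerProviderBuilder
                    .AddSource(_activitySource.Name)
                    .ConfigureResource(resource => resource.AddService(_activitySource.Name))
                    .AddAspNetCoreInstrumentation());
    }
}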

A client calls an endpoint of service A, and that endpoint then calls service B. On the call to service B, service A sends a traceparent header that seemingly at random ends with either 00 or 01, for example: 00-000000000000000056473954588e71ac-36ae11cc57b1e9c1-00.
When it ends with 00, service B creates null activities:

Activity? activity = _activitySource.StartActivity("TestActivity");

I double-checked that a listener is attached in service B at the moment the activity is created. I added some logs to prove it:

Activity was created as null. | OperationName=file.uploaded, traceparent=00-000000000000000056473954588e71ac-36ae11cc57b1e9c1-00, HasListeners=True.
Activity was created as not null. | OperationName=file.uploaded, traceparent=00-0000000000000000517d22c0e0b634be-5b066682501cc9b8-01, HasListeners=True, RootId=0000000000000000517d22c0e0b634be, ParentId=00-0000000000000000517d22c0e0b634be-a258d1b5dddfc7de-01, Id=00-0000000000000000517d22c0e0b634be-58691a9e53e3ae37-01.

I see that https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.activitytraceflags?view=net-6.0 has two values, None (0) and Recorded (1), which is why I suspect that the 00 flag means the activity will not be created.

Any ideas why service A sometimes ends the traceparent with 00 and sometimes with 01?
How can I control this behavior so that it is no longer random?

Answer 1

Score: 0

The last field of the traceparent header (the trace flags) indicates whether the activity was sampled: https://www.w3.org/TR/trace-context/#trace-flags
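To check the flag on a concrete header value, a minimal sketch using only System.Diagnostics could look like the following. The class and method names (TraceParentInspector, IsSampled) are made up for this illustration; ActivityContext.TryParse and ActivityTraceFlags are the BCL APIs.

```csharp
using System;
using System.Diagnostics;

public static class TraceParentInspector
{
    // Hypothetical helper: returns true when the W3C "sampled" bit
    // is set in the trace-flags field of a traceparent header value.
    public static bool IsSampled(string traceParent)
    {
        // ActivityContext.TryParse understands the W3C layout
        // version-traceid-spanid-flags; the second argument is tracestate.
        if (!ActivityContext.TryParse(traceParent, null, out ActivityContext context))
        {
            throw new FormatException($"Not a valid traceparent: {traceParent}");
        }

        return (context.TraceFlags & ActivityTraceFlags.Recorded) != 0;
    }
}
```

Applied to the two headers from the question's logs, the value ending in 00 is reported as not sampled and the one ending in 01 as sampled.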

The sampler is responsible for deciding whether an activity should be recorded/created.

By default, the sampler is set to parentbased_always_on. This means that if the parent was not recorded, there is no reason to create a new activity.
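The effect can be reproduced without the OpenTelemetry SDK. The sketch below registers a plain ActivityListener whose Sample callback simply follows the parent's sampled flag. This is a simplified stand-in for a parent-based sampler, not the SDK's actual logic (which has more cases), and the names ParentBasedDemo, Start and "Demo.ParentBased" are invented for the example.

```csharp
#nullable enable
using System;
using System.Diagnostics;

public static class ParentBasedDemo
{
    // Hypothetical source name, used only for this sketch.
    private const string SourceName = "Demo.ParentBased";
    private static readonly ActivitySource Source = new ActivitySource(SourceName);

    // Starts an activity under the given remote parent and reports whether
    // listeners were attached and whether an Activity instance was returned.
    public static (bool HasListeners, bool Created) Start(string traceParent)
    {
        using var listener = new ActivityListener
        {
            ShouldListenTo = source => source.Name == SourceName,
            // Simplified parent-based rule: follow the parent's sampled flag.
            Sample = (ref ActivityCreationOptions<ActivityContext> options) =>
                (options.Parent.TraceFlags & ActivityTraceFlags.Recorded) != 0
                    ? ActivitySamplingResult.AllDataAndRecorded
                    : ActivitySamplingResult.None,
        };
        ActivitySource.AddActivityListener(listener);

        ActivityContext parent = ActivityContext.Parse(traceParent, null);
        bool hasListeners = Source.HasListeners();

        // A sampling result of None makes StartActivity return null,
        // even though a listener is attached.
        using Activity? activity = Source.StartActivity("TestActivity", ActivityKind.Internal, parent);
        return (hasListeners, activity != null);
    }
}
```

With a parent ending in 00, HasListeners is true while Created is false, which matches the log lines in the question: an attached listener does not guarantee a non-null activity, because the sampling decision is made separately.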

You can override this behavior by calling .SetSampler(new AlwaysOnSampler()) in Service B, or check why Service A decided not to record the activity (maybe the decision is propagated from the client of Service A?).
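As a rough sketch, the registration in Service B might then look like this. It assumes the same packages as in the question; the class name ServiceBTracing, the method name and the source name "ServiceB" are placeholders, since Service B's actual registration is not shown.

```csharp
using Microsoft.Extensions.DependencyInjection;
using OpenTelemetry;
using OpenTelemetry.Trace;

public static class ServiceBTracing
{
    // Placeholder registration for Service B; "ServiceB" is an assumed source name.
    public static void AddServiceBTracing(this IServiceCollection services)
    {
        services.AddOpenTelemetry()
            .WithTracing(builder => builder
                .AddSource("ServiceB")
                // Ignore the incoming sampled flag and record every activity.
                .SetSampler(new AlwaysOnSampler())
                .AddAspNetCoreInstrumentation());
    }
}
```

Note the trade-off: with AlwaysOnSampler, Service B records activities even for traces that upstream services chose not to record, so some traces may contain only Service B's spans.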

Posted by huangapple on 2023-05-29 18:13:50
Original link: https://go.coder-hub.com/76356454.html