OpenTelemetry Lambda Layer

Question

Is there any way to reduce the number of events the Lambda layer drops? It keeps dropping traces before they reach the central collector. Before exporting, the layer first fetches a token so that it can send the traces to the central collector with authorization. But by then the Lambda function's execution has already finished, so the traces are dropped instead of being pushed.

Lambda Extension Layer Reference: https://github.com/open-telemetry/opentelemetry-lambda/tree/main/collector

Exporter Error:

Exporting failed. No more retries left. Dropping data.
{
    "kind": "exporter",
    "data_type": "traces",
    "name": "otlp",
    "error": "max elapsed time expired rpc error: code = DeadlineExceeded desc = context deadline exceeded",
    "dropped_items": 8
}

Answer 1

Score: 2

I ran into the same problem and did some research.
Unfortunately, it is a known issue that is still unresolved in the latest version of AWS Distro for OpenTelemetry Lambda (ADOT Lambda).

GitHub issue tickets:

The short answer: currently the OTel collector extension does not work reliably, because the lambda environment freezes it while it is still sending data to the exporters. As a workaround, you can send the traces directly to a collector running outside the lambda container.

The problem is:

  • the lambda sends the traces to the collector extension process during its execution
  • the collector queues them to be sent on to the configured exporters
  • the collector extension does not wait for the collector to finish processing its queue before telling the lambda environment that the extension is done; instead it always tells the environment immediately that it's done, without looking at what the collector is doing
  • when the lambda is done, the extension is already done, so the lambda container is frozen until the next lambda invocation.
  • the container is thawed when the next lambda invocation arrives. If the next invocation comes soon and takes long enough, the collector may be able to finish sending the traces to the exporters. If not, the connection to the backend system times out before sending is complete.
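To make the timing concrete, here is a toy model of the sequence above in Python. It is not the collector's retry logic: `Window`, `export_outcome` and all parameters are hypothetical, and the model assumes the sandbox freezes before the exporter gets any running time. A batch survives only if enough unfrozen time arrives before the exporter's wall-clock budget (the "max elapsed time" in the error above) runs out, and wall-clock time keeps passing while the container is frozen.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Window:
    """One freeze/thaw cycle of the sandbox (illustrative model only)."""
    frozen_s: float   # wall-clock seconds the sandbox stays frozen
    running_s: float  # seconds it then runs during the next invocation


def export_outcome(work_s: float, deadline_s: float, windows: List[Window]) -> str:
    """Return "exported" or "dropped" for one batch queued as an invocation ends.

    work_s:     running time the exporter needs to finish sending the batch
    deadline_s: wall-clock budget before the exporter gives up
                (stands in for the "max elapsed time" in the error above)
    windows:    the freeze/thaw cycles that follow the invocation

    Simplification: the extension has already reported "done", so the
    sandbox is assumed to freeze before the exporter gets any running time.
    """
    elapsed = 0.0       # wall-clock time since the batch was queued
    remaining = work_s  # exporter running time still needed
    for w in windows:
        elapsed += w.frozen_s  # the clock keeps ticking while frozen
        if elapsed >= deadline_s:
            return "dropped"   # budget already spent by the time we thaw
        usable = min(w.running_s, deadline_s - elapsed)
        if usable >= remaining:
            return "exported"
        remaining -= usable
        elapsed += usable
    return "dropped"           # no later invocation arrived in time


if __name__ == "__main__":
    # Next invocation 1 s later, running for 2 s: the batch gets through.
    print(export_outcome(0.5, 5.0, [Window(1.0, 2.0)]))   # exported
    # Next invocation a minute later: the deadline has passed on thaw.
    print(export_outcome(0.5, 5.0, [Window(60.0, 2.0)]))  # dropped
```

In this model a long gap between invocations is enough on its own to drop a batch, which is consistent with the drops described in the question.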

What complicates the solution is that it is very hard for an extension to detect whether the main lambda has finished processing.

Ideally, a telemetry extension would:

  1. Wait for the lambda to finish processing
  2. Check if the lambda sent it any data to process and forward
  3. Wait for all processing and forwarding to complete (if any)
  4. Signal to the lambda environment that the extension is done
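The four steps could look like this as a per-invocation handler. This is a sketch only: `handle_invoke` and the hooks passed to it are hypothetical, step 1 in particular has no equivalent in the extension protocol, and it assumes the receiver has queued everything the function sent by the time the function finishes.

```python
import queue
from typing import Callable, List


def handle_invoke(
    wait_for_function_done: Callable[[], None],
    pending: queue.Queue,
    export: Callable[[List[bytes]], None],
    signal_done: Callable[[], None],
) -> int:
    """Handle one invocation in the order listed above; return items forwarded.

    All hooks are hypothetical; step 1 has no counterpart in the
    Lambda extension protocol.
    """
    # 1. Wait for the function to finish processing.
    wait_for_function_done()

    # 2. Check whether the function sent any telemetry to process and forward.
    batch: List[bytes] = []
    while True:
        try:
            batch.append(pending.get_nowait())
        except queue.Empty:
            break

    # 3. Forward everything (if any), assuming export() blocks until done.
    if batch:
        export(batch)

    # 4. Only now report to the lambda environment that the extension is done.
    signal_done()
    return len(batch)
```

The important part is the ordering: `signal_done()` runs last, so the sandbox cannot be frozen while `export()` is still in flight.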

The lambda extension protocol doesn't tell the extension when the main lambda has finished processing (it would be great if AWS could add that to the extension protocol as a new event type).
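For reference, the extension side of the protocol looks roughly like this in Python (stdlib only, no error handling). The endpoint paths, headers and the INVOKE/SHUTDOWN event types are those of the AWS Lambda Extensions API as I understand it; the function names are my own. It shows the gap described above: an extension can only subscribe to INVOKE and SHUTDOWN, and the same `/event/next` call that fetches the next event is how it reports that it is done with the current one.

```python
import json
import os
import urllib.request

API_VERSION = "2020-01-01"


def register_request(runtime_api: str, name: str) -> urllib.request.Request:
    # Subscribe to the only two event types the Extensions API offers.
    body = json.dumps({"events": ["INVOKE", "SHUTDOWN"]}).encode()
    return urllib.request.Request(
        f"http://{runtime_api}/{API_VERSION}/extension/register",
        data=body,
        headers={"Lambda-Extension-Name": name},
        method="POST",
    )


def next_event_request(runtime_api: str, ext_id: str) -> urllib.request.Request:
    # Calling /event/next is also how the extension reports that it has
    # finished with the current invocation; there is no separate
    # "function finished" event to wait for.
    return urllib.request.Request(
        f"http://{runtime_api}/{API_VERSION}/extension/event/next",
        headers={"Lambda-Extension-Identifier": ext_id},
        method="GET",
    )


def run(name: str) -> None:
    api = os.environ["AWS_LAMBDA_RUNTIME_API"]
    with urllib.request.urlopen(register_request(api, name)) as resp:
        ext_id = resp.headers["Lambda-Extension-Identifier"]
    while True:
        with urllib.request.urlopen(next_event_request(api, ext_id)) as resp:
            event = json.load(resp)
        if event["eventType"] == "SHUTDOWN":
            break
        # INVOKE arrives when the invocation starts. As soon as the loop
        # calls /event/next again, this extension counts as done; once the
        # function has also returned, the sandbox can be frozen.
```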

There is a proposed PR that tries to work around this by assuming that lambdas always send traces, so instead of waiting for the lambda to complete, it waits for a TCP request to the OTLP receiver to arrive. This works, but it makes the extension hang forever if the lambda never sends any traces.
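The hang can be illustrated with a `threading.Event` standing in for "the OTLP receiver has seen a request". This is not the PR's code: `wait_for_first_otlp_request` and the event are hypothetical, and the finite timeout is an illustrative variation rather than something the PR proposes.

```python
import threading
from typing import Optional


def wait_for_first_otlp_request(seen: threading.Event, timeout_s: Optional[float]) -> bool:
    """Block until the (hypothetical) OTLP receiver hook has set `seen`.

    timeout_s=None mirrors the approach described above and blocks forever
    if the function never sends telemetry; a finite timeout (hypothetical
    variation, not part of the PR) returns False instead.
    """
    return seen.wait(timeout_s)


if __name__ == "__main__":
    seen = threading.Event()
    # A function that emits telemetry: the receiver sets the flag shortly after.
    threading.Timer(0.05, seen.set).start()
    print(wait_for_first_otlp_request(seen, None))              # True
    # A function that emits nothing: only a bounded wait comes back.
    print(wait_for_first_otlp_request(threading.Event(), 0.1))  # False
```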

Note: the same problem that we see here for traces also exists for metrics.

huangapple
  • Posted on 2022-11-10 07:47:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/74382655.html