Why do my Grafana Tempo ingester pods go into a Back-off restarting state after max_block_duration?


Question

I am using the grafana-tempo distributed Helm chart. It is successfully deployed, its backend is configured to use Azure Storage (blob containers), and it is working fine.

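For context, my backend configuration looks roughly like the sketch below (the container and account names here are placeholders, not my real values):

```yaml
# Tempo storage backend (sketch; names are placeholders)
storage:
  trace:
    backend: azure
    azure:
      container_name: tempo-traces            # placeholder container name
      storage_account_name: mystorageaccount  # placeholder account name
      storage_account_key: ${STORAGE_ACCOUNT_KEY}  # injected from a secret
```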

I have a demo application which is sending traces to grafana-tempo. I can confirm I'm receiving traces.


The issue I have observed is that exactly 30 minutes after startup, my ingester pods go into a Back-off restarting state, and I have to manually restart their StatefulSet.

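For reference, this is roughly how I inspect and restart the pods (the `tempo` namespace and resource names are assumptions based on the chart's defaults; yours may differ):

```bash
# Check why the ingester container is crash-looping
kubectl -n tempo describe pod tempo-ingester-0
kubectl -n tempo logs tempo-ingester-0 --previous

# Manual workaround: restart the ingester StatefulSet
kubectl -n tempo rollout restart statefulset tempo-ingester
```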

While searching for the root cause, I found a parameter, max_block_duration, which has a default value of 30m: "max_block_duration: maximum length of time before cutting a block."


So I tried increasing it to 60m. Now the ingester pods go into the Back-off restarting state after 60 minutes instead.

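The override I applied looks roughly like this, expressed as the raw Tempo ingester config (how this is wired through the Helm chart's values may vary between chart versions):

```yaml
# Tempo ingester config (sketch): cut the head block after 60 minutes
ingester:
  max_block_duration: 60m   # default is 30m
```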

I have also enabled autoscaling, but no new pods come up when all ingester pods are in the same error state.

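To check the autoscaler, I look at the HPA status (the HPA name tempo-ingester is an assumption from my install):

```bash
# Inspect the ingester HPA; crash-looping pods report no usable metrics,
# which can prevent the HPA from scaling up
kubectl -n tempo get hpa tempo-ingester
kubectl -n tempo describe hpa tempo-ingester
```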

Can someone help me understand why this is happening and suggest a possible solution to eliminate the issue?


What value should be passed to max_block_duration so that these pods do not go into the Back-off restarting state?


I expect my ingester pods to keep running without restarts.



Answer 1

Score: 0


I also opened a GitHub issue on Tempo, and the problem no longer occurs on my end. If someone is facing the same issue, you can have a look at my GitHub issue for more insight: https://github.com/grafana/tempo/issues/2488
