ClickHouse: n/n brokers are down
Question
We have set up a few tables with ENGINE = Kafka()
across our three-node ClickHouse staging cluster.
In the logs we can frequently see this error:
[rdk:ERROR] [thrd:GroupCoordinator]: 15/15 brokers are down
I am sure the brokers are not down (not a single one). I also checked the topics: none of the 16 topics currently being consumed has developed any lag.
Restarting the ClickHouse service with:
systemctl restart clickhouse-server
makes the errors go away (as does dropping and recreating the tables).
I was hoping that disabling the DNS cache would help, but it didn't. Is it possible that these errors occur when there is not much data?
Are there any other ideas I could try?
SELECT version()
┌─version()─┐
│ 23.1.3.5 │
└───────────┘
Answer 1
Score: 1
- Kafka status: make sure that all of the brokers are up and running.
- Check the /etc/clickhouse-server/config.xml file for the following:
  - Check the KAFKA_BROKERS setting.
  - Increase the KAFKA_RECONNECT_INTERVAL setting. This setting controls how often ClickHouse will try to reconnect to a down broker.
  - Increase the KAFKA_POLL_TIMEOUT setting. This setting controls how long ClickHouse will wait for a response from Kafka before it considers the broker to be down.
If none of these work, check the ClickHouse logs and post the relevant events for further troubleshooting.
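As far as I can tell, the upper-case names above are not literal ClickHouse or librdkafka setting names. Assuming they stand for the broker list, the reconnect backoff and the poll timeout, the closest documented equivalents would be the table-level settings kafka_broker_list and kafka_poll_timeout_ms, and the librdkafka properties reconnect.backoff.ms and reconnect.backoff.max.ms. ClickHouse passes librdkafka properties through a <kafka> section in the server configuration, with dots written as underscores. A minimal sketch of such a section follows; the values are illustrative guesses, not tuned recommendations, and the file name is hypothetical:

```xml
<!-- Hypothetical file: /etc/clickhouse-server/config.d/kafka.xml
     (the same <kafka> section could also go into config.xml) -->
<clickhouse>
    <kafka>
        <!-- librdkafka "reconnect.backoff.ms": initial wait before reconnecting
             to a broker whose connection was closed (illustrative value) -->
        <reconnect_backoff_ms>500</reconnect_backoff_ms>
        <!-- librdkafka "reconnect.backoff.max.ms": upper bound for that wait
             (illustrative value) -->
        <reconnect_backoff_max_ms>10000</reconnect_backoff_max_ms>
        <!-- librdkafka "socket.keepalive.enable": TCP keepalives on broker sockets -->
        <socket_keepalive_enable>true</socket_keepalive_enable>
        <!-- librdkafka "debug": extra broker and consumer-group logging,
             useful when collecting the log events to post -->
        <debug>broker,cgrp</debug>
    </kafka>
</clickhouse>
```

The broker list and the poll timeout are not set here: in ClickHouse they belong to the table definition (the SETTINGS clause of the Kafka engine table), so changing them means recreating the tables. The server needs a restart for the <kafka> section to be picked up by existing consumers.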