英文:
Distributed Mode in Dedicated MirrorMaker 2 cluster - not balancing tasks evenly
问题
我一直在尝试在基于MirrorMaker的Kubernetes集群上设置节点,使用Kafka版本3.5.1。
我使用的设置基于这个KPI中描述的内容,与dedicated.mode.enable.internal.rest
的使用有关,这是作为JIRA #10586 的一部分发布的 - 这意味着它应该从Kafka 3.5.0开始可用。
我遇到的问题与主题/偏移复制无关,这一部分一直很正常运行,但基本上有以下问题:
- 所有主题复制都是从一个Pod运行的,其余的Pod保持空闲,
- 我可以跟踪日志指示节点通过广告的领导者URI加入集群,但我实际上无法调用API并验证结果。
我可以让它运行数小时,结果仍然相同。
以下是我尝试过的镜像制作者配置:
clusters = source, destination
source.bootstrap.servers = <SOURCE_BOOTSTRAP_SERVERS>
destination.bootstrap.servers = <TARGET_BOOTSTRAP_SERVERS>
source->destination.enabled = true
source->destination.topics = <TARGET_TOPIC_WHITELIST>
groups=.*
topics.blacklist = .*[\-\.]internal
emit.heartbeats.enabled = true
source->destination.sync.group.offsets.enabled = true
# 认证配置
security.protocol=SASL_SSL
source.security.protocol=SASL_SSL
destination.security.protocol=SASL_SSL
source.sasl.mechanism=SCRAM-SHA-512
destination.sasl.mechanism=SCRAM-SHA-512
source.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<SOURCE_USERNAME>" password="<SOURCE_PASSWORD>";
destination.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<TARGET_USERNAME>" password="<TARGET_PASSWORD>";
replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy
# REST API - 自3.5.0以来的新功能
dedicated.mode.enable.internal.rest=true
其中包括用于启用REST API的标志。
当节点启动时(通过connect-mirror-maker.sh
),我可以在我们的一个副本中看到以下信息:
[2023-07-27 12:36:10,825] INFO Advertised URI: http://<advertised_IP>:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:394)
[2023-07-27 12:36:10,826] INFO REST server listening at http://<advertised_IP>:8083/, advertising URL http://<advertised_IP>:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:202)
随后是组加入指示(下面的行可以在所有副本中验证):
2023-07-27 12:36:34,719] INFO [Worker clientId=source->destination, groupId=source-mm2] Joined group at generation 44 with protocol version 2 and got assignment: Assignment{error=0, leader='source->destination-224a6033-361f-426b-9840-fc078eb87333', leaderUrl='http://<advertised_IP>:8083/', offset=705, connectorIds=[MirrorHeartbeatConnector], taskIds=[MirrorHeartbeatConnector-0], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:2394)
如果我尝试从任何节点运行curl来针对上述的leaderUrl(位于根目录或/connectors
),使用广告IP(或领导者的本地主机),我最终会得到{"error_code":404,"message":"HTTP 404 Not Found"}
。
我可以看到繁忙的Pod CPU消耗在复制所有消息时急剧上升,而其他Pod保持空闲。
英文:
I've been trying to set a cluster in Kubernetes with nodes based on MirrorMaker, using Kafka version 3.5.1.
The setting I used is based on what is described in this KPI, and relates to the usage of dedicated.mode.enable.internal.rest
, which has been delivered as part of JIRA #10586 - meaning it should be available since Kafka 3.5.0.
The issues I get are not related to the topics/offsets replication, this has been working fine - but basically:
- it's running all topic replications from one pod, leaving the remaining ones free,
- I can trace log indications where nodes join the cluster via the advertised leader URI, but I cannot actually call the API and verify the results.
I can leave it running for hours and the result remains the same.
Following is the mirror maker configuration I have tried:
clusters = source, destination
source.bootstrap.servers = <SOURCE_BOOTSTRAP_SERVERS>
destination.bootstrap.servers = <TARGET_BOOTSTRAP_SERVERS>
source->destination.enabled = true
source->destination.topics = <TARGET_TOPIC_WHITELIST>
groups=.*
topics.blacklist = .*[\-\.]internal
emit.heartbeats.enabled = true
source->destination.sync.group.offsets.enabled = true
# Auth config
security.protocol=SASL_SSL
source.security.protocol=SASL_SSL
destination.security.protocol=SASL_SSL
source.sasl.mechanism=SCRAM-SHA-512
destination.sasl.mechanism=SCRAM-SHA-512
source.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<SOURCE_USERNAME>" password="<SOURCE_PASSWORD>";
destination.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<TARGET_USERNAME>" password="<TARGET_PASSWORD>";
replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy
# rest API - new since 3.5.0
dedicated.mode.enable.internal.rest=true
which includes the flag set for enabling the REST API.
As the nodes are launched (via connect-mirror-maker.sh
) I can see in one of our replicas the following information:
[2023-07-27 12:36:10,825] INFO Advertised URI: http://<advertised_IP>:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:394)
[2023-07-27 12:36:10,826] INFO REST server listening at http://<advertised_IP>:8083/, advertising URL http://<advertised_IP>:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:202)
followed by the group join indication (the lines below can be verified in all replicas):
2023-07-27 12:36:34,719] INFO [Worker clientId=source->destination, groupId=source-mm2] Joined group at generation 44 with protocol version 2 and got assignment: Assignment{error=0, leader='source->destination-224a6033-361f-426b-9840-fc078eb87333', leaderUrl='http://<advertised_IP>:8083/', offset=705, connectorIds=[MirrorHeartbeatConnector], taskIds=[MirrorHeartbeatConnector-0], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:2394)
If I try running curl targeting the leaderUrl above (on the root or at /connectors
) from any node, using the advertised IP (or localhost in the leader) I end up with {"error_code":404,"message":"HTTP 404 Not Found"}
I can see the busy pod CPU consumption going high as it replicates all messages, while other pods are free.
答案1
得分: 0
最大任务数量产生了所期望的效果,我们开始使用 tasks.max=10
进行测试,这已经产生了巨大的差异(现在我们正在根据我们的需求进行调整)。一些相关的背景信息可以在 此答案 中找到。
英文:
The maximum number of tasks did the desired effect, we started testing with tasks.max=10
and it already made a huge difference (we are now tuning it based on our needs). Some related context can be found on this answer.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论