在专用MirrorMaker 2集群中的分布式模式下 – 任务平衡不均匀

huangapple go评论69阅读模式
英文:

Distributed Mode in Dedicated MirrorMaker 2 cluster - not balancing tasks evenly

问题

我一直在尝试在基于MirrorMaker的Kubernetes集群上设置节点,使用Kafka版本3.5.1。

我使用的设置基于这个KPI中描述的内容,与dedicated.mode.enable.internal.rest的使用有关,这是作为JIRA #10586 的一部分发布的 - 这意味着它应该从Kafka 3.5.0开始可用。

我遇到的问题与主题/偏移复制无关,这一部分一直很正常运行,但基本上有以下问题:

  • 所有主题复制都是从一个Pod运行的,其余的Pod保持空闲,
  • 我可以跟踪日志指示节点通过广告的领导者URI加入集群,但我实际上无法调用API并验证结果。

我可以让它运行数小时,结果仍然相同。

以下是我尝试过的镜像制作者配置:

clusters = source, destination

source.bootstrap.servers = <SOURCE_BOOTSTRAP_SERVERS>
destination.bootstrap.servers = <TARGET_BOOTSTRAP_SERVERS>

source->destination.enabled = true
source->destination.topics = <TARGET_TOPIC_WHITELIST>
groups=.*
topics.blacklist = .*[\-\.]internal
emit.heartbeats.enabled = true
source->destination.sync.group.offsets.enabled = true

# 认证配置
security.protocol=SASL_SSL
source.security.protocol=SASL_SSL
destination.security.protocol=SASL_SSL
source.sasl.mechanism=SCRAM-SHA-512
destination.sasl.mechanism=SCRAM-SHA-512
source.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<SOURCE_USERNAME>" password="<SOURCE_PASSWORD>";
destination.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<TARGET_USERNAME>" password="<TARGET_PASSWORD>";

replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy

# REST API - 自3.5.0以来的新功能
dedicated.mode.enable.internal.rest=true

其中包括用于启用REST API的标志。

当节点启动时(通过connect-mirror-maker.sh),我可以在我们的一个副本中看到以下信息:

[2023-07-27 12:36:10,825] INFO Advertised URI: http://<advertised_IP>:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:394)
[2023-07-27 12:36:10,826] INFO REST server listening at http://<advertised_IP>:8083/, advertising URL http://<advertised_IP>:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:202)

随后是组加入指示(下面的行可以在所有副本中验证):

2023-07-27 12:36:34,719] INFO [Worker clientId=source->destination, groupId=source-mm2] Joined group at generation 44 with protocol version 2 and got assignment: Assignment{error=0, leader='source->destination-224a6033-361f-426b-9840-fc078eb87333', leaderUrl='http://<advertised_IP>:8083/', offset=705, connectorIds=[MirrorHeartbeatConnector], taskIds=[MirrorHeartbeatConnector-0], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:2394)

如果我尝试从任何节点运行curl来针对上述的leaderUrl(位于根目录或/connectors),使用广告IP(或领导者的本地主机),我最终会得到{"error_code":404,"message":"HTTP 404 Not Found"}

我可以看到繁忙的Pod CPU消耗在复制所有消息时急剧上升,而其他Pod保持空闲。

英文:

I've been trying to set a cluster in Kubernetes with nodes based on MirrorMaker, using Kafka version 3.5.1.

The setting I used is based on what is described in this KPI, and relates to the usage of dedicated.mode.enable.internal.rest, which has been delivered as part of JIRA #10586 - meaning it should be available since Kafka 3.5.0.

The issues I get are not related to the topics/offsets replication, this has been working fine - but basically:

  • it's running all topic replications from one pod, leaving the remaining ones free,
  • I can trace log indications where nodes join the cluster via the advertised leader URI, but I cannot actually call the API and verify the results.

I can leave it running for hours and the result remains the same.

Following is the mirror maker configuration I have tried:

clusters = source, destination

source.bootstrap.servers = <SOURCE_BOOTSTRAP_SERVERS>
destination.bootstrap.servers = <TARGET_BOOTSTRAP_SERVERS>

source->destination.enabled = true
source->destination.topics = <TARGET_TOPIC_WHITELIST>
groups=.*
topics.blacklist = .*[\-\.]internal
emit.heartbeats.enabled = true
source->destination.sync.group.offsets.enabled = true

# Auth config
security.protocol=SASL_SSL
source.security.protocol=SASL_SSL
destination.security.protocol=SASL_SSL
source.sasl.mechanism=SCRAM-SHA-512
destination.sasl.mechanism=SCRAM-SHA-512
source.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<SOURCE_USERNAME>" password="<SOURCE_PASSWORD>";
destination.sasl.jaas.config=org.apache.kafka.common.security.scram.ScramLoginModule required username="<TARGET_USERNAME>" password="<TARGET_PASSWORD>";

replication.policy.class=org.apache.kafka.connect.mirror.IdentityReplicationPolicy

# rest API - new since 3.5.0
dedicated.mode.enable.internal.rest=true

which includes the flag set for enabling the REST API.

As the nodes are launched (via connect-mirror-maker.sh) I can see in one of our replicas the following information:

[2023-07-27 12:36:10,825] INFO Advertised URI: http://<advertised_IP>:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:394)
[2023-07-27 12:36:10,826] INFO REST server listening at http://<advertised_IP>:8083/, advertising URL http://<advertised_IP>:8083/ (org.apache.kafka.connect.runtime.rest.RestServer:202)

followed by the group join indication (the lines below can be verified in all replicas):

2023-07-27 12:36:34,719] INFO [Worker clientId=source->destination, groupId=source-mm2] Joined group at generation 44 with protocol version 2 and got assignment: Assignment{error=0, leader='source->destination-224a6033-361f-426b-9840-fc078eb87333', leaderUrl='http://<advertised_IP>:8083/', offset=705, connectorIds=[MirrorHeartbeatConnector], taskIds=[MirrorHeartbeatConnector-0], revokedConnectorIds=[], revokedTaskIds=[], delay=0} with rebalance delay: 0 (org.apache.kafka.connect.runtime.distributed.DistributedHerder:2394)

If I try running curl targeting the leaderUrl above (on the root or at /connectors) from any node, using the advertised IP (or localhost in the leader) I end up with {"error_code":404,"message":"HTTP 404 Not Found"}

I can see the busy pod CPU consumption going high as it replicates all messages, while other pods are free.

答案1

得分: 0

最大任务数量产生了所期望的效果,我们开始使用 tasks.max=10 进行测试,这已经产生了巨大的差异(现在我们正在根据我们的需求进行调整)。一些相关的背景信息可以在 此答案 中找到。

英文:

The maximum number of tasks did the desired effect, we started testing with tasks.max=10 and it already made a huge difference (we are now tuning it based on our needs). Some related context can be found on this answer.

huangapple
  • 本文由 发表于 2023年7月27日 22:28:04
  • 转载请务必保留本文链接:https://go.coder-hub.com/76780739.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定