Kafka: events published from the host machine are not consumed by the application running in Docker

Question

I am writing end-to-end tests for an application. I start an instance of an application, a Kafka instance, and a Zookeeper (all Dockerized) and then I interact with the application API to test its functionality. I need to test an event consumer's functionality in this application. I publish events from my tests and the application is expected to handle them.

Problem: If I run the application locally (not in Docker) and run tests that produce events, the consumer in the application code handles the events correctly. In this case, both the consumer and the test have bootstrapServers set to localhost:9092. But if the application runs as a Dockerized instance, it doesn't see the events. In this case, bootstrapServers is set to kafka:9092 in the application and to localhost:9092 in the test, where kafka is the Docker container name. The kafka container exposes its port 9092 to the host so that the same Kafka instance can be accessed both from inside a Docker container and from the host (which runs my tests).

The only difference in the code is localhost vs kafka set as bootstrap servers. In both scenarios consumers and producers start successfully; events are published without errors. It is just that in one case the consumer doesn't receive events.

Question: How can I make Dockerized consumers see events published from the host machine?

Note: I have a properly configured Docker network which includes the application instance, Zookeeper, and Kafka. They all "see" each other. The corresponding ports of kafka and zookeeper are exposed to the host.
Kafka ports: 0.0.0.0:9092->9092/tcp. Zookeeper ports: 22/tcp, 2888/tcp, 3888/tcp, 0.0.0.0:2181->2181/tcp.

I am using wurstmeister/kafka and wurstmeister/zookeeper Docker images (I cannot replace them).

Any ideas/thoughts are appreciated. How would you debug it?

UPDATE: The issue was with KAFKA_ADVERTISED_LISTENERS and KAFKA_LISTENERS env variables that were set to different ports for INSIDE and OUTSIDE communications. The solution was to use a correct port in the application code when it is run inside a Docker container.
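
A minimal sketch of that fix, assuming the broker is configured with two named listeners through the wurstmeister/kafka environment variables. Port 9093 for the INSIDE listener, the RUNNING_IN_DOCKER flag, and the helper names are hypothetical; only port 9092 being mapped to the host comes from the question.

```python
import os

# Broker-side settings (environment variables of the wurstmeister/kafka image).
# Port 9093 for INSIDE is an assumption; 9092 is the port mapped to the host.
BROKER_ENV = {
    "KAFKA_LISTENERS": "INSIDE://0.0.0.0:9093,OUTSIDE://0.0.0.0:9092",
    "KAFKA_ADVERTISED_LISTENERS": "INSIDE://kafka:9093,OUTSIDE://localhost:9092",
    "KAFKA_LISTENER_SECURITY_PROTOCOL_MAP": "INSIDE:PLAINTEXT,OUTSIDE:PLAINTEXT",
    "KAFKA_INTER_BROKER_LISTENER_NAME": "INSIDE",
}


def advertised_address(listener_name: str) -> str:
    """Return the host:port advertised for the given listener name."""
    for entry in BROKER_ENV["KAFKA_ADVERTISED_LISTENERS"].split(","):
        name, _, address = entry.partition("://")
        if name == listener_name:
            return address
    raise KeyError(listener_name)


def bootstrap_servers(running_in_docker: bool) -> str:
    """Pick the listener that matches where the client runs."""
    return advertised_address("INSIDE" if running_in_docker else "OUTSIDE")


if __name__ == "__main__":
    # RUNNING_IN_DOCKER is a hypothetical flag set in the application's container.
    in_docker = os.environ.get("RUNNING_IN_DOCKER") == "1"
    print(bootstrap_servers(in_docker))
```

With this layout, a containerized application that connects to kafka:9092 would reach the OUTSIDE listener and be told to reconnect to localhost:9092, which inside the container points at the container itself. That is one plausible reason the consumer received nothing.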

Answer 1

Score: 3

These kinds of issues are usually related to the way Kafka handles the broker's address.

When you start a Kafka broker, it binds to 0.0.0.0:9092 and registers itself in Zookeeper with the address <hostname>:9092. When a client connects, the address registered in Zookeeper is looked up and handed back to the client as the address of that broker.

This means that when you start a Kafka container you have a situation like the following:

  • container name: kafka
  • network name: kafkanet
  • hostname: kafka
  • registration on zookeeper: kafka:9092

Now if you connect a client to your Kafka from a container inside the kafkanet network, the address you get back from Zookeeper is kafka:9092 which is resolvable through the kafkanet network.

However, if you connect to Kafka from outside Docker (i.e. using the localhost:9092 endpoint mapped by Docker), you still get back the kafka:9092 address, which is not resolvable from the host.
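
One way to check this from either side is to test whether the advertised host name resolves on the machine where the client runs. The sketch below uses only the standard library; the function name and the injectable resolver parameter are illustrative choices, and "kafka" is the advertised host name from the example above.

```python
import socket
from typing import Callable


def can_resolve(host: str, resolver: Callable[..., object] = socket.getaddrinfo) -> bool:
    """Return True if `host` resolves to an address on this machine."""
    try:
        resolver(host, None)
        return True
    except socket.gaierror:
        return False


if __name__ == "__main__":
    # Run this once on the host and once inside a container on the Docker network.
    for host in ("localhost", "kafka"):
        print(host, "resolvable" if can_resolve(host) else "NOT resolvable")
```

If the advertised name resolves inside the container but not on the host, the host-side client can bootstrap through the mapped port but cannot follow the address it is handed afterwards.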

To address this, you can set advertised.host.name and advertised.port in the broker configuration so that the advertised address is resolvable by all clients (see the documentation).

What is usually done is to set advertised.host.name to <container-name>.<network> (in your case something like kafka.kafkanet) so that any container connected to the network can resolve the IP of the Kafka broker.

In your case, however, you have a mixed network configuration: some components live inside Docker (and can therefore resolve names on the kafkanet network) while others live outside it. If this were a production system, my suggestion would be to set advertised.host.name to the DNS name or IP of the host machine and always rely on Docker port mapping to reach the Kafka broker.

From my understanding, however, you only need this setup to test things out, so the easiest thing would be to "trick" the system living outside Docker. Using the naming above, this simply means adding the line 127.0.0.1 kafka.kafkanet to your /etc/hosts (or the Windows equivalent).

This way when your client living outside docker connects to Kafka the following should happen:

  1. client -> Kafka via localhost:9092
  2. Kafka queries Zookeeper and returns the host kafka.kafkanet
  3. client resolves kafka.kafkanet to 127.0.0.1
  4. client -> Kafka via 127.0.0.1:9092
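
The four steps above can be walked through with a small simulation. It is only an illustration: the dictionary stands in for /etc/hosts, no real Kafka client is involved, and the function names are made up for this sketch.

```python
from typing import Dict, Optional

BOOTSTRAP = "localhost:9092"         # endpoint mapped by Docker
ADVERTISED = "kafka.kafkanet:9092"   # address the broker hands back


def resolve(host: str, hosts_file: Dict[str, str]) -> Optional[str]:
    """Look up a host name the way /etc/hosts would; None means unknown."""
    return hosts_file.get(host)


def connect(hosts_file: Dict[str, str]) -> str:
    """Walk through steps 1-4 and return the address of the final connection."""
    # Step 1: the client reaches the broker through the bootstrap endpoint.
    bootstrap_host, _, _ = BOOTSTRAP.partition(":")
    if resolve(bootstrap_host, hosts_file) is None:
        raise ConnectionError(f"cannot resolve {bootstrap_host}")
    # Step 2: the broker answers with its advertised address.
    host, _, port = ADVERTISED.partition(":")
    # Step 3: the client resolves the advertised host name.
    ip = resolve(host, hosts_file)
    if ip is None:
        raise ConnectionError(f"cannot resolve {host}")
    # Step 4: the client reconnects using the resolved address.
    return f"{ip}:{port}"


if __name__ == "__main__":
    default_hosts = {"localhost": "127.0.0.1"}
    patched_hosts = {**default_hosts, "kafka.kafkanet": "127.0.0.1"}
    print(connect(patched_hosts))
    try:
        connect(default_hosts)
    except ConnectionError as error:
        print("without the hosts entry:", error)
```

Without the extra hosts entry the simulated client fails at step 3, which mirrors what happens to a client outside Docker.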

EDIT

As pointed out in a comment, newer Kafka versions use listeners and advertised.listeners in place of host.name and advertised.host.name (which are deprecated and only used when the newer settings are not specified). The general idea is the same, however:

  • host.name: specifies the host to which the Kafka broker should bind (works in conjunction with port)
  • listeners: specifies all the endpoints to which the Kafka broker should bind (for instance PLAINTEXT://0.0.0.0:9092,SSL://0.0.0.0:9091)
  • advertised.host.name: specifies how the broker is advertised to clients (i.e. which address clients should use to connect to it)
  • advertised.listeners: specifies all the advertised endpoints (for instance PLAINTEXT://kafka.example.com:9092,SSL://kafka.example.com:9091)

In both cases, for clients to communicate successfully with Kafka, they need to be able to resolve and connect to the advertised hostname and port.

In both cases, if these values are not specified, the broker derives them automatically from the hostname of the machine it is running on.
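
To make the relationship between the two settings concrete, here is a small sketch that parses listener strings of the form shown above and pairs each bind endpoint with its advertised counterpart. The Listener type and parse_listeners helper are hypothetical names for this example, not part of any Kafka library.

```python
from typing import List, NamedTuple


class Listener(NamedTuple):
    name: str
    host: str
    port: int


def parse_listeners(value: str) -> List[Listener]:
    """Parse a value such as 'PLAINTEXT://0.0.0.0:9092,SSL://0.0.0.0:9091'."""
    listeners = []
    for entry in value.split(","):
        name, _, address = entry.strip().partition("://")
        host, _, port = address.rpartition(":")
        listeners.append(Listener(name, host, int(port)))
    return listeners


if __name__ == "__main__":
    bound = parse_listeners("PLAINTEXT://0.0.0.0:9092,SSL://0.0.0.0:9091")
    advertised = parse_listeners(
        "PLAINTEXT://kafka.example.com:9092,SSL://kafka.example.com:9091"
    )
    advertised_by_name = {listener.name: listener for listener in advertised}
    for listener in bound:
        public = advertised_by_name[listener.name]
        print(f"{listener.name}: binds {listener.host}:{listener.port}, "
              f"clients are told {public.host}:{public.port}")
```

The bind address decides where the broker accepts connections; the advertised address is what every client must be able to resolve and reach.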

Answer 2

Score: 1

You kept referencing 8092. Was that intentional? Kafka runs on 9092 by default. The easiest test is to download the same version of Kafka and manually run its kafka-console-consumer and kafka-console-producer scripts to see whether you can publish and subscribe from your host machine.
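
Before reaching for the console scripts, a quick port check can rule out a missing port mapping. This is only a sketch using the standard library, and the helper name is made up; a successful TCP connection shows that the port is reachable, not that the advertised listeners are configured correctly.

```python
import socket


def port_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # localhost:9092 is the endpoint the question maps to the host.
    print("localhost:9092 reachable:", port_open("localhost", 9092))
```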

Answer 3

Score: 0

Did you try "host.docker.internal" in the Dockerized application?

Answer 4

Score: 0

You could create a Docker network for your containers; the containers will then be able to resolve each other's hostnames and communicate.

Note: this works with docker-compose as well as with standalone containers.

huangapple
  • Published on 2020-10-10 00:07:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/64283594.html