How much data does Kafka send to a consumer?
Question

Let's say I have a data source that sends data to a Kafka cluster. The messages are stored in a Kafka topic (with one partition, to keep things simple) and wait to be consumed. On the other side, a consumer (Spark, for example) is set up to receive that data. My question is: how much data will Kafka deliver to Spark?

I know we can configure Spark to listen to Kafka for a defined amount of time and hence receive X amount of data, but I want to understand it from Kafka's perspective: how much data does Kafka deliver at a time, and can we configure it? Can we, for example, make sure that Kafka delivers only one message (the exact message that was received from the data source, e.g. one row of a table)?
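For illustration only, here is a minimal sketch of the Spark side of such a setup, assuming Spark Structured Streaming with the Kafka source (the spark-sql-kafka package on the classpath) and placeholder broker/topic names; the `maxOffsetsPerTrigger` option is what caps how many records each micro-batch pulls:

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

public class SparkKafkaReader {
    public static void main(String[] args) throws Exception {
        SparkSession spark = SparkSession.builder()
                .appName("kafka-bounded-read")
                .master("local[*]")                                   // local mode for the sketch
                .getOrCreate();

        // Structured Streaming Kafka source; maxOffsetsPerTrigger caps records per micro-batch.
        Dataset<Row> stream = spark.readStream()
                .format("kafka")
                .option("kafka.bootstrap.servers", "localhost:9092")  // placeholder broker
                .option("subscribe", "my-topic")                      // placeholder topic
                .option("maxOffsetsPerTrigger", "1")                  // pull at most one record per trigger
                .load();

        stream.selectExpr("CAST(value AS STRING)")
                .writeStream()
                .format("console")
                .start()
                .awaitTermination();
    }
}
```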
Answer 1
Score: 1
The short answer is yes: you can control the number of records via a combination of the Kafka consumer configs below (a short sketch follows the list):
max.poll.records
fetch.max.bytes
fetch.min.bytes
max.partition.fetch.bytes
max.poll.interval.ms
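As a minimal sketch (not the answerer's code), here is a plain Java consumer that sets these properties; the broker address, group id, and topic name are placeholders, and with `max.poll.records` set to 1 each `poll()` returns at most one record:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class BoundedFetchConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "demo-group");              // placeholder group id
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

        // The five configs from the answer, each bounding how much data a poll/fetch can return:
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1");              // at most 1 record per poll()
        props.put(ConsumerConfig.FETCH_MAX_BYTES_CONFIG, "1048576");         // cap on bytes per fetch response (~1 MB)
        props.put(ConsumerConfig.FETCH_MIN_BYTES_CONFIG, "1");               // broker replies as soon as any data exists
        props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "65536"); // cap on bytes per partition per fetch
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");     // max gap between polls before a rebalance

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));       // placeholder topic
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            for (ConsumerRecord<String, String> record : records) {
                System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
            }
        }
    }
}
```

The byte-level settings (`fetch.max.bytes`, `max.partition.fetch.bytes`, `fetch.min.bytes`) bound the size of each fetch response from the broker, while `max.poll.records` bounds how many of the already-fetched records a single `poll()` call hands back to your code.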
The Kafka consumer model is built on polling: the consumer is the one responsible for issuing the poll at its own interval, and all the configuration related to bytes transferred, record counts, poll timing, session management, and much more lives on the consumer side. That is one of the biggest advantages of Kafka's architecture: it is totally free from any dependence on the number of consumers dealing with it.
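To make the pull-based flow concrete, here is a hedged sketch of the usual poll loop (again with placeholder broker, group, and topic names): the broker sends data only in response to each `poll()` call, so the consumer sets the pace, and with auto-commit disabled it also decides when offsets advance:

```java
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class PollLoop {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");   // placeholder broker
        props.put("group.id", "demo-group");                // placeholder group id
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("enable.auto.commit", "false");           // the consumer decides when offsets advance

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic")); // placeholder topic
            while (true) {
                // The broker only returns data in response to this call; the consumer sets the cadence.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    process(record);
                }
                consumer.commitSync(); // acknowledge what was processed before polling again
            }
        }
    }

    private static void process(ConsumerRecord<String, String> record) {
        System.out.printf("partition=%d offset=%d value=%s%n",
                record.partition(), record.offset(), record.value());
    }
}
```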
I highly recommend reading more about the Kafka consumer model; here is a good starter.
You can check the Kafka consumer configs here.