January 3, 2020 13:23:45

Two different types of partitions in Kafka producer

Question

In a Kafka producer, I am sending two different sets of data to a topic that has two partitions. The first set is sent with a key and the second without a key. As far as I know, the key is used to assign the data to partitions. If the key is absent, null is sent and the partition is chosen by round-robin scheduling.

But what happens if I alternate between sending data with and without a key for some period of time?

Will round-robin scheduling happen only across the partitions other than the one chosen by the key, or across both partitions?

Answer 1

Score: 3

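Kafka selects the partition according to the following rules:

1. If a custom partitioner is configured, the partition is selected by the custom partitioner's logic.
2. If there is no custom partitioner, Kafka uses the DefaultPartitioner:
   a. If the key is null, the partition is selected round-robin. (Since Kafka 2.4, the DefaultPartitioner uses "sticky" partitioning for null keys instead: it fills a batch for one partition before moving on to another, which still spreads null-key records across all partitions over time.)
   b. If the key is non-null, it takes a murmur2 hash of the serialized key bytes, modulo the number of partitions, to identify the partition.

So, with the DefaultPartitioner and no custom partitioner defined, messages with a key and messages with a null key can both be published to either of the two partitions: round-robin does not skip the partition that the keyed messages hash to.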
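
The DefaultPartitioner rules above can be sketched in plain Java. This is a simplified, hypothetical re-implementation for illustration (the class name DefaultPartitionerSketch and the key "order-42" are made up), not the Kafka client's own code: murmur2 follows the hash Kafka applies to the serialized key bytes, while the null-key branch uses a plain counter starting at 0, i.e. the pre-2.4 round-robin behaviour. The real client also checks partition availability and does not start its counter at 0.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical, simplified sketch of the DefaultPartitioner rules; not the Kafka class itself.
class DefaultPartitionerSketch {
    private int counter = 0; // round-robin counter for null keys (simplification: starts at 0)

    // murmur2 hash over the key bytes, the hash Kafka uses for non-null keys.
    static int murmur2(byte[] data) {
        int length = data.length;
        int seed = 0x9747b28c;
        final int m = 0x5bd1e995;
        final int r = 24;
        int h = seed ^ length;
        int length4 = length / 4;
        for (int i = 0; i < length4; i++) {
            int i4 = i * 4;
            int k = (data[i4] & 0xff) + ((data[i4 + 1] & 0xff) << 8)
                    + ((data[i4 + 2] & 0xff) << 16) + ((data[i4 + 3] & 0xff) << 24);
            k *= m;
            k ^= k >>> r;
            k *= m;
            h *= m;
            h ^= k;
        }
        // Mix in the trailing 1-3 bytes (intentional fall-through).
        switch (length % 4) {
            case 3:
                h ^= (data[(length & ~3) + 2] & 0xff) << 16;
            case 2:
                h ^= (data[(length & ~3) + 1] & 0xff) << 8;
            case 1:
                h ^= data[length & ~3] & 0xff;
                h *= m;
        }
        h ^= h >>> 13;
        h *= m;
        h ^= h >>> 15;
        return h;
    }

    // Clear the sign bit so the modulo result is never negative.
    static int toPositive(int n) {
        return n & 0x7fffffff;
    }

    // Rule 2a: null key -> round-robin; rule 2b: non-null key -> murmur2(key) mod numPartitions.
    int partition(byte[] keyBytes, int numPartitions) {
        if (keyBytes == null) {
            return toPositive(counter++) % numPartitions;
        }
        return toPositive(murmur2(keyBytes)) % numPartitions;
    }

    public static void main(String[] args) {
        DefaultPartitionerSketch p = new DefaultPartitionerSketch();
        byte[] key = "order-42".getBytes(StandardCharsets.UTF_8);
        // Alternate keyed and null-key sends on a 2-partition topic.
        for (int i = 0; i < 3; i++) {
            System.out.println("keyed -> " + p.partition(key, 2)
                    + ", null key -> " + p.partition(null, 2));
        }
    }
}
```

Running main prints the same partition for every keyed send (whichever one "order-42" hashes to), while the null-key sends alternate 0, 1, 0, so round-robin also lands on the partition used by the keyed records.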

To publish a message to a specific partition, you can use one of the following approaches.

Pass the partition explicitly when publishing a message, using this ProducerRecord constructor:

/**
 * Creates a record to be sent to a specified topic and partition
 */
public ProducerRecord(String topic, Integer partition, K key, V value) {
    this(topic, partition, null, key, value, null);
}

Alternatively, you can create a custom partitioner (an implementation of the Partitioner interface) and implement your own logic for selecting the partition:

https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/Partitioner.html
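
For the question's scenario, where keyed data should go to one partition and unkeyed data to the other, the decision a custom partitioner could make is sketched below. To stay runnable without the kafka-clients dependency, it is a plain Java class rather than a real implementation of Partitioner; the class name and the assignment of partition 0 to keyed records and partition 1 to null-key records are hypothetical. In an actual Partitioner, the same check would sit inside partition(String topic, Object key, byte[] keyBytes, Object value, byte[] valueBytes, Cluster cluster), and the class would be registered on the producer through the partitioner.class setting.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical routing rule for the question's scenario; plain Java, not a real Kafka Partitioner.
class KeyedVsUnkeyedRouter {
    static final int KEYED_PARTITION = 0;   // assumption: partition reserved for records with a key
    static final int UNKEYED_PARTITION = 1; // assumption: partition reserved for null-key records

    // The decision a Partitioner.partition(...) implementation could make from the key bytes.
    static int partition(byte[] keyBytes, int numPartitions) {
        if (numPartitions < 2) {
            throw new IllegalArgumentException("this rule needs at least 2 partitions");
        }
        return keyBytes == null ? UNKEYED_PARTITION : KEYED_PARTITION;
    }

    public static void main(String[] args) {
        String[] keys = {"a", null, "b", null}; // alternate keyed and unkeyed sends
        for (String k : keys) {
            byte[] keyBytes = (k == null) ? null : k.getBytes(StandardCharsets.UTF_8);
            System.out.println(k + " -> partition " + partition(keyBytes, 2));
        }
    }
}
```

The trade-off of this design: every keyed record shares partition 0, so per-key ordering is kept, but keyed traffic can no longer be spread over several partitions.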

Answer 2

Score: 1

I want to correct you: you said that the key is used to make partitions for the data. In fact, a key is sent with a message mainly to guarantee ordering for messages that share the value of a specific field.

If key=null, data is sent round-robin (to different partitions, and possibly to different brokers in a distributed environment, but of course always to the same topic).
If a key is sent, all messages with that key always go to the same partition.

Explanation and example

The key can be any string, integer, etc. Take, for example, an integer employee_id as the key.
Then employee_id 123 will always go to one partition (say, partition 0) and employee_id 345 will always go to one partition (say, partition 1). Which partition each key lands on is decided by the key hashing algorithm, and it depends on the number of partitions.
If you don't send a key, the message can go to any partition using a round-robin technique.
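
The ordering guarantee described above can be illustrated with a small in-memory simulation: each partition is an append-only list, keyed records are routed by hashing the key, and null-key records go round-robin. Assumptions to note: Integer.hashCode() modulo the partition count stands in for Kafka's murmur2 hash, so the concrete partition numbers will not match a real producer (or the 0/1 used in the example), and the class and event names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical in-memory model of a topic: each partition is an append-only list.
class PartitionedTopicSketch {
    private final List<List<String>> partitions = new ArrayList<>();
    private int roundRobin = 0; // next partition for null-key records

    PartitionedTopicSketch(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Stand-in for Kafka's hashing: Integer.hashCode() instead of murmur2.
    int partitionFor(Integer key) {
        int n = partitions.size();
        if (key == null) {
            int p = roundRobin;
            roundRobin = (roundRobin + 1) % n;
            return p;
        }
        return Math.floorMod(key.hashCode(), n);
    }

    // Appends "key:event" to the chosen partition and returns the partition index.
    int send(Integer key, String event) {
        int p = partitionFor(key);
        partitions.get(p).add(key + ":" + event);
        return p;
    }

    List<String> read(int partition) {
        return partitions.get(partition);
    }

    public static void main(String[] args) {
        PartitionedTopicSketch topic = new PartitionedTopicSketch(2);
        int p123 = topic.send(123, "hired");
        topic.send(345, "hired");
        topic.send(123, "promoted");
        topic.send(null, "audit");
        topic.send(123, "resigned");
        // All employee 123 events sit in one partition, in the order they were sent.
        System.out.println("partition " + p123 + ": " + topic.read(p123));
    }
}
```

With this stand-in hash and two partitions, 123 and 345 happen to share a partition. That is normal, since many keys map to each partition, and it does not disturb the order of each employee's own events.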

Answer 3

Score: -1

Kafka has a well-defined scheme for sending records to partitions and storing them. As you mentioned, the key ensures that records with the same key go to the same partition. This preserves the chronological order of those messages within the topic.

In your case, the two partitions will store the data as follows:

Partition 1: stores the records that carry a particular key. Records with this key will always go to this partition; this is key-based (hash) partitioning. In addition, records with a null key will also go to this partition, because null-key records are distributed round-robin.
Partition 2: contains records that were sent without any key, i.e. whose key is null.
