Two different types of partitions in a Kafka producer

Question

In a Kafka producer, I am sending two different sets of data to a topic that has two partitions. The first set is sent with a key and the second without one. As far as I know, the key is used to partition the data. If the key is absent, null is sent and the partition is chosen by round-robin scheduling.

But what happens if I alternate between sending data with a key and without a key for some period of time?

Will round-robin scheduling apply only to the partitions other than the one selected by the key, or will it apply to both partitions?

Answer 1

Score: 3

Kafka selects the partition according to the following rules:

  1. If a custom partitioner is configured, the partition is chosen by the custom partitioner's logic.
  2. If no custom partitioner is configured, Kafka uses the DefaultPartitioner:

a. If the key is null, the partition is chosen round-robin (Kafka clients from 2.4 onward instead stick to one partition per batch before moving to the next).

b. If the key is non-null, it applies a Murmur2 hash to the key, modulo the number of partitions, to pick the partition.

So with the DefaultPartitioner and no custom partitioner defined, messages can be published to either partition whether the key is null or not: a keyed message goes to the partition its key hashes to, and null-key messages are spread across both partitions.
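
The rules above can be sketched as a small self-contained model. This is not Kafka's DefaultPartitioner: the class name `PartitionChoiceSketch` is hypothetical, `Arrays.hashCode` is only a stand-in for Murmur2, and the null-key branch models plain round-robin as described in this answer.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical, simplified model of the selection rules; not Kafka's actual code.
class PartitionChoiceSketch {
    private final int numPartitions;
    private final AtomicInteger roundRobinCounter = new AtomicInteger(0);

    PartitionChoiceSketch(int numPartitions) {
        this.numPartitions = numPartitions;
    }

    // Returns the partition index for a record; a null key means "no key".
    int choosePartition(String key) {
        if (key == null) {
            // No key: rotate over ALL partitions, including those keyed records use.
            return (roundRobinCounter.getAndIncrement() & 0x7fffffff) % numPartitions;
        }
        // Key present: hash the key bytes, then take the result modulo the partition count.
        // Assumption: Arrays.hashCode stands in for the Murmur2 hash Kafka applies.
        byte[] keyBytes = key.getBytes(StandardCharsets.UTF_8);
        return (Arrays.hashCode(keyBytes) & 0x7fffffff) % numPartitions;
    }

    public static void main(String[] args) {
        PartitionChoiceSketch sketch = new PartitionChoiceSketch(2);
        // Alternate keyed and unkeyed records, as in the question.
        for (int i = 0; i < 4; i++) {
            System.out.println("key=order-1 -> partition " + sketch.choosePartition("order-1"));
            System.out.println("key=null    -> partition " + sketch.choosePartition(null));
        }
    }
}
```

In this model the keyed record keeps landing on one partition, while the null-key records alternate between both partitions regardless of where the keyed record goes.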

To publish a message to a specific partition, you can use one of the following approaches:

  1. Pass the partition explicitly when publishing a message:

    /**
     * Creates a record to be sent to a specified topic and partition
     */
    public ProducerRecord(String topic, Integer partition, K key, V value) {
        this(topic, partition, null, key, value, null);
    }

  2. Create a custom partitioner and implement the logic for selecting the partition:

https://kafka.apache.org/10/javadoc/org/apache/kafka/clients/producer/Partitioner.html
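
As an illustration of the second approach, here is a minimal sketch of selection logic that does what the question describes: keyed records go to one reserved partition, and null-key records rotate over the remaining ones. The class `ReservedPartitionSketch`, its method, and the choice of partition 0 are assumptions made for this example. In a real producer this logic would go inside the `partition` method of a class implementing the `Partitioner` interface linked above, registered through the `partitioner.class` producer setting.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical selection logic; it does not depend on the Kafka client library.
class ReservedPartitionSketch {
    // Assumption for this example: partition 0 is reserved for keyed records.
    private static final int KEYED_PARTITION = 0;
    private final AtomicInteger nextUnkeyed = new AtomicInteger(0);

    int choosePartition(byte[] keyBytes, int numPartitions) {
        if (numPartitions < 2) {
            return 0; // only one partition exists, so nothing can be reserved
        }
        if (keyBytes != null) {
            return KEYED_PARTITION; // every keyed record goes to the reserved partition
        }
        // Null key: round-robin over partitions 1..numPartitions-1 only.
        int turn = nextUnkeyed.getAndIncrement() & 0x7fffffff;
        return 1 + turn % (numPartitions - 1);
    }

    public static void main(String[] args) {
        ReservedPartitionSketch sketch = new ReservedPartitionSketch();
        byte[] key = {1, 2, 3};
        for (int i = 0; i < 3; i++) {
            System.out.println("keyed   -> partition " + sketch.choosePartition(key, 3));
            System.out.println("unkeyed -> partition " + sketch.choosePartition(null, 3));
        }
    }
}
```

With only two partitions, as in the question, every null-key record would end up on partition 1, so this layout trades even load distribution for a strict separation of keyed and unkeyed data.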

Answer 2

Score: 1

I want to correct one point. You said that the key is used to partition the data. The key sent with a message is mainly there to preserve message ordering for a specific field.

  • If key=null, data is sent round-robin (to different partitions, and in a distributed environment to different brokers, but always to the same topic).
  • If a key is sent, all messages with that key will always go to the same partition.

Explanation and example

  • The key can be a string, an integer, and so on. Take an integer employee_id as the key, for example.
  • employee_id 123 might always go to partition 0, and employee_id 345 might always go to partition 1. This is decided by the key hashing algorithm, whose result depends on the number of partitions.
  • If you don't send a key, the message can go to any partition, chosen round-robin.
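
The ordering point can be illustrated with a small simulation. It is only a sketch: `KeyOrderingSketch` and its methods are hypothetical, partitions are modeled as in-memory lists, and `Integer.hashCode` stands in for the hash Kafka applies to the serialized key, so the partition numbers it produces will not match a real cluster.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation: each partition is an append-only list of records.
class KeyOrderingSketch {
    // Assumption: Integer.hashCode is a stand-in for Kafka's hash of the serialized key.
    static int partitionFor(int employeeId, int numPartitions) {
        return (Integer.hashCode(employeeId) & 0x7fffffff) % numPartitions;
    }

    // Appends each event to the partition selected by its key, in send order.
    static List<List<String>> send(int numPartitions, int[] employeeIds, String[] events) {
        List<List<String>> partitions = new ArrayList<>();
        for (int p = 0; p < numPartitions; p++) {
            partitions.add(new ArrayList<>());
        }
        for (int i = 0; i < employeeIds.length; i++) {
            int target = partitionFor(employeeIds[i], numPartitions);
            partitions.get(target).add(employeeIds[i] + ":" + events[i]);
        }
        return partitions;
    }

    public static void main(String[] args) {
        int[] ids = {123, 345, 123, 345};
        String[] events = {"hired", "hired", "promoted", "relocated"};
        List<List<String>> partitions = send(2, ids, events);
        for (int p = 0; p < partitions.size(); p++) {
            System.out.println("partition " + p + ": " + partitions.get(p));
        }
    }
}
```

Because every record for a given employee_id is appended to the same list, the events for that employee stay in the order they were sent. Two different keys may still share a partition, and changing the partition count can move a key to a different partition.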

Answer 3

Score: -1

Kafka is well organized when it comes to sending records to partitions and storing them there. As you mentioned, the key ensures that records with the same key go to the same partition, which helps preserve the chronological order of those messages on the topic.

In your case, the two partitions will store the data as follows:

  1. Partition 1: stores the data sent with a particular key. Records with this key will always go to this partition. This is the concept of custom partitioning. In addition, records with null keys will also go to this partition, since they are distributed round-robin.
  2. Partition 2: this partition will contain the records that are sent without any key, i.e. with a null key.

huangapple
  • Posted on January 3, 2020 at 13:23:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59573571.html