keyBy a stream (source: Kafka) on 2 fields that can arrive in either order but belong to the same group

Question

message1: { otherfields: {...}, field1: "abc", field2: "def"}
message2: { otherfields: {...}, field1: "def", field2: "abc"}

In Flink, if we keyBy the stream coming from a Kafka topic on field1 and field2, the two messages above go to different keyed state, but I want them to go to the same keyed state.

My current code is something like:

streams
.keyBy(GroupByKeyGenerator())
.process(CustomProcessFunction())

One solution I have is to combine the two fields into one, ordered ascending, and use the result as the single key for keyBy. This way both messages go to the same keyed state (the key in both cases would be abcdef).

But the issue is that the two fields are UUIDs, generated randomly at the source.
If I combine them and order them ascending, what are the chances that pairs which do not belong together end up with the same combined key, and therefore in the same key group / keyed state?

For example:

message1:
field1: aef,
field2: bcd =>
combinedOrderedKey: abcdef

message2:
field1: abc,
field2: def =>
combinedOrderedKey: abcdef

The combined key is the same, but these two messages do not belong to the same group / keyed state.

What solution can I use? (Could some hash function help?)

Answer 1

Score: 0

The simplest solution to the above problem is:

  1. Put both keys in a list.
  2. Sort the list.
  3. Join the sorted keys with a delimiter (one that cannot occur inside a key).
  4. Use the result as the common key for the keyed state.
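A minimal Java sketch of these steps, assuming both fields are plain `String`s (for example UUID strings). The class `CommonKey`, the method `commonKey`, and the `_` delimiter are illustrative choices, not part of any Flink API; in a Flink job the method body would presumably go inside the `KeySelector` passed to `keyBy` (the question's `GroupByKeyGenerator`).

```java
import java.util.Arrays;

public class CommonKey {
    // Illustrative delimiter: it must never occur inside a key.
    // UUID strings contain only hex digits and '-', so '_' is safe for them.
    private static final String DELIMITER = "_";

    // Builds an order-independent key from two field values:
    // 1. put both values in an array, 2. sort it, 3. join with the delimiter.
    static String commonKey(String field1, String field2) {
        String[] keys = {field1, field2};
        Arrays.sort(keys);                   // lexicographic String order
        return String.join(DELIMITER, keys); // e.g. "adf_bcf"
    }

    public static void main(String[] args) {
        System.out.println(commonKey("bcf", "adf")); // adf_bcf
        System.out.println(commonKey("adf", "bcf")); // adf_bcf
    }
}
```

Whole field values are compared, not individual characters, so any deterministic total order would work; lexicographic `String` order is simply the most convenient.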

Example:
message1: {key1: "bcf", key2: "adf"}
message2: {key1: "adf", key2: "bcf"}

In both cases, the key will be: adf_bcf
Steps:

  1. [bcf, adf]
  2. [adf, bcf]
  3. adf_bcf
  4. common key: adf_bcf
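On the question's concern about key groups: Flink derives a key's key group deterministically from the key's `hashCode()`, so records with equal keys always land in the same key group and share the same keyed state. Different keys may share a key group, but keyed state is still kept separately per key. The sketch below uses a hypothetical, simplified `keyGroupFor` (a plain `floorMod` of `hashCode()`, not Flink's actual murmur-hash-based assignment) to illustrate that only key equality matters, and shows that sorting whole field values and joining them with a delimiter keeps the question's pairs (`aef`, `bcd`) and (`abc`, `def`) apart.

```java
import java.util.Arrays;

public class KeyGroupDemo {
    // Same order-independent key as in the steps above: sort, then join with "_".
    static String commonKey(String field1, String field2) {
        String[] keys = {field1, field2};
        Arrays.sort(keys);
        return String.join("_", keys);
    }

    // HYPOTHETICAL simplified stand-in for Flink's key-group assignment.
    // Flink murmur-hashes key.hashCode() before taking it modulo maxParallelism;
    // this version skips the murmur step but, like Flink, depends only on the key.
    static int keyGroupFor(Object key, int maxParallelism) {
        return Math.floorMod(key.hashCode(), maxParallelism);
    }

    public static void main(String[] args) {
        int maxParallelism = 128; // example value only

        // Both orderings from the example yield the same key...
        String k1 = commonKey("bcf", "adf");
        String k2 = commonKey("adf", "bcf");
        System.out.println(k1 + " " + k2); // adf_bcf adf_bcf
        // ...so they always map to the same key group and the same keyed state.
        System.out.println(keyGroupFor(k1, maxParallelism) == keyGroupFor(k2, maxParallelism)); // true

        // The question's pairs stay distinct: whole values are sorted, not characters.
        String q1 = commonKey("aef", "bcd");
        String q2 = commonKey("abc", "def");
        System.out.println(q1 + " " + q2 + " " + q1.equals(q2)); // aef_bcd abc_def false

        // The delimiter keeps field boundaries, so ("ab", "cd") and ("a", "bcd") differ too.
        System.out.println(commonKey("ab", "cd").equals(commonKey("a", "bcd"))); // false
    }
}
```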

huangapple
  • Posted on July 17, 2023 18:37:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76703635.html