KeyBy a stream (source is Kafka) on 2 fields, where the 2 fields can arrive in either order but belong to the same group
Question
message1: { otherfields: {...}, field1: "abc", field2: "def"}
message2: { otherfields: {...}, field1: "def", field2: "abc"}
In Flink, if we keyBy the stream coming from a Kafka topic on field1 and field2, the above 2 messages will go to different keyed state, but I want them to go to the same keyed state.
My current code is something like:
streams
.keyBy(GroupByKeyGenerator())
.process(CustomProcessFunction())
One solution I have is to combine the 2 fields into one, sorted in ascending order, and use the result as the key for keyBy. This way both messages will go to the same keyed state. (The key in both cases will be abcdef.)
But the issue is that the 2 fields are UUIDs, generated randomly at the source.
If I combine them and sort them in ascending order, what are the chances that two different pairs of values end up with the same combined key, and therefore in the same keyed state?
For example:
message1:
field1: aef,
field2: bcd =>
combinedOrderedKey: abcdef
message2:
field1: abc,
field2: def =>
combinedOrderedKey: abcdef
The combined key is the same, but the two messages do not belong to the same group/keyed state.
Please help me find a solution I can use. (Could some hash function be used?)
Answer 1
Score: 0
The simplest solution to the above problem is:
- Put both keys in a list
- Sort the list
- Join the sorted keys with a delimiter between them
- The result acts as a common key for the state.
Example:
message1: {key1: "bcf" , key2: "adf"}
message2: {key1: "adf" , key2: "bcf"}
In both cases, the key will be: adf_bcf
Steps (for message1):
- List: [bcf, adf]
- Sorted: [adf, bcf]
- Joined: adf_bcf
- Common key: adf_bcf
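The steps above can be sketched in plain Java as follows. This is only a minimal sketch: the class name PairKey and the method canonicalKey are hypothetical helpers, not part of Flink, and the "_" delimiter is an assumption that is safe for UUIDs because they contain only hex digits and hyphens. The two values are compared as whole strings, so each ID stays intact. This avoids the collision from the question, where sorting the individual characters turned both aef + bcd and abc + def into abcdef.

```java
import java.util.Objects;

// Hypothetical helper (not a Flink class): builds an order-independent key from two IDs.
final class PairKey {
    private PairKey() {}

    static String canonicalKey(String a, String b) {
        Objects.requireNonNull(a, "a");
        Objects.requireNonNull(b, "b");
        // Compare the whole strings, not their characters, so each ID stays intact.
        // "_" is assumed never to occur inside the IDs (true for standard UUID strings).
        return a.compareTo(b) <= 0 ? a + "_" + b : b + "_" + a;
    }

    public static void main(String[] args) {
        System.out.println(canonicalKey("bcf", "adf")); // adf_bcf
        System.out.println(canonicalKey("adf", "bcf")); // adf_bcf
        System.out.println(canonicalKey("aef", "bcd")); // aef_bcd
        System.out.println(canonicalKey("abc", "def")); // abc_def
    }
}
```

In a Flink job, the getKey method of a KeySelector (such as the GroupByKeyGenerator in the question) could return canonicalKey(field1, field2). Records with equal keys are then routed to the same keyed state regardless of field order, and different pairs produce different keys as long as the delimiter does not occur inside the IDs.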