How to hash a string column in Power Query

Question

I need a function that hashes a text string using only native Power Query. I have tried using Web.Page with JavaScript, but it never waits for the script to complete.

I would like it to return an integer.

What are some good methods to do this?

Answer 1

Score: 2

Using the algorithm from the JavaScript version, we can use list functions in Power Query to hash a string. The purpose is to convert a GUID or file name to an integer hash to save memory.

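let
    HashFunction = (input) =>
    let
        // Split the text into single characters
        ListChars = Text.ToList(input),
        // Convert each character to its numeric character code
        ListNumbers = List.Transform(ListChars,
            each Character.ToNumber(_)),
        // Fold over the codes: hash = (hash * 31 + code) mod 9223372036854775807
        HashNumber = List.Accumulate(ListNumbers,
            0,
            (state, current) =>
                Number.Mod((state * 31 + current), 9223372036854775807))
    in
        HashNumber
in
    HashFunction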

The function splits the string into a list of characters and converts each character to its numeric code with Character.ToNumber.

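For each character, List.Accumulate multiplies the running hash by the constant 31, adds the character code, and reduces the result modulo 9223372036854775807 (2^63 - 1). This modulus does not keep the result within 32 bits. Also, Power Query numbers are double-precision floating point by default, so once an intermediate value exceeds 2^53 its low-order bits are lost.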
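
To see the precision issue outside Power Query, here is a minimal JavaScript mirror of the function above. It is a sketch under two assumptions. First, Power Query uses its default double-precision arithmetic; JavaScript numbers are IEEE 754 doubles as well. Second, the input is ASCII, so charCodeAt returns the same codes as Character.ToNumber. The names hash31 and exceedsExactRange are hypothetical helpers. The example shows that a 36-character GUID pushes an intermediate value past 2^53 within the first dozen characters.

```javascript
// Sketch: JavaScript mirror of the first HashFunction. Assumes Power Query's
// default double-precision arithmetic (JavaScript numbers are IEEE 754
// doubles too) and ASCII input, so charCodeAt matches Character.ToNumber.
function hash31(input) {
  let state = 0;
  for (let i = 0; i < input.length; i++) {
    // Same step as the List.Accumulate lambda: (state * 31 + code) mod 9223372036854775807.
    // In JavaScript this literal cannot be stored exactly and rounds to 2^63.
    state = (state * 31 + input.charCodeAt(i)) % 9223372036854775807;
  }
  return state;
}

// Returns true once an intermediate value state * 31 + code leaves the range
// in which doubles represent every integer exactly (up to 2^53 - 1).
function exceedsExactRange(input) {
  let state = 0;
  for (let i = 0; i < input.length; i++) {
    const next = state * 31 + input.charCodeAt(i);
    if (!Number.isSafeInteger(next)) {
      return true;
    }
    state = next % 9223372036854775807;
  }
  return false;
}

console.log(hash31("ab")); // 3105
console.log(exceedsExactRange("ab")); // false
console.log(exceedsExactRange("3f2504e0-4f89-11d3-9a0c-0305e82c3301")); // true
```

Once the exact range is exceeded, small character codes added to a huge running value can be rounded away entirely. That is consistent with strings that differ only slightly hashing to the same value.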

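Edit: The function above has a high collision rate for similar strings.
This BKDR-style function works better. The seed is passed as the second argument. Here it comes from a query called 'prime', defined elsewhere, that holds a number from the sequence 13, 131, 1313, ... Despite the query name, not every value in that sequence is prime (1313 = 13 × 101).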

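let
    BKDRHashFunction = (input, seed) =>
    let
        // Split the text into single characters and convert each to its code
        ListChars = Text.ToList(input),
        ListNumbers = List.Transform(ListChars, each Character.ToNumber(_)),
        // hash = (hash * seed + code) mod 2147483647 (2^31 - 1), applied per character
        HashNumber = List.Accumulate(
            ListNumbers,
            0,
            (state, current) => Number.Mod((state * seed + current), 2147483647))
    in
        HashNumber
in
    BKDRHashFunction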
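
The same kind of JavaScript mirror also shows how large the seed can get before the double-precision issue returns. It assumes default double-precision arithmetic and ASCII input, and bkdrHash and isSafeSeed are hypothetical names. The state is always below 2147483647, so state * seed + code stays exact only while 2147483646 * seed + 65535 <= 2^53 - 1. That condition holds for seeds up to 4194304 (2^22). It therefore covers 13 through 1313131 in the sequence above, but not 13131313.

```javascript
// Sketch: JavaScript mirror of BKDRHashFunction, assuming Power Query's
// default double-precision arithmetic and ASCII input, so charCodeAt
// returns the same codes as Character.ToNumber.
const MODULUS = 2147483647; // 2^31 - 1, the modulus used in the M code

function bkdrHash(input, seed) {
  let state = 0;
  for (let i = 0; i < input.length; i++) {
    // Same step as the List.Accumulate lambda: (state * seed + code) mod MODULUS
    state = (state * seed + input.charCodeAt(i)) % MODULUS;
  }
  return state;
}

// The state never exceeds MODULUS - 1 and a UTF-16 code unit is at most
// 0xFFFF, so state * seed + code is computed exactly whenever this worst
// case fits within Number.MAX_SAFE_INTEGER (2^53 - 1).
function isSafeSeed(seed) {
  return (MODULUS - 1) * seed + 0xffff <= Number.MAX_SAFE_INTEGER;
}

console.log(bkdrHash("ab", 131)); // 12805
const seeds = [13, 131, 1313, 13131, 131313, 1313131, 13131313];
console.log(seeds.filter(isSafeSeed).join(", ")); // 13, 131, 1313, 13131, 131313, 1313131
```

Because every result is below 2147483647, the hash also fits in a 32-bit signed integer.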

The collision rate appears to be much better for this one.
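
To measure the collision rate on your own kind of data, a small hypothetical harness can count duplicate hashes for both functions over strings that share a long prefix. The GUID-like inputs are made up for illustration, and no particular counts are implied here. The same double-precision and ASCII assumptions apply, and hash31, bkdrHash and countCollisions are hypothetical names.

```javascript
// Hypothetical harness: count hash collisions among near-identical strings.
// hash31 and bkdrHash mirror the two Power Query functions, assuming
// double-precision arithmetic and ASCII input (charCodeAt == Character.ToNumber).
function hash31(input) {
  let state = 0;
  for (let i = 0; i < input.length; i++) {
    state = (state * 31 + input.charCodeAt(i)) % 9223372036854775807;
  }
  return state;
}

function bkdrHash(input, seed) {
  let state = 0;
  for (let i = 0; i < input.length; i++) {
    state = (state * seed + input.charCodeAt(i)) % 2147483647;
  }
  return state;
}

// Counts how many strings hash to a value already produced by an earlier string.
function countCollisions(strings, hashFn) {
  const seen = new Set();
  let collisions = 0;
  for (const s of strings) {
    const h = hashFn(s);
    if (seen.has(h)) {
      collisions++;
    } else {
      seen.add(h);
    }
  }
  return collisions;
}

// Made-up inputs: one GUID-like prefix followed by a zero-padded counter.
const inputs = [];
for (let i = 0; i < 10000; i++) {
  inputs.push("3f2504e0-4f89-11d3-9a0c-" + String(i).padStart(12, "0"));
}

console.log("base 31, mod 2^63 - 1:", countCollisions(inputs, hash31));
console.log("BKDR seed 131, mod 2^31 - 1:", countCollisions(inputs, (s) => bkdrHash(s, 131)));
```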

huangapple
  • Published on 2023-07-28 00:44:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76781866.html