Shift fold hashing to generate indices of records in a database

Question

I am working on an assignment where I am asked to implement a shift fold function for hashing String type keys for records in a database and returning the position of that record in the database. To my understanding, this means that the sfold function has to produce a hash that matches the position of that record in the database.

The code for the method is as follows:

    public static long sfold(String s, int M) {
        int intLength = s.length() / 4;
        long sum = 0;
        for (int j = 0; j < intLength; j++) {
            char c[] = s.substring(j * 4, (j * 4) + 4).toCharArray();
            long mult = 1;
            for (int k = 0; k < c.length; k++) {
                sum += c[k] * mult;
                mult *= 256;
            }
        }

        char c[] = s.substring(intLength * 4).toCharArray();
        long mult = 1;
        for (int k = 0; k < c.length; k++) {
            sum += c[k] * mult;
            mult *= 256;
        }

        return (Math.abs(sum) % M);
    }

This generates a seemingly random number between 0 and M-1, where M is the number of records. The problem is that the numbers are not unique, and not every possible number is necessarily generated. So how can this be used to return the position of a record in the database?

My idea was to hash each String key, sort the records by hash value, and insert them into the database in that order. But as I said, the method does not produce unique numbers, and there is no guarantee that every number will occur.

Answer 1

Score: 1

"Lack of uniqueness" is called a collision, and there are different ways to solve it. The details are explained well in Wikipedia: https://en.wikipedia.org/wiki/Hash_table#Collision_resolution

There are two main approaches:

  • Separate chaining uses extra storage space: if a string hashes to a number already in the table, it spills over to secondary storage space. In an in-memory data structure, the extra storage is typically a linked list.

  • Open addressing looks for unused space: if a string hashes to a number already in the table, it's stored somewhere else in the same table, according to a strategy you decide beforehand.
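As a minimal sketch of open addressing, here is a fixed-size table that uses linear probing (try the next slot, wrapping around) as its strategy. The class name ProbingTable and its methods are made up for illustration and are not part of the assignment. The hash is the question's folding scheme, written with charAt instead of substring. The slot index returned by insert or find is the "position of the record".

```java
// Hypothetical fixed-size table; collisions are resolved by linear probing.
class ProbingTable {
    private final String[] slots;

    ProbingTable(int capacity) {
        slots = new String[capacity];
    }

    // Same folding scheme as the question's sfold: sum 4-character blocks,
    // then reduce modulo m. The absolute value is taken after the remainder.
    static int sfold(String s, int m) {
        long sum = 0;
        for (int start = 0; start < s.length(); start += 4) {
            long mult = 1;
            int end = Math.min(start + 4, s.length());
            for (int k = start; k < end; k++) {
                sum += s.charAt(k) * mult;
                mult *= 256;
            }
        }
        return (int) Math.abs(sum % m);
    }

    // Stores the key and returns the slot it ended up in, or -1 if the table is full.
    int insert(String key) {
        int home = sfold(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int pos = (home + i) % slots.length;
            if (slots[pos] == null || slots[pos].equals(key)) {
                slots[pos] = key;
                return pos;
            }
        }
        return -1;
    }

    // Returns the slot that holds the key, or -1 if the key is not stored.
    int find(String key) {
        int home = sfold(key, slots.length);
        for (int i = 0; i < slots.length; i++) {
            int pos = (home + i) % slots.length;
            if (slots[pos] == null) {
                return -1; // an empty slot ends the probe sequence
            }
            if (slots[pos].equals(key)) {
                return pos;
            }
        }
        return -1;
    }
}
```

For example, "abcdefgh" and "efghabcd" fold to the same sum because their 4-character blocks are merely swapped, so the second key inserted lands one slot after the first. This sketch does not support deletion; removing keys from a probing table needs extra care, such as tombstone markers.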

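There is also a potential problem here:

mult *= 256;

Multiplying by a power of 2 can throw away information: a long that is multiplied by 256 eight times in a row overflows to 0, and every character after that point would be ignored. Your code escapes this only because mult is reset to 1 for each 4-character block, so it never grows beyond 256^4. If you ever carry mult across the whole string, change the multiplier to an odd number, such as the prime 127, which can never overflow to 0.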
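The wrap-around arithmetic behind this point can be checked with a small sketch. The class MultiplierOverflow and its power method are names invented for this illustration; the only assumption is Java's standard wrapping behaviour for long multiplication.

```java
// Illustrative helper: repeated multiplication with wrapping long arithmetic.
class MultiplierOverflow {
    // Starts at 1 and multiplies by `factor` exactly `times` times.
    static long power(long factor, int times) {
        long mult = 1;
        for (int i = 0; i < times; i++) {
            mult *= factor; // silently wraps around on overflow
        }
        return mult;
    }

    public static void main(String[] args) {
        System.out.println(power(256, 4));       // 4294967296, i.e. 2^32
        System.out.println(power(256, 8));       // 0, because 2^64 wraps around
        System.out.println(power(127, 8) != 0);  // true
    }
}
```

An odd multiplier keeps the product odd no matter how often it wraps, so it cannot reach 0; a power of 2 shifts the bits out of the top of the long until nothing is left.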

You have another problem here:

return (Math.abs(sum) % M);

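Math.abs has one special case where the result is still negative: Math.abs(Long.MIN_VALUE) returns Long.MIN_VALUE, so the remainder would be negative as well. One way to fix it is to take the absolute value after the remainder, because sum % M always lies strictly between -M and M: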

return Math.abs(sum % M);
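To see the difference, here is a small comparison of the two orderings, plus Math.floorMod from the standard library as another option. The class SafeIndex and its method names are made up for this sketch.

```java
// Illustrative comparison of three ways to reduce a long sum to a table index.
class SafeIndex {
    // The question's version: absolute value first, then remainder.
    static long absThenMod(long sum, int m) {
        return Math.abs(sum) % m;
    }

    // The fix from this answer: remainder first, then absolute value.
    static long modThenAbs(long sum, int m) {
        return Math.abs(sum % m);
    }

    // Alternative: floorMod never returns a negative result for a positive m.
    static long floorMod(long sum, int m) {
        return Math.floorMod(sum, (long) m);
    }

    public static void main(String[] args) {
        System.out.println(absThenMod(Long.MIN_VALUE, 10)); // -8, not a valid index
        System.out.println(modThenAbs(Long.MIN_VALUE, 10)); // 8
        System.out.println(floorMod(Long.MIN_VALUE, 10));   // 2
    }
}
```

Note that modThenAbs and floorMod can map the same negative sum to different slots. Either is fine as long as the same one is used for both inserting and looking up.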

  • Posted by huangapple on August 13, 2020 at 23:22:08
  • Source: https://go.coder-hub.com/63398249.html