Shift-fold hashing to generate indices of records in a database
Question
I am working on an assignment where I am asked to implement a shift fold function for hashing String type keys for records in a database and returning the position of that record in the database. To my understanding, this means that the sfold function has to produce a hash that matches the position of that record in the database.
The code for the method is as follows:
public static long sfold(String s, int M) {
    int intLength = s.length() / 4;
    long sum = 0;
    for (int j = 0; j < intLength; j++) {
        char c[] = s.substring(j * 4, (j * 4) + 4).toCharArray();
        long mult = 1;
        for (int k = 0; k < c.length; k++) {
            sum += c[k] * mult;
            mult *= 256;
        }
    }
    char c[] = s.substring(intLength * 4).toCharArray();
    long mult = 1;
    for (int k = 0; k < c.length; k++) {
        sum += c[k] * mult;
        mult *= 256;
    }
    return (Math.abs(sum) % M);
}
This generates a number between 0 and M-1, where M is the number of records. The problem is that these numbers will not be unique, and there is no guarantee that every possible number will be generated. So how can this be used to return the position of records in the database?
My idea was to compute the hash of each String key, sort the records by that hash value, and then insert them into the database. But, as I said, the method does not produce unique values, and there is no guarantee that every value will be produced.
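To see the non-uniqueness concretely, here is a small check built on the sfold method above (only the formatting and the array-declaration style differ). Because shift folding simply adds the 4-character chunks together, two keys made of the same chunks in a different order always get the same index; and by the pigeonhole principle, any M + 1 distinct keys must share at least one index. The class name, the table size of 101 and the sample keys are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

final class SfoldCollisions {
    // The question's sfold; only formatting and array-declaration style differ.
    static long sfold(String s, int M) {
        int intLength = s.length() / 4;
        long sum = 0;
        for (int j = 0; j < intLength; j++) {
            char[] c = s.substring(j * 4, (j * 4) + 4).toCharArray();
            long mult = 1;
            for (int k = 0; k < c.length; k++) {
                sum += c[k] * mult;
                mult *= 256;
            }
        }
        char[] c = s.substring(intLength * 4).toCharArray();
        long mult = 1;
        for (int k = 0; k < c.length; k++) {
            sum += c[k] * mult;
            mult *= 256;
        }
        return Math.abs(sum) % M;
    }

    // Counts how many different indices the keys "key0" .. "key<n-1>" land on.
    static int distinctIndices(int n, int M) {
        Set<Long> seen = new HashSet<>();
        for (int i = 0; i < n; i++) {
            seen.add(sfold("key" + i, M));
        }
        return seen.size();
    }

    public static void main(String[] args) {
        int M = 101; // hypothetical number of records / table size
        // Same 4-character chunks in a different order -> same sum -> same index.
        System.out.println(sfold("abcdefgh", M) == sfold("efghabcd", M)); // true
        // M + 1 distinct keys cannot all get distinct indices in [0, M).
        System.out.println(distinctIndices(M + 1, M) + " distinct indices for " + (M + 1) + " keys");
    }
}
```

So the hash alone can never be the final position; some collision-handling rule has to decide where the second key goes.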
Answer 1
Score: 1
"Lack of uniqueness" is called a collision, and there are different ways to solve it. The details are explained well in Wikipedia: https://en.wikipedia.org/wiki/Hash_table#Collision_resolution
There are two main approaches:
- Separate chaining uses extra storage space: if a string hashes to a number already in the table, it spills over to secondary storage space. In an in-memory data structure, the extra storage is typically a linked list.
- Open addressing looks for unused space: if a string hashes to a number already in the table, it's stored somewhere else in the same table, according to a strategy you decide beforehand.
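Open addressing maps directly onto the goal of turning a key into a position: the hash is only the key's home slot, and the record's actual position is wherever the probe sequence finds room. Below is a minimal sketch of open addressing with linear probing (step forward one slot at a time, wrapping around), assuming a fixed-size table of M slots that stores the String keys themselves. The class and method names are hypothetical, and sfold here is a compact equivalent of the question's version with Math.abs applied after the remainder.

```java
final class LinearProbingTable {
    private final String[] slots; // null marks an empty slot

    LinearProbingTable(int m) {
        slots = new String[m];
    }

    // Compact equivalent of the question's sfold: the multiplier restarts at 1
    // for every 4-character chunk, and abs is taken after the remainder.
    static int sfold(String s, int m) {
        long sum = 0;
        long mult = 1;
        for (int i = 0; i < s.length(); i++) {
            if (i % 4 == 0) {
                mult = 1;
            }
            sum += s.charAt(i) * mult;
            mult *= 256;
        }
        return (int) Math.abs(sum % m);
    }

    // Inserts key and returns the slot it ends up in.
    int insert(String key) {
        int m = slots.length;
        int home = sfold(key, m);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m; // linear probe, wrapping around
            if (slots[pos] == null || slots[pos].equals(key)) {
                slots[pos] = key;
                return pos;
            }
        }
        throw new IllegalStateException("table is full");
    }

    // Returns the slot holding key, or -1 if it is not in the table.
    int find(String key) {
        int m = slots.length;
        int home = sfold(key, m);
        for (int i = 0; i < m; i++) {
            int pos = (home + i) % m;
            if (slots[pos] == null) {
                return -1; // an empty slot ends the probe sequence
            }
            if (slots[pos].equals(key)) {
                return pos;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        LinearProbingTable table = new LinearProbingTable(11);
        // Both keys share a home slot, so the second is pushed to the next slot.
        int first = table.insert("abcdefgh");
        int second = table.insert("efghabcd");
        System.out.println(first + " -> " + second);
        System.out.println(table.find("efghabcd") == second); // true
    }
}
```

Lookups have to follow the same probe sequence as inserts, which is why find stops at the first empty slot. Deleting keys would need a tombstone marker instead of resetting a slot to null, so this sketch leaves deletion out. With separate chaining you would instead keep a list per slot, and a record's position would be its slot plus its place in that list.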
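Be careful with this line:
mult *= 256;
Multiplying by a power of 2 can throw information away: Java long arithmetic wraps around modulo 2^64, so if mult were carried across the whole string it would become 0 after 8 multiplications (256^8 = 2^64), and every character after the 8th would be ignored. In your code mult is reset to 1 at the start of every 4-character chunk, so it never goes past 256^3 before it is used and no characters are lost this way. If you switch to a single multiplier that runs over the whole string, use an odd multiplier, for example a prime such as 127: powers of an odd number are always odd, so they never wrap to 0.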
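The overflow arithmetic is easy to check directly. This sketch (class and method names are hypothetical) multiplies a long by the same base repeatedly without ever resetting it, which is the situation where a multiplier of 256 reaches 0. In the question's sfold the multiplier is reset for each 4-character chunk, so this only matters for a multiplier carried across the whole key.

```java
final class MultiplierOverflow {
    // Multiplies 1 by `base` `steps` times in long arithmetic.
    // Java long multiplication silently wraps modulo 2^64 on overflow.
    static long power(long base, int steps) {
        long mult = 1;
        for (int i = 0; i < steps; i++) {
            mult *= base;
        }
        return mult;
    }

    public static void main(String[] args) {
        System.out.println(power(256, 7)); // 72057594037927936 (2^56)
        System.out.println(power(256, 8)); // 0, because 256^8 = 2^64 wraps to 0
        // Powers of an odd number stay odd, so they can never wrap to 0.
        System.out.println(power(127, 100) != 0); // true
    }
}
```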
You have another problem here:
return (Math.abs(sum) % M);
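Math.abs has a special case where it returns a negative number: Math.abs(Long.MIN_VALUE) is Long.MIN_VALUE itself, because +2^63 does not fit in a long. One way to fix this is to take the absolute value after the remainder, since the absolute value of sum % M is always less than M: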
return Math.abs(sum % M);
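A small check of the two orderings, using a hypothetical table size of 7 (2^63 leaves remainder 1 when divided by 7, which keeps the numbers easy to follow). Math.floorMod from java.lang.Math is shown as another standard option; it differs from Math.abs(sum % M) for negative sums (for example, -1 maps to 6 rather than 1), but both always give an index in [0, M).

```java
final class AbsRemainder {
    // Order used in the question: absolute value first, then remainder.
    static long absThenMod(long sum, int m) {
        return Math.abs(sum) % m;
    }

    // Order suggested in the answer: remainder first, then absolute value.
    static long modThenAbs(long sum, int m) {
        return Math.abs(sum % m);
    }

    public static void main(String[] args) {
        int m = 7; // hypothetical table size
        System.out.println(Math.abs(Long.MIN_VALUE));      // -9223372036854775808
        System.out.println(absThenMod(Long.MIN_VALUE, m)); // -1, not a valid index
        System.out.println(modThenAbs(Long.MIN_VALUE, m)); // 1
        // Alternative: Math.floorMod returns a value in [0, m) for any m > 0.
        System.out.println(Math.floorMod(Long.MIN_VALUE, (long) m)); // 6
    }
}
```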