Map words to single characters
Question
I'm building a hash function that should map any String (at most 100 characters long) to a single [A-Z] character. I'm using it for sharding.
I came up with this simple Java function. Is there any way to make it faster?
    public static final char stringToChar(final String s) {
        long counter = 0;
        for (char c : s.toCharArray()) {
            counter += c;
        }
        return (char) ('A' + (counter % 26));
    }
Answer 1
Score: 6
A quick way to get an even distribution across the shards is to use a hash function.
I suggest this method, which uses Java's default String.hashCode() function:
    public static char getShardLabel(String string) {
        int hash = string.hashCode();
        // Use Math.floorMod instead of the % operator, because % can produce negative results
        int hashMod = Math.floorMod(hash, 26);
        return (char) ('A' + hashMod);
    }
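To make the comment about negative results concrete, here is a minimal sketch comparing the two operators on a negative input. The class name FloorModDemo, its helper methods, and the sample value -7 are made up for illustration and are not part of the answer's code.

```java
final class FloorModDemo {
    private static final int SHARDS = 26;

    // '%' keeps the sign of the dividend, so a negative hash gives a negative index.
    static int remainderIndex(int hash) {
        return hash % SHARDS;
    }

    // Math.floorMod takes the sign of the divisor, so the index is always in 0..25.
    static int floorModIndex(int hash) {
        return Math.floorMod(hash, SHARDS);
    }

    static char label(int hash) {
        return (char) ('A' + floorModIndex(hash));
    }

    public static void main(String[] args) {
        int hash = -7; // hypothetical negative hash value
        System.out.println(remainderIndex(hash)); // -7, which would fall outside A-Z
        System.out.println(floorModIndex(hash));  // 19
        System.out.println(label(hash));          // T
    }
}
```

With the % version, 'A' + (-7) would land on a character before 'A', so the label would no longer be in [A-Z].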
As pointed out here, this method is considered "even enough".
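If you want to check how even the distribution is for your own keys, a rough sketch like the following counts how many strings land on each label. The class ShardDistribution, the helper names, and the random lowercase test data are assumptions for illustration only; real keys may distribute differently.

```java
import java.util.Random;

final class ShardDistribution {
    static char getShardLabel(String s) {
        return (char) ('A' + Math.floorMod(s.hashCode(), 26));
    }

    // Counts how many of the given strings land on each of the 26 labels.
    static int[] countLabels(String[] inputs) {
        int[] counts = new int[26];
        for (String s : inputs) {
            counts[getShardLabel(s) - 'A']++;
        }
        return counts;
    }

    // Hypothetical test data: random lowercase strings, 1 to 100 characters long.
    static String[] randomStrings(int n, long seed) {
        Random random = new Random(seed);
        String[] result = new String[n];
        for (int i = 0; i < n; i++) {
            int length = 1 + random.nextInt(100);
            StringBuilder sb = new StringBuilder(length);
            for (int j = 0; j < length; j++) {
                sb.append((char) ('a' + random.nextInt(26)));
            }
            result[i] = sb.toString();
        }
        return result;
    }

    public static void main(String[] args) {
        int[] counts = countLabels(randomStrings(260_000, 42L));
        for (int i = 0; i < counts.length; i++) {
            System.out.printf("%c: %d%n", (char) ('A' + i), counts[i]);
        }
    }
}
```

With 260,000 inputs, a perfectly even split would put about 10,000 strings on each label, so large deviations from that are easy to spot in the output.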
Based on a quick test, it looks faster than the solution you suggested.
On 80 million (80kk) strings of various lengths:
getShardLabel took 65 milliseconds
stringToChar took 571 milliseconds
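The answer does not show how these timings were measured. A rough way to run a similar comparison yourself might look like the sketch below; the class ShardTiming, the sampleKeys helper, and the "key-N" inputs are hypothetical, and the numbers it prints will depend on your machine and JVM.

```java
final class ShardTiming {
    // The function from the question.
    static char stringToChar(String s) {
        long counter = 0;
        for (char c : s.toCharArray()) {
            counter += c;
        }
        return (char) ('A' + (counter % 26));
    }

    // The function from this answer.
    static char getShardLabel(String s) {
        return (char) ('A' + Math.floorMod(s.hashCode(), 26));
    }

    // Hypothetical inputs: "key-0", "key-1", ...
    static String[] sampleKeys(int n) {
        String[] keys = new String[n];
        for (int i = 0; i < n; i++) {
            keys[i] = "key-" + i;
        }
        return keys;
    }

    public static void main(String[] args) {
        String[] keys = sampleKeys(1_000_000);
        long sink = 0; // accumulate results so the JIT cannot discard the loops

        long t0 = System.nanoTime();
        for (String k : keys) sink += getShardLabel(k);
        long t1 = System.nanoTime();
        for (String k : keys) sink += stringToChar(k);
        long t2 = System.nanoTime();

        System.out.printf("getShardLabel: %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("stringToChar:  %d ms%n", (t2 - t1) / 1_000_000);
        System.out.println("checksum: " + sink);
    }
}
```

Treat the output as a rough indication only. A one-shot loop like this is sensitive to JIT warm-up, and String caches its hash code after the first call, so repeated calls on the same String instances favor getShardLabel. A harness such as JMH gives more trustworthy numbers.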