Map words to single characters

Question

I'm building a hash function that should map any String (maximum length 100 characters) to a single [A-Z] character (I'm using it for sharding).

I came up with this simple Java function. Is there any way to make it faster?

public static final char stringToChar(final String s) {
    long counter = 0;
    for (char c : s.toCharArray()) {
        counter += c;
    }
    return (char)('A'+(counter%26));
}

Answer 1

Score: 6

A quick way to get an even distribution across the "shards" is to use a hash function.

I suggest this method, which uses Java's default String.hashCode() function:

public static char getShardLabel(String string) {
	int hash = string.hashCode();
	// use Math.floorMod instead of the % operator, because % can produce negative results
	int hashMod = Math.floorMod(hash, 26);
	return (char)('A'+(hashMod));
}
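
The remark about negative results in the code comment above can be checked directly. Below is a minimal sketch, assuming a hypothetical class name FloorModDemo and hypothetical helpers labelWithRemainder and labelWithFloorMod (none of these come from the answer). It compares the % operator with Math.floorMod on a negative hash value:

```java
public class FloorModDemo {
    // Hypothetical helper: the naive variant that uses the % operator.
    static char labelWithRemainder(int hash) {
        return (char) ('A' + (hash % 26));
    }

    // Same arithmetic as getShardLabel, applied to a raw hash value.
    static char labelWithFloorMod(int hash) {
        return (char) ('A' + Math.floorMod(hash, 26));
    }

    public static void main(String[] args) {
        int negativeHash = -7; // String.hashCode() can return negative ints

        System.out.println(negativeHash % 26);               // -7
        System.out.println(Math.floorMod(negativeHash, 26)); // 19

        System.out.println(labelWithFloorMod(negativeHash));  // T
        // 'A' - 7 is the character ':', which is outside [A-Z].
        System.out.println(labelWithRemainder(negativeHash)); // :
    }
}
```

In Java the % operator takes the sign of the dividend, whereas Math.floorMod takes the sign of the divisor, so with a divisor of 26 it always yields a value from 0 to 25.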

As pointed out here, this method is considered "even enough".
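
One way to see how even the distribution is for your own keys is to count how many of them land in each shard. The following is only a rough sketch: the class name ShardHistogram, the helper countShards, and the synthetic keys of the form "key-0", "key-1", ... are assumptions for illustration, so real keys may spread differently.

```java
public class ShardHistogram {
    // The method suggested above, repeated so this sketch compiles on its own.
    static char getShardLabel(String string) {
        return (char) ('A' + Math.floorMod(string.hashCode(), 26));
    }

    // Counts how many of the synthetic keys "key-0" .. "key-(n-1)" fall into each shard.
    static int[] countShards(int n) {
        int[] counts = new int[26];
        for (int i = 0; i < n; i++) {
            counts[getShardLabel("key-" + i) - 'A']++;
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countShards(1_000_000);
        for (int i = 0; i < counts.length; i++) {
            System.out.println((char) ('A' + i) + ": " + counts[i]);
        }
    }
}
```

If the counts printed for the 26 labels are close to each other, the hash is spreading that key set well enough for sharding.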

Based on a quick test, it also looks faster than the solution you suggested.
On 80kk (80 million) strings of various lengths:

  • getShardLabel took 65 milliseconds
  • stringToChar took 571 milliseconds
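
The timings above come from the answer's own quick test, whose code is not shown. A comparison of this kind could be sketched as below, assuming a hypothetical class ShardTiming, a hypothetical generator randomStrings, and a much smaller input set of 200,000 random lowercase strings. It is a rough wall-clock measurement, not a rigorous benchmark, and the numbers it prints will vary by machine and JVM.

```java
import java.util.Random;

public class ShardTiming {
    // The function from the question.
    static char stringToChar(final String s) {
        long counter = 0;
        for (char c : s.toCharArray()) {
            counter += c;
        }
        return (char) ('A' + (counter % 26));
    }

    // The function from the answer.
    static char getShardLabel(String string) {
        return (char) ('A' + Math.floorMod(string.hashCode(), 26));
    }

    // Hypothetical generator: 'count' lowercase strings of length 1..maxLength.
    static String[] randomStrings(int count, int maxLength, long seed) {
        Random random = new Random(seed);
        String[] result = new String[count];
        for (int i = 0; i < count; i++) {
            int length = 1 + random.nextInt(maxLength);
            StringBuilder sb = new StringBuilder(length);
            for (int j = 0; j < length; j++) {
                sb.append((char) ('a' + random.nextInt(26)));
            }
            result[i] = sb.toString();
        }
        return result;
    }

    public static void main(String[] args) {
        String[] inputs = randomStrings(200_000, 100, 42L);
        long sink = 0; // printed at the end so the loops are not optimized away

        long start = System.nanoTime();
        for (String s : inputs) sink += stringToChar(s);
        long sumMillis = (System.nanoTime() - start) / 1_000_000;

        start = System.nanoTime();
        for (String s : inputs) sink += getShardLabel(s);
        long hashMillis = (System.nanoTime() - start) / 1_000_000;

        System.out.println("stringToChar:  " + sumMillis + " ms");
        System.out.println("getShardLabel: " + hashMillis + " ms");
        System.out.println("(checksum " + sink + ")");
    }
}
```

Two things are worth keeping in mind when reading such numbers: s.toCharArray() copies the characters of every string, and String caches its hash code after the first call, so repeated calls to hashCode() on the same instance are cheaper than the first.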

huangapple
  • Posted on August 6, 2020 at 15:21:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63278669.html