Implementation of Rabin-Karp string matching (rolling hash)

Question

I am trying to implement the Rabin-Karp string matching algorithm to find the string needle in the string haystack (returning the index in haystack where a match for needle was found). I get a wrong result when searching for the needle c in the haystack abc.

Here is how my output looks after running my code for finding c in abc:

hayHash: 0 and needleHash: 2
at i: 0 remove: a and add: b and new hash is before checking for negatives: 1
hayHash: 1 and needleHash: 2
at i: 1 remove: b and add: c and new hash is before checking for negatives: -50
hayHash: 51 and needleHash: 2

I can't figure out why the last hash update yields -50 rather than 2 for hayHash. At that point both hayHash and needleHash should be the hash of the single character 'c', so they should be equal. But my code computes hayHash as 51 (-50 before the negative value is corrected) instead of 2.

Any suggestions on what might be wrong in my code?

Here is my code:

private fun find(haystack: String, needle: String): Int {
    if(needle.length > haystack.length) return -1

    val q = 101
    val d = 256
    var needleHash = 0
    var hayHash = 0
    var hash = 1

    for (i in 0..needle.length)
        hash = (hash * d) % q

    for(i in 0..needle.lastIndex) {
        needleHash = (d * needleHash + (needle[i] - 'a')) % q
        hayHash = (d * hayHash + (haystack[i] - 'a')) % q
    }

    for(i in 0..(haystack.length - needle.length)) {
        println("hayHash: $hayHash and needleHash: $needleHash")
        if(hayHash == needleHash) {
            for(j in 0..needle.lastIndex) {
                if(haystack[i + j] != needle[j])
                    break
                if(j == needle.lastIndex)
                    return i
            }
        }
        if(i == haystack.length - needle.length)
            break
        print("at i: $i remove: ${haystack[i]} and add: ${haystack[i + needle.length]}")   
        hayHash = (d * (hayHash - (haystack[i]  - 'a') * hash) + (haystack[i + needle.length]  - 'a')) % q
        println(" and new hash is before checking for negatives: $hayHash")
        if(hayHash < 0)
            hayHash += q
    }

    return -1
}

Answer 1

Score: 2

Initializing hash with for (i in 0..needle.length) is an off-by-one error. You want for(i in 0..needle.lastIndex).
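
To make the loop bounds concrete, here is a minimal sketch of what each range computes for the values in the question (q = 101, d = 256, needle c). The helper powerOfD is a hypothetical name introduced only for illustration. Which power is the right one depends on the update formula: the question's d * (hayHash - c * hash) + next needs d^(m-1), while a formula of the shape d * hayHash - c * hash + next needs d^m, where m is the needle length.

```kotlin
// Hypothetical helper (not from the question): applies
// hash = (hash * d) % q once per iteration, like the initialization loop.
fun powerOfD(iterations: Int, d: Int = 256, q: Int = 101): Int {
    var hash = 1
    repeat(iterations) { hash = (hash * d) % q }
    return hash
}

fun main() {
    val needle = "c" // m = 1
    // 0..needle.length is inclusive at both ends: m + 1 iterations.
    println(powerOfD((0..needle.length).count()))    // d^(m+1) mod q = 88
    // 0..needle.lastIndex: m iterations.
    println(powerOfD((0..needle.lastIndex).count())) // d^m mod q = 54
    // 1..needle.lastIndex: m - 1 iterations (an empty range when m = 1).
    println(powerOfD((1..needle.lastIndex).count())) // d^(m-1) mod q = 1
}
```

The value 88 is what produces the -50 in the question's output: 256 * (1 - 1 * 88) + 2 = -22270, and -22270 % 101 is -50 in Kotlin.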

Answer 2

Score: 1

One part is pre-computing the wrong power of d.
The other is that d multiplies the whole difference between the current hayHash and the outgoing character's term, not just hayHash.
Try hayHash = (d * hayHash - ((haystack[i] - 'a') * hash) + (haystack[i + needle.length] - 'a')) % q
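
Putting both points together, the function might look like the sketch below. It is one possible fix, not the only one. It assumes hash = d^m mod q (m iterations) to match the update formula above. The local name m, the use of mod (Kotlin 1.5+) to keep hashes non-negative, and startsWith for the character check are my own choices rather than anything from the question.

```kotlin
// Sketch of a corrected find(); keeps the question's q, d and 'a' offset.
fun find(haystack: String, needle: String): Int {
    if (needle.length > haystack.length) return -1

    val q = 101 // modulus
    val d = 256 // base
    val m = needle.length

    // hash = d^m mod q, the power required by the update formula below.
    var hash = 1
    repeat(m) { hash = (hash * d) % q }

    var needleHash = 0
    var hayHash = 0
    for (i in 0 until m) {
        // mod() always returns a value in 0 until q, even for negative input.
        needleHash = (d * needleHash + (needle[i] - 'a')).mod(q)
        hayHash = (d * hayHash + (haystack[i] - 'a')).mod(q)
    }

    for (i in 0..(haystack.length - m)) {
        // Compare characters only when the hashes agree.
        if (hayHash == needleHash && haystack.startsWith(needle, i)) return i
        if (i == haystack.length - m) break

        // Slide the window: shift by d, drop haystack[i], append haystack[i + m].
        hayHash = (d * hayHash - (haystack[i] - 'a') * hash + (haystack[i + m] - 'a')).mod(q)
    }
    return -1
}

fun main() {
    println(find("abc", "c")) // 2
}
```

For the failing case, the last update becomes 256 * 1 - 1 * 54 + 2 = 204, and 204 mod 101 = 2, which equals needleHash.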

huangapple
  • Posted on March 3, 2023, 19:10:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75626337.html