英文:
Create a suffix tree in golang
问题
我有一个字符串数组,我需要在Golang中创建一个后缀树。
Golang中的SuffixArray无法满足我的需求,因为它只接受字节数组(即单个字符串的数组)。
有人可以提供实现的指导吗?
提前感谢。
英文:
I have an array of strings and I need to create a suffix tree out of it in Golang.
SuffixArray in Golang does not suffice my needs, because it only accepts byte array (i.e of a single string).
Could anybody provide pointers for implementation.
Thanks in advance.
答案1
得分: 8
这是一个使用后缀数组进行自动补全的示例。注意,我使用了一个以\x00
为前缀将所有字符串连接在一起的方法,这个前缀在字符串中不会出现。
package main
import (
"fmt"
"index/suffixarray"
"regexp"
"strings"
)
func main() {
words := []string{
"aardvark",
"happy",
"hello",
"hero",
"he",
"hotel",
}
// 使用\x00作为每个字符串的起始符号
joinedStrings := "\x00" + strings.Join(words, "\x00")
sa := suffixarray.New([]byte(joinedStrings))
// 用户输入了"he"
match, err := regexp.Compile("\x00he[^\x00]*")
if err != nil {
panic(err)
}
ms := sa.FindAllIndex(match, -1)
for _, m := range ms {
start, end := m[0], m[1]
fmt.Printf("match = %q\n", joinedStrings[start+1:end])
}
}
输出结果为:
match = "hello"
match = "hero"
match = "he"
英文:
Here is an example of how to use suffix array to do auto completion. (playground).
Note that I joined all the strings together with a prefix of \x00
which can't occur in the strings first.
package main
import (
"fmt"
"index/suffixarray"
"regexp"
"strings"
)
func main() {
words := []string{
"aardvark",
"happy",
"hello",
"hero",
"he",
"hotel",
}
// use \x00 to start each string
joinedStrings := "\x00" + strings.Join(words, "\x00")
sa := suffixarray.New([]byte(joinedStrings))
// User has typed in "he"
match, err := regexp.Compile("\x00he[^\x00]*")
if err != nil {
panic(err)
}
ms := sa.FindAllIndex(match, -1)
for _, m := range ms {
start, end := m[0], m[1]
fmt.Printf("match = %q\n", joinedStrings[start+1:end])
}
}
Prints
match = "hello"
match = "hero"
match = "he"
答案2
得分: 1
你想要的是通用后缀树(Generalized Suffix Tree)。构建这样的树的一种简单方法是为每个字符串附加一个不同的结束标记(在任何字符串中都没有使用的符号),将它们连接起来,并为连接后的字符串构建一个普通的后缀树。所以你只需要将 "hello world" 添加到字符串集合中,并使用以下代码来获取包含 "wor" 的字符串:
match, err := regexp.Compile("[^\x00]*wor[^\x00]*")
注意,正确的字符串是 joinedStrings[start:end]
。
英文:
What you want is called generalized suffix tree. A simple way to build such trees is to append a different end marker(symbols not used in any of the strings) to each strings, concatenate them and build a normal suffix tree for the concatenated string. So you just need to add "hello world" to the string set and use:
match, err := regexp.Compile("[^\x00]*wor[^\x00]*")
to get the strings contain "wor". Note that the correct string is joinedStrings[start:end]
.
答案3
得分: 0
我创建了一个具有O(n)复杂度的后缀树实现,其中n是字符串的长度:https://github.com/twelvedata/searchindex
更多详细信息请参阅我在Medium上的文章:https://medium.com/twelve-data/in-memory-text-search-index-for-quotes-on-go-5243adc62c26
英文:
I created implementation of suffix tree with O(n) complexity, where n is length of string: https://github.com/twelvedata/searchindex
More details in my article on Medium https://medium.com/twelve-data/in-memory-text-search-index-for-quotes-on-go-5243adc62c26
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论