How to check whether a given word exists in a sentence (string) without using the Contains function in Go
Question
I need an alternative to strings.Contains() to check whether a given word exists in a sentence (string).
For example, I need to check whether the word "can" is in the sentence "I can run fast". If I use strings.Contains("I can run fast", "can"), this gives true. But strings.Contains("I cannot run fast", "can") also gives true, because the sentence contains "can" as a substring. How can I check for the exact word, so that "can" gives true and "cannot" gives false in this scenario?
Answer 1
Score: 3
Just as a first attempt, you can try using a regular expression:
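import "regexp"

// containsCanRegex matches "can" or "Can" between two word boundaries.
var containsCanRegex = regexp.MustCompile(`\b[Cc]an\b`)

// containsCan reports whether s contains the word "can".
func containsCan(s string) bool {
	return containsCanRegex.MatchString(s)
}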
Note that the character class [Cc] also accepts a capitalized first letter, so the pattern matches "Can I go?".
The \b in a regular expression matches a "word boundary": a position with a word character on one side and a non-word character, the beginning of the text, or the end of the text on the other side. In Go's regexp package, "word character" means an ASCII letter, digit, or underscore.
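To see how the word boundary behaves, the pattern can be tried against a few sentences. This is a small sketch; the variable name wordCan and the sample sentences are chosen here for illustration only.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordCan is the same pattern as in the answer: "can" or "Can"
// with a word boundary on both sides.
var wordCan = regexp.MustCompile(`\b[Cc]an\b`)

func main() {
	samples := []string{
		"I can run fast",
		"I cannot run fast",
		"Can I go?",
		"I can't run fast",
	}
	for _, s := range samples {
		// Print each sentence together with the match result.
		fmt.Printf("%q: %v\n", s, wordCan.MatchString(s))
	}
}
```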
Note that this will also match "can't", because \b treats ' as a word boundary (it is a non-word character). It sounds like this is not what you want. To come up with a more general solution, you first need to decide how general it should be. A very basic approach would be to split the text into words first, and then check whether any of those words equals "can". You could split the words with a regular expression or by using a text segmentation library.
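The split-then-compare approach could look roughly like the following. This is a sketch under stated assumptions: the helper name hasWord is hypothetical, a "word" is taken to be a run of letters and ASCII apostrophes, and the comparison ignores case.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// separators matches every run of characters that are neither letters
// nor ASCII apostrophes; these runs are treated as word separators.
var separators = regexp.MustCompile(`[^\p{L}']+`)

// hasWord (hypothetical helper) reports whether word occurs in s as a
// whole word, ignoring case.
func hasWord(s, word string) bool {
	for _, w := range separators.Split(s, -1) {
		if strings.EqualFold(w, word) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasWord("I can run fast", "can"))
	fmt.Println(hasWord("I cannot run fast", "can"))
	fmt.Println(hasWord("I can't run fast", "can"))
}
```

Because the apostrophe is kept inside words, "can't" stays a single token and is not confused with "can".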
I don't know how to write a regular expression that would accept "can" but reject "can't" in a sentence, because the regexp package does not support negative lookahead.
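One possible workaround that needs no lookahead, offered here as a sketch rather than as part of the original answer, is to match the neighbouring character explicitly: the word must be preceded by the start of the text or by a character that is neither a word character nor an apostrophe, and followed by the end of the text or by such a character. The names canWord and containsWordCan are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

// canWord requires "can"/"Can" to be delimited on both sides by the text
// boundary or by a character that is neither a word character nor an
// apostrophe, so "can't" and "cannot" are rejected.
var canWord = regexp.MustCompile(`(?:^|[^\w'])[Cc]an(?:$|[^\w'])`)

// containsWordCan (hypothetical helper) reports whether s contains "can"
// as a standalone word.
func containsWordCan(s string) bool {
	return canWord.MatchString(s)
}

func main() {
	fmt.Println(containsWordCan("I can run fast"))
	fmt.Println(containsWordCan("I can't run fast"))
	fmt.Println(containsWordCan("I cannot run fast"))
}
```

A known limitation of this sketch is that a quoted 'can' (wrapped in apostrophes) is rejected as well, since the apostrophe is treated as part of a word.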
Answer 2
Score: 1
> I need an alternative method instead of strings.Contains() to check
> whether the given word exists in a sentence (string).
>
> I'm trying to implement a filter for a given set of words.
Here's a solution that uses a simple algorithm to match whole words. It can distinguish between "can", "cannot", and "can't".
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// newFilter builds a set of trimmed, lowercased words to match against.
func newFilter(words []string) map[string]struct{} {
	filter := make(map[string]struct{}, len(words))
	for _, word := range words {
		word = strings.TrimSpace(word)
		word = strings.ToLower(word)
		if len(word) > 0 {
			filter[word] = struct{}{}
		}
	}
	return filter
}

// applyFilter reports whether text contains any word in filter as a whole word.
func applyFilter(text string, filter map[string]struct{}) bool {
	const (
		rApostrophe  = '\u0027'
		sApostrophe  = string(rApostrophe)
		sApostropheS = string(rApostrophe) + "s"
		rSoftHyphen  = '\u00AD'
		sSoftHyphen  = string(rSoftHyphen)
		sHyphenLF    = "-\n"
		sHyphenCRLF  = "-\r\n"
	)

	// Split on anything that is neither a letter nor an apostrophe,
	// so that "can't" stays a single word.
	split := func(r rune) bool {
		return !unicode.IsLetter(r) && r != rApostrophe
	}

	text = strings.ToLower(text)
	// Drop soft hyphens and rejoin words hyphenated across a line break.
	// Both line-ending styles are handled, even if they are mixed in one text.
	text = strings.ReplaceAll(text, sSoftHyphen, "")
	text = strings.ReplaceAll(text, sHyphenLF, "")
	text = strings.ReplaceAll(text, sHyphenCRLF, "")

	words := strings.FieldsFunc(text, split)
	for _, word := range words {
		// Strip a trailing apostrophe or a possessive "'s" before the lookup.
		if strings.HasSuffix(word, sApostrophe) {
			word = word[:len(word)-len(sApostrophe)]
		} else if strings.HasSuffix(word, sApostropheS) {
			word = word[:len(word)-len(sApostropheS)]
		}
		if _, ok := filter[word]; ok {
			return true
		}
	}
	return false
}

func main() {
	filter := newFilter([]string{"can"})
	text := "I can run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I cannot run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I can-\nnot run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I can't run fast"
	fmt.Println(applyFilter(text, filter)) // false

	filter = newFilter([]string{"cannot", "can't"})
	text = "I can run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I cannot run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I can-\nnot run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I can't run fast"
	fmt.Println(applyFilter(text, filter)) // true
}
A runnable version of this code is available here: https://go.dev/play/p/sQpTt5JY8Qt