March 7, 2022, 14:45:46 · go

How to check whether a given word exists in a sentence (string) without using the Contains function in Go

Question

I need an alternative to strings.Contains() to check whether a given word exists in a sentence (string).

As an example, I need to check whether the word "can" is in the sentence "I can run fast". strings.Contains("I can run fast", "can") gives true, but strings.Contains("I cannot run fast", "can") also gives true, because "cannot" contains "can". How can I check for the exact word, so that "can" gives true and "cannot" gives false in the scenario above?

Answer 1

Score: 3

Just as a first attempt, you can try using a regular expression:

import "regexp"

// containsCanRegex matches "can" or "Can" as a whole word.
var containsCanRegex = regexp.MustCompile(`\b[Cc]an\b`)

func containsCan(s string) bool {
    return containsCanRegex.MatchString(s)
}

Note that this also matches the capitalized form, so it matches "Can I go?".

The \b in a regular expression matches a "word boundary": a position with a word character on one side and a non-word character, the beginning of the text, or the end of the text on the other side. In Go's regexp package, word characters are the ASCII letters, digits, and underscore.
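
A small sketch of how the boundary rule plays out on a few sample sentences. The variable name canBoundary and the last sample sentence are only for illustration; the pattern is the one from the answer.

```go
package main

import (
	"fmt"
	"regexp"
)

// canBoundary is the pattern from the answer: "can" or "Can"
// with a word boundary on each side.
var canBoundary = regexp.MustCompile(`\b[Cc]an\b`)

func main() {
	samples := []string{
		"I can run fast",    // space on both sides of "can"
		"Can I go?",         // start of text before, space after
		"I cannot run fast", // "n" follows "can", so no boundary there
		"I can't run fast",  // "'" is a non-word character, so a boundary exists
	}
	for _, s := range samples {
		fmt.Printf("%q: %v\n", s, canBoundary.MatchString(s))
	}
}
```

The first, second, and fourth sentences match; only "I cannot run fast" is rejected.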

Note that this will match "can't", because ' is a non-word character, so \b sees a word boundary between the "n" and the apostrophe. It sounds like this is not what you want. To come up with a more general solution, you first need to decide how general it should be. A very basic approach would be to split the text into words first, and then check whether any of those words matches "can". You could split the words with a regular expression or with a text segmentation library.
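
As a rough sketch of the split-then-compare idea, the snippet below extracts words with a regular expression and compares each one to the target. The helper name containsWord and the tokenization rule (runs of ASCII letters and apostrophes count as one word) are assumptions made for illustration; real text may need a broader rule.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// wordRe treats a run of ASCII letters and apostrophes as one word,
// so "can't" stays a single token. This rule is an assumption.
var wordRe = regexp.MustCompile(`[A-Za-z']+`)

// containsWord is a hypothetical helper: it reports whether word
// appears in s as a whole token, ignoring case.
func containsWord(s, word string) bool {
	for _, w := range wordRe.FindAllString(s, -1) {
		if strings.EqualFold(w, word) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(containsWord("I can run fast", "can"))    // true
	fmt.Println(containsWord("I cannot run fast", "can")) // false
	fmt.Println(containsWord("I can't run fast", "can"))  // false
	fmt.Println(containsWord("Can I go?", "can"))         // true
}
```

Since the comparison is against whole tokens, "cannot" and "can't" are different words from "can" and no longer count as matches.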

I don't know how to write a regular expression that accepts "can" but rejects "can't" in a sentence, because the regexp package does not support negative lookahead.
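
One possible workaround that stays within the regexp package, offered as a sketch and not as the answerer's method: instead of a lookahead, spell out which characters may sit next to the word. The helper name wordRegexp is hypothetical, and the rule that an apostrophe counts as part of a word is an assumption.

```go
package main

import (
	"fmt"
	"regexp"
)

// wordRegexp is a hypothetical helper. It builds a case-insensitive
// pattern that matches word only when each neighbor is the start or
// end of the text, or a character that is neither an ASCII word
// character nor an apostrophe.
func wordRegexp(word string) *regexp.Regexp {
	const boundary = `[^\w']`
	return regexp.MustCompile(
		`(?i)(?:^|` + boundary + `)` +
			regexp.QuoteMeta(word) +
			`(?:$|` + boundary + `)`)
}

func main() {
	can := wordRegexp("can")
	fmt.Println(can.MatchString("I can run fast"))    // true
	fmt.Println(can.MatchString("Can I go?"))         // true
	fmt.Println(can.MatchString("I cannot run fast")) // false
	fmt.Println(can.MatchString("I can't run fast"))  // false
}
```

Unlike \b, this pattern consumes the neighboring characters, so it suits a yes/no check with MatchString better than extracting the matched text. It also rejects a word wrapped in single quotes, because the apostrophe is treated as part of the word.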

Answer 2

Score: 1

> I need an alternative method instead of strings.Contains() to check
> whether the given word exists in a sentence (string).
>
> I'm trying to implement a filter for a given set of words.

Here's a solution that uses simple word-splitting algorithms and can distinguish between "can", "cannot", and "can't".

package main

import (
	"fmt"
	"strings"
	"unicode"
)

// newFilter builds a set of trimmed, lowercased words to look for.
func newFilter(words []string) map[string]struct{} {
	filter := make(map[string]struct{}, len(words))
	for _, word := range words {
		word = strings.TrimSpace(word)
		word = strings.ToLower(word)
		if len(word) > 0 {
			filter[word] = struct{}{}
		}
	}
	return filter
}

// applyFilter reports whether text contains any word from filter as a whole word.
func applyFilter(text string, filter map[string]struct{}) bool {
	const (
		rApostrophe  = '\u0027'
		sApostrophe  = string(rApostrophe)
		sApostropheS = string(rApostrophe) + "s"
		rSoftHyphen  = '\u00AD'
		sSoftHyphen  = string(rSoftHyphen)
		sHyphenLF    = "-\n"
		sHyphenCRLF  = "-\r\n"
	)

	// Split on anything that is not a letter or an apostrophe,
	// so that "can't" stays a single word.
	split := func(r rune) bool {
		return !unicode.IsLetter(r) && r != rApostrophe
	}

	text = strings.ToLower(text)
	// Rejoin words broken by soft hyphens or by a hyphen at a line end.
	// Each replacement runs unconditionally so mixed line endings are handled.
	text = strings.ReplaceAll(text, sSoftHyphen, "")
	text = strings.ReplaceAll(text, sHyphenLF, "")
	text = strings.ReplaceAll(text, sHyphenCRLF, "")

	words := strings.FieldsFunc(text, split)
	for _, word := range words {
		// Strip a trailing apostrophe or possessive 's before the lookup.
		if strings.HasSuffix(word, sApostrophe) {
			word = word[:len(word)-len(sApostrophe)]
		} else if strings.HasSuffix(word, sApostropheS) {
			word = word[:len(word)-len(sApostropheS)]
		}
		if _, ok := filter[word]; ok {
			return true
		}
	}
	return false
}

func main() {
	filter := newFilter([]string{"can"})
	text := "I can run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I cannot run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I can-\nnot run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I can't run fast"
	fmt.Println(applyFilter(text, filter)) // false

	filter = newFilter([]string{"cannot", "can't"})
	text = "I can run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I cannot run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I can-\nnot run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I can't run fast"
	fmt.Println(applyFilter(text, filter)) // true
}

https://go.dev/play/p/sQpTt5JY8Qt
