How to check whether the given word exists in a sentence(string) without using the contains function in golang


Question


I need an alternative to strings.Contains() to check whether a given word exists in a sentence (string).

For example, I need to check whether the word "can" is in the sentence "I can run fast". strings.Contains("I can run fast", "can") returns true. But strings.Contains("I cannot run fast", "can") also returns true, because "cannot" contains "can". How can I check for the exact word, so that "can" gives true and "cannot" gives false in this scenario?

Answer 1

Score: 3


As a first attempt, you can use a regular expression:

import "regexp"

var containsCanRegex = regexp.MustCompile(`\b[Cc]an\b`)

func containsCan(s string) bool {
    return containsCanRegex.MatchString(s)
}

Note that the pattern also accepts a capitalized first letter, so it matches "Can I go?".

The \b in a regular expression matches a "word boundary": a position with a word character on one side and a non-word character, the beginning of the text, or the end of the text on the other side. In Go's regexp package, word characters are ASCII letters, digits, and the underscore.
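To make the boundary behavior concrete, here is a small sketch that runs the same pattern against a few sentences. It reuses the names from the snippet above; the sample sentences are only illustrations.

```go
package main

import (
	"fmt"
	"regexp"
)

// containsCanRegex matches "can" or "Can" only when both ends sit on a word boundary.
var containsCanRegex = regexp.MustCompile(`\b[Cc]an\b`)

func containsCan(s string) bool {
	return containsCanRegex.MatchString(s)
}

func main() {
	for _, s := range []string{
		"I can run fast",    // "can" has a space on both sides
		"I cannot run fast", // "can" is followed by the word character 'n'
		"Can I go?",         // capitalized, at the beginning of the text
	} {
		fmt.Printf("%q -> %v\n", s, containsCan(s))
	}
}
```

This prints true, false, and true, in that order.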

Note that this pattern also matches "can't", because the apostrophe is a non-word character, so \b sees a word boundary between "can" and the apostrophe. That does not sound like what you want. For a more general solution, first decide how general it needs to be. A very basic approach is to split the text into words and then check whether any of them equals "can". You could split the words with a regular expression or with a text segmentation library.
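The split-then-compare idea could look roughly like the sketch below. The function name hasWord and the definition of a word (a run of letters, digits, and apostrophes) are assumptions made for illustration; a real text segmentation library would handle many more cases.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// hasWord reports whether word occurs in s as a whole word, ignoring case.
// Assumption: a word is a run of letters, digits, and apostrophes,
// so "can't" stays a single word and is not split into "can" and "t".
func hasWord(s, word string) bool {
	isSeparator := func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '\''
	}
	for _, w := range strings.FieldsFunc(s, isSeparator) {
		if strings.EqualFold(w, word) {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasWord("I can run fast", "can"))    // true
	fmt.Println(hasWord("I cannot run fast", "can")) // false
	fmt.Println(hasWord("I can't run fast", "can"))  // false
	fmt.Println(hasWord("Can I go?", "can"))         // true
}
```

Because apostrophes are kept as part of a word, a word wrapped in single quotes (for example 'can') would not match in this sketch; whether that matters depends on the input.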

I don't know how to write a regular expression that would accept "can" but reject "can't" in a sentence, since the "regexp" package does not support negative lookahead.
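One possible workaround that needs no lookahead is to spell out what may surround the word instead of relying on \b. The sketch below is not from the original answer; the names standaloneCan and containsStandaloneCan are hypothetical, and it assumes that an apostrophe should count as part of a word.

```go
package main

import (
	"fmt"
	"regexp"
)

// standaloneCan is a hypothetical pattern (an assumption, not the answer's code).
// On each side of "can" it requires the start or end of the text, or a character
// that is neither a word character nor an apostrophe.
var standaloneCan = regexp.MustCompile(`(?i)(?:^|[^\w'])can(?:$|[^\w'])`)

func containsStandaloneCan(s string) bool {
	return standaloneCan.MatchString(s)
}

func main() {
	fmt.Println(containsStandaloneCan("I can run fast"))    // true
	fmt.Println(containsStandaloneCan("I cannot run fast")) // false
	fmt.Println(containsStandaloneCan("I can't run fast"))  // false
	fmt.Println(containsStandaloneCan("Can I go?"))         // true
}
```

The pattern consumes the neighboring character, which is harmless for MatchString but would need capture groups if the position of the word itself were required.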

Answer 2

Score: 1


> I need an alternative method instead of strings.contains() to check
> whether the given word exists in a sentence (string).
>
> I'm trying to implement a filter for a given set of words.

Here's a solution that uses a simple word-splitting algorithm. It can distinguish between "can", "cannot", and "can't".

package main

import (
	"fmt"
	"strings"
	"unicode"
)

// newFilter builds a set of lowercased, trimmed words to look for.
func newFilter(words []string) map[string]struct{} {
	filter := make(map[string]struct{}, len(words))
	for _, word := range words {
		word = strings.TrimSpace(word)
		word = strings.ToLower(word)
		if len(word) > 0 {
			filter[word] = struct{}{}
		}
	}
	return filter
}

// applyFilter reports whether text contains any word from filter as a whole word.
func applyFilter(text string, filter map[string]struct{}) bool {
	const (
		rApostrophe  = '\u0027'
		sApostrophe  = string(rApostrophe)
		sApostropheS = string(rApostrophe) + "s"
		rSoftHyphen  = '\u00AD'
		sSoftHyphen  = string(rSoftHyphen)
		sHyphenLF    = "-\n"
		sHyphenCRLF  = "-\r\n"
	)

	// A word is a run of letters and apostrophes; everything else separates words.
	split := func(r rune) bool {
		return !unicode.IsLetter(r) && r != rApostrophe
	}

	text = strings.ToLower(text)
	// Rejoin words that were hyphenated with a soft hyphen or across a line break.
	if strings.Contains(text, sSoftHyphen) {
		text = strings.ReplaceAll(text, sSoftHyphen, "")
	}
	if strings.Contains(text, sHyphenLF) {
		text = strings.ReplaceAll(text, sHyphenLF, "")
	} else if strings.Contains(text, sHyphenCRLF) {
		text = strings.ReplaceAll(text, sHyphenCRLF, "")
	}

	words := strings.FieldsFunc(text, split)
	for _, word := range words {
		// Strip a trailing apostrophe or possessive 's before the lookup.
		if strings.HasSuffix(word, sApostrophe) {
			word = word[:len(word)-len(sApostrophe)]
		} else if strings.HasSuffix(word, sApostropheS) {
			word = word[:len(word)-len(sApostropheS)]
		}
		if _, ok := filter[word]; ok {
			return true
		}
	}
	return false
}

func main() {
	filter := newFilter([]string{"can"})
	text := "I can run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I cannot run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I can-\nnot run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I can't run fast"
	fmt.Println(applyFilter(text, filter)) // false

	filter = newFilter([]string{"cannot", "can't"})
	text = "I can run fast"
	fmt.Println(applyFilter(text, filter)) // false
	text = "I cannot run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I can-\nnot run fast"
	fmt.Println(applyFilter(text, filter)) // true
	text = "I can't run fast"
	fmt.Println(applyFilter(text, filter)) // true
}

You can run the code here: https://go.dev/play/p/sQpTt5JY8Qt

Posted by huangapple on March 7, 2022 at 14:45:46
Permalink: https://go.coder-hub.com/71377313.html