How to check whether the given word exists in a sentence(string) without using the contains function in golang

Question

I need an alternative to strings.Contains() for checking whether a given word exists in a sentence (string).

For example, I need to check whether the word "can" is in the sentence "I can run fast". strings.Contains("I can run fast", "can") returns true. But strings.Contains("I cannot run fast", "can") also returns true, because "cannot" contains "can". How can I check for the exact word, so that "can" gives true and "cannot" gives false in this scenario?
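
To make the problem concrete, here is a minimal sketch of the substring check. Note that strings.Contains takes the string to search first and the substring second. The helper name naiveContains is only an illustration and is not part of the question.

```go
package main

import (
	"fmt"
	"strings"
)

// naiveContains is a hypothetical wrapper around strings.Contains.
// It matches substrings, so it cannot tell "can" from "cannot".
func naiveContains(sentence, word string) bool {
	return strings.Contains(sentence, word)
}

func main() {
	fmt.Println(naiveContains("I can run fast", "can"))    // true
	fmt.Println(naiveContains("I cannot run fast", "can")) // true, although "can" is not a separate word here
}
```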

Answer 1

Score: 3

Just as a first attempt, you can try using a regular expression:

    import "regexp"

    // containsCanRegex matches "can" or "Can" as a whole word.
    var containsCanRegex = regexp.MustCompile(`\b[Cc]an\b`)

    func containsCan(s string) bool {
        return containsCanRegex.MatchString(s)
    }

Note that the character class [Cc] also accepts a capitalized first letter, so the pattern matches "Can I go?".

The \b in a regular expression matches a "word boundary". In Go's regexp package this is an ASCII word boundary: there is a word character (\w) on one side, and a non-word character, the beginning of the text, or the end of the text on the other side.

Note that this will also match "can't": the apostrophe is a non-word character, so \b sees a word boundary between "n" and "'". It sounds like this is not what you want. For a more general solution, you first need to decide how general it should be. A very basic approach is to split the text into words first and then check whether any of those words equals "can". You could split the words with a regular expression or with a text segmentation library.
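
The split-first approach could be sketched roughly as follows. This is only an illustration under some assumptions: the helper name containsWord is made up, a "word" is taken to be a run of letters, digits, and apostrophes, and the comparison ignores case.

```go
package main

import (
	"fmt"
	"strings"
	"unicode"
)

// containsWord is a hypothetical helper that reports whether word occurs
// in sentence as a whole word. Apostrophes are kept inside words, so
// "can't" stays a single token and does not count as "can".
func containsWord(sentence, word string) bool {
	isSeparator := func(r rune) bool {
		return !unicode.IsLetter(r) && !unicode.IsDigit(r) && r != '\''
	}
	for _, w := range strings.FieldsFunc(sentence, isSeparator) {
		if strings.EqualFold(w, word) { // case-insensitive comparison
			return true
		}
	}
	return false
}

func main() {
	sentences := []string{
		"I can run fast",    // true
		"I cannot run fast", // false
		"I can't run fast",  // false
		"Can I go?",         // true
	}
	for _, s := range sentences {
		fmt.Printf("%q: %v\n", s, containsWord(s, "can"))
	}
}
```

Because the word is passed as an argument, the same helper works for words other than "can", which the hard-coded regular expression does not.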

I don't know how to write a regular expression that would accept "can" but reject "can't" in a sentence, because the "regexp" package does not support negative lookahead.
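
One possible workaround that avoids lookahead is to spell out what may surround the word instead of relying on \b. The pattern and the name containsWholeCan below are an illustrative sketch, not part of the original answer. It assumes that an apostrophe next to "can" always belongs to the word, so a quoted 'can' would be rejected as well.

```go
package main

import (
	"fmt"
	"regexp"
)

// wholeCanRegex is an illustrative pattern: "can" or "Can" must be preceded
// by the start of the text or a character that is neither a word character
// nor an apostrophe, and followed by the end of the text or such a character.
var wholeCanRegex = regexp.MustCompile(`(?:^|[^\w'])[Cc]an(?:$|[^\w'])`)

// containsWholeCan is a hypothetical helper built on the pattern above.
func containsWholeCan(s string) bool {
	return wholeCanRegex.MatchString(s)
}

func main() {
	fmt.Println(containsWholeCan("I can run fast"))    // true
	fmt.Println(containsWholeCan("I cannot run fast")) // false
	fmt.Println(containsWholeCan("I can't run fast"))  // false
	fmt.Println(containsWholeCan("Can I go?"))         // true
}
```

Unlike \b, this pattern consumes the surrounding characters, which is fine for a yes/no check with MatchString but matters if you need the exact match position.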

Answer 2

Score: 1

> I need an alternative method instead of strings.Contains() to check
> whether the given word exists in a sentence (string).
>
> I'm trying to implement a filter for a given set of words.

Here's a solution that uses simple algorithms to split the text into words and look each word up in a set of filter words. It can distinguish between "can", "cannot", and "can't".

    package main

    import (
        "fmt"
        "strings"
        "unicode"
    )

    // newFilter builds a set of trimmed, lowercased filter words.
    func newFilter(words []string) map[string]struct{} {
        filter := make(map[string]struct{}, len(words))
        for _, word := range words {
            word = strings.ToLower(strings.TrimSpace(word))
            if len(word) > 0 {
                filter[word] = struct{}{}
            }
        }
        return filter
    }

    // applyFilter reports whether text contains any filter word as a whole word.
    func applyFilter(text string, filter map[string]struct{}) bool {
        const (
            rApostrophe  = '\u0027'
            sApostrophe  = string(rApostrophe)
            sApostropheS = string(rApostrophe) + "s"
            rSoftHyphen  = '\u00AD'
            sSoftHyphen  = string(rSoftHyphen)
            sHyphenLF    = "-\n"
            sHyphenCRLF  = "-\r\n"
        )

        // Split on anything that is neither a letter nor an apostrophe.
        split := func(r rune) bool {
            return !unicode.IsLetter(r) && r != rApostrophe
        }

        text = strings.ToLower(text)
        // Rejoin words broken by soft hyphens or end-of-line hyphenation.
        // Both line-ending styles are handled independently, so text that
        // mixes "-\n" and "-\r\n" is covered too.
        text = strings.ReplaceAll(text, sSoftHyphen, "")
        text = strings.ReplaceAll(text, sHyphenCRLF, "")
        text = strings.ReplaceAll(text, sHyphenLF, "")

        for _, word := range strings.FieldsFunc(text, split) {
            // Strip a trailing apostrophe or possessive "'s".
            if strings.HasSuffix(word, sApostrophe) {
                word = word[:len(word)-len(sApostrophe)]
            } else if strings.HasSuffix(word, sApostropheS) {
                word = word[:len(word)-len(sApostropheS)]
            }
            if _, ok := filter[word]; ok {
                return true
            }
        }
        return false
    }

    func main() {
        filter := newFilter([]string{"can"})
        fmt.Println(applyFilter("I can run fast", filter))      // true
        fmt.Println(applyFilter("I cannot run fast", filter))   // false
        fmt.Println(applyFilter("I can-\nnot run fast", filter)) // false
        fmt.Println(applyFilter("I can't run fast", filter))    // false

        filter = newFilter([]string{"cannot", "can't"})
        fmt.Println(applyFilter("I can run fast", filter))      // false
        fmt.Println(applyFilter("I cannot run fast", filter))   // true
        fmt.Println(applyFilter("I can-\nnot run fast", filter)) // true
        fmt.Println(applyFilter("I can't run fast", filter))    // true
    }

Playground link for the original code: https://go.dev/play/p/sQpTt5JY8Qt

huangapple
  • Published on March 7, 2022, 14:45:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/71377313.html