Case insensitive string search in golang

Question

How do I search through a file for a word in a case insensitive manner?

For example:

If I search the file for UpdaTe and the file contains update, the search should pick it up and count it as a match.

Answer 1

Score: 70

strings.EqualFold() reports whether two strings are equal while ignoring case (technically, whether they are equal under simple Unicode case folding). It works with non-ASCII letters too. See http://golang.org/pkg/strings/#EqualFold for more info.

http://play.golang.org/p/KDdIi8c3Ar

package main

import (
	"fmt"
	"strings"
)

func main() {
	fmt.Println(strings.EqualFold("HELLO", "hello")) // ASCII letters
	fmt.Println(strings.EqualFold("ÑOÑO", "ñoño"))   // non-ASCII letters
}

Both return true.

Answer 2

Score: 18

Presumably the important part of your question is the search, not the part about reading from a file, so I'll just answer that part.

Probably the simplest way to do this is to convert both strings (the one you're searching through and the one that you're searching for) to all upper case or all lower case, and then search. For example:

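import "strings"

// CaseInsensitiveContains reports whether substr occurs in s, ignoring case,
// by converting both strings to upper case before searching.
func CaseInsensitiveContains(s, substr string) bool {
    s, substr = strings.ToUpper(s), strings.ToUpper(substr)
    return strings.Contains(s, substr)
}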

You can see it in action here.

Answer 3

Score: 13

Do not use strings.Contains unless you need exact matching rather than language-correct string searches

None of the current answers are correct unless you are only searching text in the minority of languages (like English) that do not use diaereses, umlauts, or other Unicode glyph modifiers. (I originally wrote "ASCII characters"; as @snap pointed out, glyph modifiers are the more correct way to define it.) The standard phrase to search for is "searching non-ASCII characters".

For proper language-aware searching, you need to use http://golang.org/x/text/search.

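import (
	"fmt"

	"golang.org/x/text/language"
	"golang.org/x/text/search"
)

// SearchForString returns the byte offsets [start, end) of the first
// case-insensitive match of substr in str, or -1, -1 if there is none.
func SearchForString(str string, substr string) (int, int) {
	m := search.New(language.English, search.IgnoreCase)
	return m.IndexString(str, substr)
}

start, end := SearchForString("foobar", "bar")
if start != -1 && end != -1 {
	fmt.Println("found at", start, end)
}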

Or if you just want the starting index:

func SearchForStringIndex(str string, substr string) (int, bool) {
    m := search.New(language.English, search.IgnoreCase)
    start, _ := m.IndexString(str, substr)
    if start == -1 {
        return 0, false
    }
    return start, true
}

index, found := SearchForStringIndex("foobar", "bar")
if found {
	fmt.Println("match starts at", index)
}

Browse the language.Tag values in the golang.org/x/text/language package to find the language you want to search with, or use language.Und if you are not sure.

Update

There seems to be some confusion, so the following example should help clarify things.

package main

import (
	"fmt"
	"strings"

	"golang.org/x/text/language"
	"golang.org/x/text/search"
)

var s = `Æ`
var s2 = `Ä`

func main() {
	// Language-aware search: Finnish collation rules, ignoring diacritics.
	m := search.New(language.Finnish, search.IgnoreDiacritics)
	fmt.Println(m.IndexString(s, s2))
	// Byte-wise comparison after upper-casing both strings.
	fmt.Println(CaseInsensitiveContains(s, s2))
}

// CaseInsensitiveContains reports whether substr is in s, ignoring case.
func CaseInsensitiveContains(s, substr string) bool {
	s, substr = strings.ToUpper(s), strings.ToUpper(substr)
	return strings.Contains(s, substr)
}

Answer 4

Score: 9

If your file is large, you can use regexp and bufio:

package main

import (
	"bufio"
	"fmt"
	"log"
	"os"
	"regexp"
)

func main() {
	// (?i) makes the pattern case-insensitive: it matches "update", "UpdaTe", etc.
	reg := regexp.MustCompile("(?i)update")
	f, err := os.Open("test.txt")
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// MatchReader needs an io.RuneReader; bufio.Reader provides one and reads
	// the file incrementally until a match is found or the input ends.
	fmt.Println(reg.MatchReader(bufio.NewReader(f)))
}

About bufio
> The bufio package implements a buffered reader that may be useful both
> for its efficiency with many small reads and because of the additional
> reading methods it provides.

huangapple
  • Posted on July 19, 2014 at 10:00:57
  • When reposting, please keep the link to this article: https://go.coder-hub.com/24836044.html