How to check value of character in golang with UTF-8 strings?

Question

I'm attempting to check if the first character in a string matches the following, note the UTF-8 quote characters:

c := t.Content[0]
if c != '.' && c != ',' && c != '?' && c != '“' && c != '”'{

This code does not work due to the special characters in the last two checks.

What is the correct way to do this?

Answer 1

Score: 8

Indexing a string indexes its bytes (in UTF-8 encoding, which is how Go stores strings in memory), but you want to test the first character. Each of the curly quotes takes three bytes in UTF-8, so a single byte can never equal one of them.

So you should get the first rune and not the first byte. For efficiency you may use utf8.DecodeRuneInString(), which decodes only the first rune and also returns its width in bytes. If you need all the runes of the string, you may use a type conversion such as all := []rune("I'm a string").

See this example:

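// Requires the imports "fmt" and "unicode/utf8".
for _, s := range []string{"asdf", ".asdf", "”asdf"} {
	// Decode only the first rune; the discarded second value is its width in bytes.
	c, _ := utf8.DecodeRuneInString(s)
	if c != '.' && c != ',' && c != '?' && c != '“' && c != '”' {
		fmt.Println("Ok:", s)
	} else {
		fmt.Println("Not ok:", s)
	}
}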

Output:

Ok: asdf
Not ok: .asdf
Not ok: ”asdf
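
To make the byte-versus-rune distinction concrete, here is a minimal sketch that prints what indexing and decoding each return for a string starting with a curly quote. The helper name firstRune is hypothetical and not part of the answer; it is only one way to handle the empty-string and invalid-UTF-8 cases, where utf8.DecodeRuneInString returns utf8.RuneError with a width of 0 or 1.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune is a hypothetical helper: it returns the first rune of s and
// reports false when s is empty or does not start with valid UTF-8.
func firstRune(s string) (rune, bool) {
	r, size := utf8.DecodeRuneInString(s)
	// A genuine U+FFFD in the input decodes with size 3, so size <= 1
	// identifies the two error cases (empty: 0, invalid encoding: 1).
	if r == utf8.RuneError && size <= 1 {
		return 0, false
	}
	return r, true
}

func main() {
	s := "”asdf"

	// Indexing yields the first byte of the UTF-8 encoding, not the character.
	fmt.Printf("s[0] = %#x\n", s[0])

	// Decoding yields the whole code point plus its width in bytes.
	r, size := utf8.DecodeRuneInString(s)
	fmt.Printf("rune = %q (U+%04X), size = %d\n", r, r, size)

	_, ok := firstRune("")
	fmt.Println("empty string has a first rune:", ok)
}
```

Here s[0] is 0xe2, the lead byte of the three-byte encoding of U+201D, while the decoded rune is the quote itself with a width of 3.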

Answer 2

Score: 5

Adding to @icza's great answer: it's worth noting that while indexing a string yields bytes, ranging over a string yields runes. So the following also works:

for _, s := range []string{"asdf", ".asdf", "”asdf"} {
	for _, c := range s {
		if c != '.' && c != ',' && c != '?' && c != '“' && c != '”' {
			fmt.Println("Ok:", s)
		} else {
			fmt.Println("Not ok:", s)
		}
		break // we break after the first character regardless
	}
}
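
As a sketch of how range walks a string, the following program prints the byte offset and rune for each iteration, then wraps the first-character check in a helper. The helper name startsWithAny and the use of strings.ContainsRune for the membership test are illustrative choices, not part of the answer above.

```go
package main

import (
	"fmt"
	"strings"
)

// startsWithAny is a hypothetical helper: it reports whether the first rune
// of s is one of the runes in set. An empty s yields false.
func startsWithAny(s, set string) bool {
	for _, c := range s {
		// Only the first rune matters, so return on the first iteration.
		return strings.ContainsRune(set, c)
	}
	return false
}

func main() {
	// range yields the byte offset of each rune, so the offsets skip ahead
	// after a multi-byte character.
	for i, c := range "”ab" {
		fmt.Printf("offset %d: %q\n", i, c)
	}

	const punct = ".,?“”"
	for _, s := range []string{"asdf", ".asdf", "”asdf", ""} {
		fmt.Printf("%q starts with punctuation: %t\n", s, startsWithAny(s, punct))
	}
}
```

The first loop reports offsets 0, 3 and 4, because the curly quote occupies three bytes. Keeping the character set in one string also makes the list of rejected characters easier to extend than a chain of comparisons.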

huangapple
  • Posted on April 21, 2016 at 14:58:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/36761962.html