How to check value of character in golang with UTF-8 strings?
Question
I'm attempting to check whether the first character in a string is one of the following (note the UTF-8 quote characters):

    c := t.Content[0]
    if c != '.' && c != ',' && c != '?' && c != '“' && c != '”' {
This code does not work due to the special characters in the last two checks.
What is the correct way to do this?
Answer 1
Score: 8
Indexing a string indexes its bytes (in UTF-8 encoding, which is how Go stores strings in memory), but you want to test the first character.
So you should get the first rune and not the first byte. For efficiency you may use utf8.DecodeRuneInString(), which only decodes the first rune. If you need all the runes of the string, you may use a type conversion like all := []rune("I'm a string").
See this example:
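
    // Needs the "fmt" and "unicode/utf8" packages imported.
    for _, s := range []string{"asdf", ".asdf", "”asdf"} {
        // Decode only the first rune; the second result (its width in bytes) is ignored.
        c, _ := utf8.DecodeRuneInString(s)
        if c != '.' && c != ',' && c != '?' && c != '“' && c != '”' {
            fmt.Println("Ok:", s)
        } else {
            fmt.Println("Not ok:", s)
        }
    }
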
Output (try it on the Go Playground):

    Ok: asdf
    Not ok: .asdf
    Not ok: ”asdf

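To make the byte/rune difference concrete, here is a minimal sketch. The helper name firstRune is made up for illustration and is not part of the standard library. It also guards against two cases the snippet above ignores: an empty string, and a string that starts with invalid UTF-8 (in both cases utf8.DecodeRuneInString returns utf8.RuneError with a width of 0 or 1).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// firstRune (hypothetical helper) returns the first rune of s and whether
// one could be decoded. It reports false for an empty string or for
// invalid UTF-8 at the start of s.
func firstRune(s string) (rune, bool) {
	r, size := utf8.DecodeRuneInString(s)
	if r == utf8.RuneError && size <= 1 {
		return 0, false
	}
	return r, true
}

func main() {
	s := "”asdf"

	fmt.Println(len(s))                    // 7: length in bytes
	fmt.Println(utf8.RuneCountInString(s)) // 5: length in runes
	fmt.Printf("%#x\n", s[0])              // 0xe2: only the first byte of the quote

	r, ok := firstRune(s)
	fmt.Printf("%q %v\n", r, ok) // the whole quote character, true
}
```

The right double quote takes three bytes in UTF-8, so s[0] alone can never compare equal to it, which is why the byte-indexing version in the question does not work.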
Answer 2
Score: 5
Adding to @icza's great answer: it's worth noting that while indexing a string yields bytes, ranging over a string yields runes. So the following also works:

    // Needs the "fmt" package imported.
    for _, s := range []string{"asdf", ".asdf", "”asdf"} {
        // Ranging over a string decodes it rune by rune, not byte by byte.
        for _, c := range s {
            if c != '.' && c != ',' && c != '?' && c != '“' && c != '”' {
                fmt.Println("Ok:", s)
            } else {
                fmt.Println("Not ok:", s)
            }
            break // we break after the first character regardless
        }
    }

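If the list of characters grows, the chain of != comparisons gets hard to read. One possible alternative, sketched here under my own naming (rejected and startsWithRejected are hypothetical, not from either answer), is to keep the characters in a string and test the first rune with strings.ContainsRune:

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// rejected (hypothetical name) holds the leading characters from the question.
const rejected = ".,?“”"

// startsWithRejected (hypothetical helper) reports whether the first rune
// of s is one of the characters in rejected. An empty string returns false.
func startsWithRejected(s string) bool {
	if s == "" {
		return false
	}
	r, _ := utf8.DecodeRuneInString(s)
	return strings.ContainsRune(rejected, r)
}

func main() {
	for _, s := range []string{"asdf", ".asdf", "”asdf"} {
		if startsWithRejected(s) {
			fmt.Println("Not ok:", s)
		} else {
			fmt.Println("Ok:", s)
		}
	}
}
```

This prints the same three lines as the examples above. It is only a convenience; the comparison chain is just as correct.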