Question

t := "🈶️"
fmt.Println(utf8.RuneCountInString(t))

I think printing a count of 1 would be better. Why does it return 2?
Answer 1

Score: 2

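The character 🈶️ is represented by two code points: U+1F236 and U+FE0F. U+FE0F is variation selector-16, which requests emoji presentation for the preceding code point and has no glyph of its own. "Code point" is a bit of a mouthful, so Go introduces a shorter term for the concept: rune. utf8.RuneCountInString counts runes, so it returns 2 here, which is the expected behavior.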

If you want to count user-perceived characters (grapheme clusters) instead, try the github.com/rivo/uniseg package.

The demo below should explain it better:

package main

import (
	"fmt"
	"unicode/utf8"

	"github.com/rivo/uniseg"
)

func main() {
	s1 := "🈶️"                            // UTF-8 input text
	s2 := "\U0001f236\ufe0f"             // the explicit Unicode code points
	s3 := "\xf0\x9f\x88\xb6\xef\xb8\x8f" // the explicit UTF-8 bytes
	fmt.Println("s1:", s1)
	fmt.Println("s1 == s2:", s1 == s2)
	fmt.Println("s2 == s3:", s2 == s3)
	fmt.Println("len(s1):", len(s1), "bytes")
	fmt.Println("runes:")
	for pos, r := range s1 {
		fmt.Printf("  %d: %X\n", pos, r)
	}
	fmt.Println("utf8.RuneCount:", utf8.RuneCount([]byte(s1)))
	fmt.Println("utf8.RuneCountInString:", utf8.RuneCountInString(s1))

	// GraphemeClusterCount returns the number of user-perceived characters
	// (grapheme clusters) for the given string.
	fmt.Println("uniseg.GraphemeClusterCount:", uniseg.GraphemeClusterCount(s1))
}

Output:

s1: 🈶️
s1 == s2: true
s2 == s3: true
len(s1): 7 bytes
runes:
  0: 1F236
  4: FE0F
utf8.RuneCount: 2
utf8.RuneCountInString: 2
uniseg.GraphemeClusterCount: 1
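
If adding a dependency is not an option, a rough approximation is possible with the standard library alone. The sketch below is not what uniseg does: approxCharCount is a hypothetical helper that simply skips variation selectors and combining marks, on the assumption that the input contains no zero-width-joiner sequences, flag pairs, or other multi-rune clusters. For full grapheme cluster segmentation, a library such as uniseg is still the safer choice.

```go
package main

import (
	"fmt"
	"unicode"
)

// approxCharCount is a hypothetical helper, not part of any library.
// It counts runes but skips variation selectors and combining marks
// (general categories Mn and Me), which attach to the preceding rune
// instead of standing alone.
//
// This is only an approximation: it does not handle zero-width-joiner
// emoji sequences, regional-indicator flag pairs, Hangul jamo, or the
// other rules of Unicode text segmentation.
func approxCharCount(s string) int {
	n := 0
	for _, r := range s {
		if unicode.Is(unicode.Variation_Selector, r) || unicode.In(r, unicode.Mn, unicode.Me) {
			continue
		}
		n++
	}
	return n
}

func main() {
	// Ideograph plus variation selector-16.
	fmt.Println(approxCharCount("\U0001f236\ufe0f")) // 1
	// "e" plus U+0301 COMBINING ACUTE ACCENT.
	fmt.Println(approxCharCount("e\u0301")) // 1
	// Plain ASCII is unaffected.
	fmt.Println(approxCharCount("go")) // 2
}
```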

References:

Rob Pike's excellent article "Strings, bytes, runes and characters in Go".
The "String literals" section in The Go Programming Language Specification.
Henrique Vicente's blog post "UTF-8 strings with Go: len(s) isn't enough".


RuneCountInString returns an invalid count when the character contains U+FE0F.
