2013年10月18日 00:33:52go评论110阅读模式

英文:

Umlauts and slices

问题

我在阅读一个具有固定列长度格式的文件时遇到了一些问题。有些列可能包含umlauts（德语中的变音符号）。

Umlauts似乎使用2个字节而不是一个字节。这不是我预期的行为。是否有任何返回子字符串的函数？在这种情况下，Slice似乎不起作用。

以下是一些示例代码：

package main

import (
	"fmt"
)

func main() {
	umlautsString := "Rhön"
	fmt.Println(len(umlautsString))
	fmt.Println(umlautsString[0:4])
}

输出结果：

5
Rhö

请注意，这里的输出结果是正确的。len(umlautsString)返回的是字符串的字节数，而不是字符数。由于ö是一个umlaut字符，它占用了2个字节。因此，Rhö实际上是字符串的前4个字节，而不是前3个字符。

英文:

I'm having some trouble while reading a file which has a fixed column length format. Some columns may contain umlauts.

Umlauts seem to use 2 bytes instead of one. This is not the behaviour I was expecting. Is there any kind of function which returns a substring? Slice does not seem to work in this case.

Here's some sample code:

http://play.golang.org/p/ZJ1axy7UXe

umlautsString := &quot;Rh&#246;n&quot;
fmt.Println(len(umlautsString))
fmt.Println(umlautsString[0:4])

Prints:

5
Rh&#246;

答案1

得分: 12

在Go语言中，字符串的切片是按字节计数的，而不是按rune计数。这就是为什么"Rhön"[0:3]会得到Rh和ö的第一个字节。

UTF-8编码的字符被表示为rune，因为UTF-8将字符编码为多个字节（最多四个字节），以提供更大范围的字符。

如果你想使用[]语法对字符串进行切片，请先将字符串转换为[]rune。示例（在play中查看）：

umlautsString := "Rhön"
runes := []rune(umlautsString)
fmt.Println(string(runes[0:3])) // Rhö

值得注意的是：这篇关于Go语言中字符串表示的博文。

英文:

In go, a slice of a string counts bytes, not runes. This is why "Rhön"[0:3] gives you Rh and the first byte of ö.

Characters encoded in UTF-8 are represented as runes because UTF-8 encodes characters in more than one
byte (up to four bytes) to provide a bigger range of characters.

If you want to slice a string with the [] syntax, convert the string to []rune before.
Example (on play):

umlautsString := &quot;Rh&#246;n&quot;
runes = []rune(umlautsString)
fmt.Println(string(runes[0:3])) // Rh&#246;

Noteworthy: This golang blog post about string representation in go.

答案2

得分: 3

你可以将string转换为[]rune并进行操作：

package main

import "fmt"

func main() {
  umlautsString := "Rhön"
  
  fmt.Println(len(umlautsString))
  
  subStrRunes:= []rune(umlautsString)
    
  fmt.Println(len(subStrRunes))
  
  fmt.Println(string(subStrRunes[0:4]))
}

希望对你有帮助！

英文:

You can convert string to []rune and work with it:

package main

import &quot;fmt&quot;

func main() {
  umlautsString := &quot;Rh&#246;n&quot;
  
  fmt.Println(len(umlautsString))
  
  subStrRunes:= []rune(umlautsString)
    
  fmt.Println(len(subStrRunes))
  
  fmt.Println(string(subStrRunes[0:4]))
}

http://play.golang.org/p/__WfitzMOJ

Hope that helps!

答案3

得分: 0

另一个选项是utf8string包：

package main
import "golang.org/x/exp/utf8string"

func main() {
   s := utf8string.NewString("🌹🌞🌚🌙🌜")
   // 示例 1
   n := s.RuneCount()
   println(n == 5)
   // 示例 2
   t := s.Slice(0, 2)
   println(t == "🌹🌞")
}

https://pkg.go.dev/golang.org/x/exp/utf8string

英文:

Another option is the utf8string package:

package main
import &quot;golang.org/x/exp/utf8string&quot;

func main() {
   s := utf8string.NewString(&quot;&#129505;&#128155;&#128154;&#128153;&#128156;&quot;)
   // example 1
   n := s.RuneCount()
   println(n == 5)
   // example 2
   t := s.Slice(0, 2)
   println(t == &quot;&#129505;&#128155;&quot;)
}

https://pkg.go.dev/golang.org/x/exp/utf8string

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Umlauts and slices（音译：Umlauts和slices）

问题

答案1

答案2

答案3

How can I detect a connection failure in gorm?

在Go语言中解析JSON中的JSON。

在Go语言中，是有uint64字面量的。

How can I create a file read/write permissions in /mnt/?

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论