Golang complex fold grüßen

Question

I'm trying to make case folding consistent across three languages (C++, Python and Golang), because I need to be able to check whether a string matches a saved one regardless of the language used.

A problematic example is the German word "grüßen", which in uppercase is "GRÜSSEN" (note that 'ß' becomes the two characters "SS").

  • C++ handles this fine with boost::locale's text conversion. [1]
  • Python 3 also handles it, via str.casefold(). [2]
  • Golang, however, doesn't seem to have a way to do proper case folding. [3]

Is there some way to do this that I'm missing, or does the bug noted at the end of the unicode package's documentation [4] apply to all usages of text conversion in Golang? If so, what are my options for case folding other than writing it in cgo?

1: http://www.boost.org/doc/libs/1_63_0/libs/locale/doc/html/conversions.html "boost locale grüßen"
2: https://docs.python.org/3/library/stdtypes.html#str.casefold
3: https://play.golang.org/p/eYku0fCIpu
4: https://golang.org/pkg/unicode/#pkg-note-BUG

Answer 1

Score: 10

Advanced (Unicode-aware) text processing is not part of the Go stdlib;¹
it lives in a host of ("blessed") packages outside the stdlib,
under the golang.org/x/text/ umbrella.

As Shawn figured out by himself, one can do

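package main

import (
	"fmt"

	"golang.org/x/text/cases"
)

func main() {
	c := cases.Fold()               // Caser that applies Unicode case folding
	fmt.Println(c.String("grüßen")) // prints "grüssen"
}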

to get "grüssen" back.


¹ That's because whatever ships in the stdlib is subject to the
Go 1 compatibility promise.
When Go 1 was released, certain functionality wasn't available,
was incomplete, or had APIs still in flux, so such bits were kept out
of the core to let them mature.

huangapple
  • Posted on March 28, 2017 at 10:59:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/43059909.html