norm package: how do I combine separate characters?

Question

I was expecting to get 밥 and 좋은 for the last two outputs.

Instead, the code does not "combine" any of the characters.

As far as I can tell, the package does not do anything.

Could anybody tell me what I did wrong in this code? I would greatly appreciate it.

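package main

import (
  "fmt"

  "code.google.com/p/go.text/unicode/norm"
)

func main() {
  // Separate jamo letters, typed one at a time.
  str := "ㅈㅗㅎㅇㅡㄴ"
  // Decompose a precomposed syllable into its jamo.
  fmt.Println(string(norm.NFD.AppendString(nil, "앉")))
  // Expected 밥, but nothing is combined.
  fmt.Println(string(norm.NFC.AppendString(nil, "바ㅂ")))
  // Expected 좋은, but nothing is combined.
  fmt.Println(string(norm.NFC.AppendString(nil, str)))
}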

The package is from here:

go get -u code.google.com/p/go.text/unicode/norm
http://godoc.org/code.google.com/p/go.text/unicode/norm

Answer 1

Score: 4

Yes, it does something. If you observe the output from your first operation:

fmt.Println( string( norm.NFD.AppendString(nil, "앉") ) )

You can see that it has successfully decomposed your string, returning three code points in place of your original character. The first one is:

U+110B (HANGUL CHOSEONG IEUNG)

Although it looks the same on screen, this is a different code point from the ㅇ character in your str variable:

U+3147 (HANGUL LETTER IEUNG)

If you composed the characters that NFD gives you as output, you would indeed end up with 앉 again.
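To make the difference visible, the code points can be printed directly. This is a minimal sketch using only the standard library; the helper name codePoints is made up for illustration and is not part of the norm package.

```go
package main

import "fmt"

// codePoints returns the U+XXXX notation of every rune in s.
// Hypothetical helper for illustration only.
func codePoints(s string) []string {
	var out []string
	for _, r := range s {
		out = append(out, fmt.Sprintf("%U", r))
	}
	return out
}

func main() {
	// The letters from the question's str variable.
	fmt.Println(codePoints("ㅈㅗㅎㅇㅡㄴ"))
	// The three jamo that make up 앉 when decomposed, written as escapes.
	fmt.Println(codePoints("\u110b\u1161\u11ac"))
}
```

The first line should list code points in the U+3130 to U+318F range (Hangul Compatibility Jamo), while the second stays in U+1100 to U+11FF (Hangul Jamo).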

EDIT

The letters in your str variable are Hangul Compatibility Jamo characters, which exist only for backwards compatibility and lack the semantic properties that composition relies on. If you want it to work, you should use characters from the Hangul Jamo block instead.
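To illustrate why the block matters, here is a hand-rolled sketch of the arithmetic Hangul syllable composition described in the Unicode Standard. It is not the norm package's own code, and the name composeHangul is an assumption made for this example. Only jamo from the Hangul Jamo block fall inside the ranges the arithmetic checks, so the compatibility letters pass through untouched.

```go
package main

import "fmt"

// Constants from the Unicode Standard's Hangul syllable composition algorithm.
const (
	sBase  = 0xAC00 // first precomposed syllable
	lBase  = 0x1100 // first leading consonant (choseong)
	vBase  = 0x1161 // first vowel (jungseong)
	tBase  = 0x11A7 // one before the first trailing consonant (jongseong)
	lCount = 19
	vCount = 21
	tCount = 28
	sCount = lCount * vCount * tCount
)

// composeHangul combines conjoining jamo into precomposed syllables.
// Hypothetical helper: it handles only Hangul and no other composition.
func composeHangul(s string) string {
	var out []rune
	for _, r := range s {
		if n := len(out); n > 0 {
			last := out[n-1]
			// Leading consonant + vowel -> LV syllable.
			if last >= lBase && last < lBase+lCount && r >= vBase && r < vBase+vCount {
				out[n-1] = sBase + ((last-lBase)*vCount+(r-vBase))*tCount
				continue
			}
			// LV syllable + trailing consonant -> LVT syllable.
			if last >= sBase && last < sBase+sCount && (last-sBase)%tCount == 0 &&
				r > tBase && r < tBase+tCount {
				out[n-1] = last + (r - tBase)
				continue
			}
		}
		out = append(out, r)
	}
	return string(out)
}

func main() {
	// Hangul Jamo block: combines into two syllables.
	fmt.Println(composeHangul("\u110c\u1169\u11c2\u110b\u1173\u11ab"))
	// Hangul Compatibility Jamo block: comes back unchanged.
	fmt.Println(composeHangul("\u3148\u3157\u314e\u3147\u3161\u3134"))
}
```

With the norm package itself, passing the first string to norm.NFC.String (or AppendString, as in the question) should give the same composed result. Note that the trailing consonants use their own code points (U+11C2 and U+11AB here), separate from the leading forms of the same letters.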

huangapple
  • Posted on November 7, 2013, 21:30:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/19837256.html