Do I need Unicode to identify different writing systems?

Question

Whether or not it is optimal, I am trying to identify specific characters by their hexadecimal codes. (Is there a better way to identify alphabetic, Arabic, Chinese, or Japanese characters?)

http://play.golang.org/p/b81_rgXr3G

   fmt.Printf("%x \n", "가") // eab080
   fmt.Printf("%x \n", "ㅎ") // e3858e

So in Korean it is true that
e3858e < eab080

My question, then, is:
do we have any table or chart of the hexadecimal boundaries for each language?

I mean, for English

 fmt.Printf("%x \n", "A") // 41
 fmt.Printf("%x \n", "z") // 7a

Then 41 < 7a

As you can see above, the alphabet is bounded by 41 and 7a.
I am trying to do the same thing for other writing systems that are not alphabets.

Do I need Unicode to identify different writing systems? The unicode package in the standard library seems to support encoding and decoding only the English alphabet.

Thanks in advance.

Answer 1

Score: 3

No, we do not have any table or chart for each language’s hexadecimal boundary. There is some data about characters typically used in various languages.
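
A side note on the numbers themselves: the hexadecimal values printed in the question are the UTF-8 bytes of each string, not Unicode code points. Here is a minimal sketch that prints both side by side. The helper name `codePoint` is made up for illustration; `utf8.DecodeRuneInString` is from the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// codePoint returns the first Unicode code point encoded in s.
// (Hypothetical helper, named here only for illustration.)
func codePoint(s string) rune {
	r, _ := utf8.DecodeRuneInString(s)
	return r
}

func main() {
	for _, s := range []string{"가", "ㅎ", "A", "z"} {
		// %x on a string prints its UTF-8 bytes; %U prints the code point.
		fmt.Printf("%s  utf8=%x  codepoint=%U\n", s, s, codePoint(s))
	}
}
```

가 is U+AC00 and ㅎ is U+314E, so ㅎ comes first whether you compare code points or UTF-8 bytes, since UTF-8 preserves code point order.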

This answers the question asked, but you should consider whether that was your real problem. The question refers to writing systems, alphabets, and languages as if they were one thing; they are separate concepts. You should define your practical problem: what information do you really need? In a text in some language, any Unicode character may appear.
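
If what you actually need is the script of a character rather than its language, Go's `unicode` package ships per-script range tables such as `unicode.Hangul`, `unicode.Han`, and `unicode.Arabic`. A rough sketch, assuming a per-rune check is enough for your purpose; the function name `scriptOf` and the short list of scripts are illustrative choices, not anything prescribed by the package:

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptOf returns the name of the first script in a small, hand-picked
// list whose range table contains r, or "Other" if none matches.
// (Hypothetical helper; extend the list as needed.)
func scriptOf(r rune) string {
	scripts := []struct {
		name  string
		table *unicode.RangeTable
	}{
		{"Latin", unicode.Latin},
		{"Arabic", unicode.Arabic},
		{"Han", unicode.Han},
		{"Hiragana", unicode.Hiragana},
		{"Katakana", unicode.Katakana},
		{"Hangul", unicode.Hangul},
	}
	for _, s := range scripts {
		if unicode.Is(s.table, r) {
			return s.name
		}
	}
	return "Other"
}

func main() {
	for _, r := range []rune{'A', 'ع', '中', 'あ', 'ア', '가', 'ㅎ', '1'} {
		fmt.Printf("%c %U %s\n", r, r, scriptOf(r))
	}
}
```

Note that this identifies scripts, not languages: a Han character may belong to Chinese, Japanese, or Korean text, and digits and punctuation are shared across scripts.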

By the way, English also has (at least in some forms of the language) words like fiancé, coöperation, rôle, anæmia, belovèd, etc.
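
To see why a fixed range such as 41 to 7a is a poor test even for English, here is a small sketch that compares that range check with `unicode.IsLetter`. The names `inASCIIRange` and `mismatches` are invented for this example.

```go
package main

import (
	"fmt"
	"unicode"
)

// inASCIIRange is the range check from the question:
// true if r lies between 'A' (0x41) and 'z' (0x7a) inclusive.
func inASCIIRange(r rune) bool {
	return r >= 0x41 && r <= 0x7a
}

// mismatches returns the runes of s for which the range check
// and unicode.IsLetter disagree.
func mismatches(s string) []rune {
	var out []rune
	for _, r := range s {
		if inASCIIRange(r) != unicode.IsLetter(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// The brackets fall inside 41..7a but are not letters;
	// the accented letters are letters but fall outside it.
	fmt.Printf("%q\n", mismatches("fiancé [rôle]"))
}
```

The range check goes wrong in both directions: it accepts the punctuation that sits between `Z` and `a`, and it rejects the accented letters.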

huangapple
  • Posted on November 5, 2013, 05:57:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/19777979.html