November 5, 2013, 05:57:13 · go

Do I need unicode to identify different writing system

Question

Whether or not it is optimal, I am trying to identify specific characters by their hexadecimal codes. (Is there a better way to identify alphabetic, Arabic, Chinese, or Japanese characters?)

http://play.golang.org/p/b81_rgXr3G

   fmt.Printf("%x \n", "가") // eab080
   fmt.Printf("%x \n", "ㅎ") // e3858e

So in Korean,
eab080 > e3858e

My question, then:
is there any table or chart of the hexadecimal boundaries for each language?

I mean, for English

 fmt.Printf("%x \n", "A") // 41
 fmt.Printf("%x \n", "z") // 7a

Then 41 < 7a

As you can see above, the English alphabet is bounded between 41 and 7a.
I am trying to do the same thing for other writing systems that are not alphabetic.

Do I need Unicode to identify different writing systems? The unicode standard library seems to provide only encoding and decoding for the English alphabet.

Thanks in advance.

Answer 1

Score: 3

No, we do not have any table or chart for each language’s hexadecimal boundary. There is some data about characters typically used in various languages.

This answers the question asked, but you should consider whether that was your real problem. The question refers to writing systems, alphabets, and languages as if they were one thing; they are separate concepts. You should define your practical problem: what information do you really need? In a text in some language, any Unicode character may appear.
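To make the distinction concrete: what Unicode does define is a script property per code point, and Go's standard `unicode` package exposes it as range tables such as `unicode.Hangul`, `unicode.Han`, and `unicode.Arabic`. The following is only a minimal sketch of checking a rune against a few of those tables; the helper name `scriptOf` and the short list of scripts are illustrative assumptions, and a script is still not the same thing as a language.

```go
package main

import (
	"fmt"
	"unicode"
)

// scriptOf returns the name of the first script table (from the short,
// hand-picked list below) that contains r, or "other" if none match.
// The helper and the choice of scripts are illustrative, not exhaustive.
func scriptOf(r rune) string {
	scripts := []struct {
		name  string
		table *unicode.RangeTable
	}{
		{"Latin", unicode.Latin},
		{"Hangul", unicode.Hangul},
		{"Han", unicode.Han},
		{"Hiragana", unicode.Hiragana},
		{"Katakana", unicode.Katakana},
		{"Arabic", unicode.Arabic},
	}
	for _, s := range scripts {
		if unicode.Is(s.table, r) {
			return s.name
		}
	}
	return "other"
}

func main() {
	// '\u0639' is the Arabic letter ain, written as an escape to avoid
	// right-to-left rendering issues in the source.
	runes := []rune{'A', 'z', '가', 'ㅎ', '中', 'あ', 'ア', '\u0639', '1'}
	for _, r := range runes {
		// %U prints the code point, which is what the tables are keyed on,
		// rather than the UTF-8 bytes that %x prints for a string.
		fmt.Printf("%c %U %s\n", r, r, scriptOf(r))
	}
}
```

Note that the tables work on code points (runes), not on the UTF-8 byte sequences printed by `%x`, and that characters such as digits and punctuation belong to no particular script, so they fall through to "other" here.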

By the way, English (at least in some forms of the language) also has words like fiancé, coöperation, rôle, anæmia, belovèd, etc.
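This is also why a fixed range such as 41 to 7a is a poor test even for English. As a rough sketch (the helper names `inASCIIRange` and `isLatinLetter` are made up for illustration), the range check accepts punctuation that happens to sit between `Z` and `a`, and rejects accented Latin letters, whereas the `unicode` package's letter and script tables handle both cases:

```go
package main

import (
	"fmt"
	"unicode"
)

// inASCIIRange is the naive check from the question: is the code point
// between 0x41 ('A') and 0x7a ('z')? (Hypothetical helper name.)
func inASCIIRange(r rune) bool {
	return r >= 0x41 && r <= 0x7a
}

// isLatinLetter reports whether r is a letter that belongs to the Latin
// script according to the standard unicode tables. (Hypothetical helper.)
func isLatinLetter(r rune) bool {
	return unicode.IsLetter(r) && unicode.Is(unicode.Latin, r)
}

func main() {
	// '\u00e9' is é and '\u00e6' is æ, written as escapes so the source
	// does not depend on how the editor normalizes accented characters.
	for _, r := range []rune{'A', 'z', '[', '_', '\u00e9', '\u00e6'} {
		fmt.Printf("%q range=%t latinLetter=%t\n",
			r, inASCIIRange(r), isLatinLetter(r))
	}
}
```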
