UTF-8 range table in Go
Question
I have been reading the Go unicode package page and I'm wondering what the use case for range tables is. What can they be used for? Is there a function to retrieve the range that a single character can be found in?
Answer 1
Score: 3
A range table is an efficient way to describe a set of characters. Because of the way characters are added to the Unicode standard, characters with similar properties are often found together. So it is usually more space-efficient to list the ranges in which a set's characters lie than to list every individual character.
This allows you to look up if a given character exists within a specific character set by performing a series of range checks. If the character's Unicode code point is within any of the ranges in the range table, then that character is considered to be an element of the character set that the range table describes.
There isn't a general function to retrieve the range that a single character can be found in, because character -> range isn't a unique, or particularly useful, relationship in the general case. For example, take the letter A. It exists in the range [65, 90] (ASCII uppercase letters), but it also exists in the range [0, 127] (all ASCII characters), and in arbitrary ranges such as [9, 9999], [60, 70], etc.
If you want to know if a character is in a particular set of range tables, you can use the unicode.In function.
    package main

    import (
        "fmt"
        "unicode"
    )

    func main() {
        // Report whether 'A' belongs to the Latin script table.
        found := unicode.In('A', unicode.Latin)
        fmt.Println(found)
    }

Output:

    true
This checks whether A exists within the given range table unicode.Latin, that is, "the set of Unicode characters in script Latin".
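Because unicode.In accepts any number of tables, it can also test membership in a union of sets. A small sketch, where isLatinOrGreek is a hypothetical wrapper named for this example:

```go
package main

import (
	"fmt"
	"unicode"
)

// isLatinOrGreek is a hypothetical wrapper: it reports whether r
// belongs to at least one of the two script tables.
func isLatinOrGreek(r rune) bool {
	return unicode.In(r, unicode.Latin, unicode.Greek)
}

func main() {
	for _, r := range []rune{'A', 'λ', 'Ж'} {
		fmt.Printf("%c: %v\n", r, isLatinOrGreek(r))
	}
	// Prints:
	// A: true
	// λ: true
	// Ж: false
}
```

For a single table, unicode.Is(table, r) performs the same check; note that its argument order is the reverse of unicode.In.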