How can I see all characters in a unicode category?

Question

I've read the documentation and can't find any examples.

http://golang.org/pkg/unicode/#IsPunct

Is there a place in the documentation that explicitly lists all characters in these categories? I'd like to see what characters are contained in category P or category M.

Answer 1

Score: 1

The documentation doesn't list them, but you can still read the source code. The categories you're asking about are defined in this file: http://golang.org/src/pkg/unicode/tables.go

For example, the P category is defined this way:

var _P = &RangeTable{
	R16: []Range16{
		{0x0021, 0x0023, 1},
		{0x0025, 0x002a, 1},
		{0x002c, 0x002f, 1},
		{0x003a, 0x003b, 1},
		{0x003f, 0x0040, 1},
		{0x005b, 0x005d, 1},
		{0x005f, 0x007b, 28},
		...
		{0xff5d, 0xff5f, 2},
		{0xff60, 0xff65, 1},
	},
	R32: []Range32{
		{0x10100, 0x10102, 1},
		{0x1039f, 0x103d0, 49},
		{0x10857, 0x1091f, 200},
		...
		{0x12470, 0x12473, 1},
	},
	LatinOffset: 11,
}

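And here is a simple way to print the ones in the 16-bit ranges (R16); code points above 0xFFFF live in R32 and need a second loop of the same shape:

// Requires the "fmt" and "unicode" imports. Covers only the 16-bit ranges (R16).
for _, r := range unicode.Punct.R16 {
	// Convert to rune so the counter cannot wrap around at 0xFFFF.
	for c := rune(r.Lo); c <= rune(r.Hi); c += rune(r.Stride) {
		fmt.Print(string(c))
	}
}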

Answer 2

Score: 0

There are a number of web sites that present an interface to the Unicode character database. For example see the “Punctuation, ...” categories at http://www.fileformat.info/info/unicode/category/.

huangapple
  • Posted on December 9, 2014, 04:20:40
  • Please keep the link to this article when reposting: https://go.coder-hub.com/27366196.html