How does Stride in unicode.RangeTable work?
Question
I'd like some help on understanding the unicode package's RangeTable.
Using this (supposedly helping) function:
func printChars(ranges []unicode.Range16) {
	for _, r := range ranges {
		if r.Hi >= 0x80 { // show only ascii
			break
		}
		fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)
		for c := r.Lo; c <= r.Hi; c++ {
			fmt.Print(string(rune(c)) + " ")
		}
	}
	fmt.Println()
}
For digits, I can call printChars(unicode.Digit.R16), and the sequence of digits makes sense to me.
// Lo: 48 Hi: 57 Stride: 1
// 0 1 2 3 4 5 6 7 8 9
However, for punctuation, printChars(unicode.Punct.R16) results in:
// Lo: 33 Hi: 35 Stride: 1
// ! " #
// Lo: 37 Hi: 42 Stride: 1
// % & ' ( ) *
// Lo: 44 Hi: 47 Stride: 1
// , - . /
// Lo: 58 Hi: 59 Stride: 1
// : ;
// Lo: 63 Hi: 64 Stride: 1
// ? @
// Lo: 91 Hi: 93 Stride: 1
// [ \ ]
// Lo: 95 Hi: 123 Stride: 28
// _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z {
I'm surprised that the lowercase letters are included too. Also, what does "Stride" mean? It's 1 for every range except the last, even though the Hi-Lo difference varies.
As another example, take printChars(unicode.Pe.R16). I thought this should give only the end (closing) punctuation:
- ) right parenthesis (U+0029, Pe)
- ] right square bracket (U+005D, Pe)
- } right curly bracket (U+007D, Pe)
But instead my function prints:
// Lo: 41 Hi: 93 Stride: 52
// ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ]
Presumably I'm completely misunderstanding the way this is supposed to work.
How might I correctly get a list of characters in a given category, for example, Punctuation End (Pe) as above?
Answer 1
Score: 2
Stride is the step you have to use when iterating over the range. A Range16 covers Lo, Lo+Stride, Lo+2*Stride, and so on, up to and including Hi. Let's raise the 0x80 boundary a bit and make the loop step by Stride:
package main

import (
	"fmt"
	"unicode"
)

func printChars(ranges []unicode.Range16) {
	for _, r := range ranges {
		if r.Hi >= 0x100 {
			break
		}
		fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)
		// Step by Stride: the range covers Lo, Lo+Stride, Lo+2*Stride, ... up to Hi.
		for c := r.Lo; c <= r.Hi; c += r.Stride {
			fmt.Print(string(rune(c)) + " ")
		}
	}
	fmt.Println()
}

func main() {
	printChars(unicode.Punct.R16)
}
And here is the output:
% go run main.go
Lo: 33 Hi: 35 Stride: 1
! " #
Lo: 37 Hi: 42 Stride: 1
% & ' ( ) *
Lo: 44 Hi: 47 Stride: 1
, - . /
Lo: 58 Hi: 59 Stride: 1
: ;
Lo: 63 Hi: 64 Stride: 1
? @
Lo: 91 Hi: 93 Stride: 1
[ \ ]
Lo: 95 Hi: 123 Stride: 28
_ {
Lo: 125 Hi: 161 Stride: 36
} ¡
Lo: 167 Hi: 171 Stride: 4
§ «
Lo: 182 Hi: 183 Stride: 1
¶ ·
Lo: 187 Hi: 191 Stride: 4
» ¿
Looks pretty much correct to me.
Answer 2
Score: 2
Here is a helper function which makes it easy to iterate over all runes contained in a RangeTable:
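// RunesFromRange sends every rune in tab on the returned channel, then closes it.
func RunesFromRange(tab *unicode.RangeTable) <-chan rune {
	res := make(chan rune)
	go func() {
		// Widen to rune before stepping: in uint16 arithmetic, c += Stride
		// can wrap around past 0xFFFF (for a range ending at 0xFFFF), and
		// the loop would never end.
		for _, r16 := range tab.R16 {
			for c := rune(r16.Lo); c <= rune(r16.Hi); c += rune(r16.Stride) {
				res <- c
			}
		}
		for _, r32 := range tab.R32 {
			for c := rune(r32.Lo); c <= rune(r32.Hi); c += rune(r32.Stride) {
				res <- c
			}
		}
		close(res)
	}()
	return res
}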
The function can be used as follows:
for c := range RunesFromRange(unicode.Punct) {
	fmt.Printf("%04x %s\n", c, string(c))
}
Runnable code to play with is on the Go Playground (I like the characters starting with 0x0df4 in the output).