How does Stride in unicode.RangeTable work?

Question


I'd like some help understanding the unicode package's RangeTable.

Using this (supposedly helpful) function:

```go
func printChars(ranges []unicode.Range16) {
	for _, r := range ranges {
		if r.Hi >= 0x80 { // show only ascii
			break
		}
		fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)
		for c := r.Lo; c <= r.Hi; c++ {
			fmt.Print(string(c) + " ")
		}
	}
	fmt.Println()
}
```

For digits, I can do printChars(unicode.Digit.R16), and the sequence of digits makes sense to me.

```
// Lo: 48 Hi: 57 Stride: 1
// 0 1 2 3 4 5 6 7 8 9
```

However, for punctuation, printChars(unicode.Punct.R16) results in:

```
// Lo: 33 Hi: 35 Stride: 1
// ! " #
// Lo: 37 Hi: 42 Stride: 1
// % & ' ( ) *
// Lo: 44 Hi: 47 Stride: 1
// , - . /
// Lo: 58 Hi: 59 Stride: 1
// : ;
// Lo: 63 Hi: 64 Stride: 1
// ? @
// Lo: 91 Hi: 93 Stride: 1
// [ \ ]
// Lo: 95 Hi: 123 Stride: 28
// _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z {
```

I'm surprised that the lowercase letters are included too. Also, what does "Stride" mean? It's 1 for all but the last range, yet the Hi-Lo difference varies.

As another example, take printChars(unicode.Pe.R16). I thought this should give only the closing punctuation:

  • ) right parenthesis (U+0029, Pe)
  • ] right square bracket (U+005D, Pe)
  • } right curly bracket (U+007D, Pe)

But instead my function prints:

```
// Lo: 41 Hi: 93 Stride: 52
// ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ]
```

Presumably I'm completely misunderstanding the way this is supposed to work.

How might I correctly get a list of characters in a given category, for example, Punctuation End (Pe) as above?

Answer 1

Score: 2


Stride is the step with which you have to iterate over the range: a Range16 contains Lo, Lo+Stride, Lo+2*Stride, and so on, up to Hi. Let's raise the 0x80 cutoff a bit and make the loop step by Stride:

```go
package main

import (
	"fmt"
	"unicode"
)

func printChars(ranges []unicode.Range16) {
	for _, r := range ranges {
		if r.Hi >= 0x100 { // stop once a range reaches beyond Latin-1
			break
		}
		fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)
		// Step by Stride: the range holds Lo, Lo+Stride, ..., up to Hi.
		for c := r.Lo; c <= r.Hi; c += r.Stride {
			fmt.Print(string(rune(c)) + " ")
		}
	}
	fmt.Println()
}

func main() {
	printChars(unicode.Punct.R16)
}
```

And here is the output:


```
% go run main.go
Lo: 33 Hi: 35 Stride: 1
! " #
Lo: 37 Hi: 42 Stride: 1
% & ' ( ) *
Lo: 44 Hi: 47 Stride: 1
, - . /
Lo: 58 Hi: 59 Stride: 1
: ;
Lo: 63 Hi: 64 Stride: 1
? @
Lo: 91 Hi: 93 Stride: 1
[ \ ]
Lo: 95 Hi: 123 Stride: 28
_ {
Lo: 125 Hi: 161 Stride: 36
} ¡
Lo: 167 Hi: 171 Stride: 4
§ «
Lo: 182 Hi: 183 Stride: 1
¶ ·
Lo: 187 Hi: 191 Stride: 4
» ¿
```

Looks pretty much correct to me.

Answer 2

Score: 2


Here is a helper function which makes it easy to iterate over all runes contained in a RangeTable:

```go
func RunesFromRange(tab *unicode.RangeTable) <-chan rune {
	res := make(chan rune)
	go func() {
		// Count in rune rather than uint16/uint32: a uint16 counter wraps to 0
		// after 0xFFFF and would loop forever on a range that ends at U+FFFF.
		for _, r16 := range tab.R16 {
			for c := rune(r16.Lo); c <= rune(r16.Hi); c += rune(r16.Stride) {
				res <- c
			}
		}
		for _, r32 := range tab.R32 {
			for c := rune(r32.Lo); c <= rune(r32.Hi); c += rune(r32.Stride) {
				res <- c
			}
		}
		// If the caller stops reading early, this goroutine stays blocked on send.
		close(res)
	}()
	return res
}
```

The function can be used as follows:

```go
for c := range RunesFromRange(unicode.Punct) {
	fmt.Printf("%04x %s\n", c, string(c))
}
```

Runnable code to play with is on the Go Playground (I like the characters starting at 0x0df4 in the output).

huangapple
  • Posted on 2013-11-24 23:01:34
  • If you repost this article, please keep the link: https://go.coder-hub.com/20176024.html