2013-11-24 23:01:34 · go

How does Stride in unicode.RangeTable work?

Question

I'd like some help on understanding the unicode package's RangeTable.

Using this (supposedly helpful) function:

func printChars(ranges []unicode.Range16) {
  for _, r := range ranges {
    if r.Hi >= 0x80 { // show only ascii
      break
    }
    fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)
    for c := r.Lo; c <= r.Hi; c++ {
      fmt.Print(string(c) + " ")
    }
  }
  fmt.Println()
}

For digits, I can do printChars(unicode.Digit.R16), and the sequence of digits makes sense to me.

 // Lo: 48 Hi: 57 Stride: 1
 // 0 1 2 3 4 5 6 7 8 9

However, for punctuation, printChars(unicode.Punct.R16) results in:

 // Lo: 33 Hi: 35 Stride: 1
 // ! " #
 // Lo: 37 Hi: 42 Stride: 1
 // % & ' ( ) *
 // Lo: 44 Hi: 47 Stride: 1
 // , - . /
 // Lo: 58 Hi: 59 Stride: 1
 // : ;
 // Lo: 63 Hi: 64 Stride: 1
 // ? @
 // Lo: 91 Hi: 93 Stride: 1
 // [ \ ]
 // Lo: 95 Hi: 123 Stride: 28
 // _ ` a b c d e f g h i j k l m n o p q r s t u v w x y z {

I'm surprised that the lower case letters are included too. Also, what does "Stride" mean? It's 1 for all but the last, but the hi-lo difference varies.

Another example is printChars(unicode.Pe.R16). I thought this should give only the end punctuation:

) right parenthesis (U+0029, Pe)
] right square bracket (U+005D, Pe)
} right curly bracket (U+007D, Pe)

But instead my function prints

 // Lo: 41 Hi: 93 Stride: 52
 // ) * + , - . / 0 1 2 3 4 5 6 7 8 9 : ; < = > ? @ A B C D E F G H I J K L M N O P Q R S T U V W X Y Z [ \ ]

Presumably I'm completely misunderstanding the way this is supposed to work.

How might I correctly get a list of characters in a given category, for example, Punctuation End (Pe) as above?

Answer 1

Score: 2

Stride is the step size to use when iterating over a range: the range contains Lo, Lo+Stride, Lo+2*Stride, and so on up to Hi, not every code point in between. Let's raise the 0x80 boundary a bit and make the loop advance by Stride:

package main

import (
    "fmt"
    "unicode"
)

func printChars(ranges []unicode.Range16) {
  for _, r := range ranges {
    if r.Hi >= 0x100 { // stop after the Latin-1 block
      break
    }
    fmt.Println("\nLo:", r.Lo, "Hi:", r.Hi, "Stride:", r.Stride)
    // Step by Stride: only Lo, Lo+Stride, ... up to Hi are in the range.
    for c := r.Lo; c <= r.Hi; c += r.Stride {
      fmt.Print(string(rune(c)) + " ")
    }
  }
  fmt.Println()
}

func main() {
    printChars(unicode.Punct.R16)
}

And here is the output:

% go run main.go
Lo: 33 Hi: 35 Stride: 1
! " #
Lo: 37 Hi: 42 Stride: 1
% & ' ( ) *
Lo: 44 Hi: 47 Stride: 1
, - . /
Lo: 58 Hi: 59 Stride: 1
: ;
Lo: 63 Hi: 64 Stride: 1
? @
Lo: 91 Hi: 93 Stride: 1
[ \ ]
Lo: 95 Hi: 123 Stride: 28
_ {
Lo: 125 Hi: 161 Stride: 36
} ¡
Lo: 167 Hi: 171 Stride: 4
§ «
Lo: 182 Hi: 183 Stride: 1
¶ ·
Lo: 187 Hi: 191 Stride: 4
» ¿

Looks pretty much correct to me.

Answer 2

Score: 2

Here is a helper function which makes it easy to iterate over all runes contained in a RangeTable:

func RunesFromRange(tab *unicode.RangeTable) <-chan rune {
	res := make(chan rune)
	go func() {
		for _, r16 := range tab.R16 {
			// Loop in rune width so c cannot wrap around when Hi is near 0xFFFF.
			for c := rune(r16.Lo); c <= rune(r16.Hi); c += rune(r16.Stride) {
				res <- c
			}
		}
		for _, r32 := range tab.R32 {
			for c := r32.Lo; c <= r32.Hi; c += r32.Stride {
				res <- rune(c)
			}
		}
		// Signal the consumer that every range has been sent.
		close(res)
	}()
	return res
}

The function can be used as follows:

for c := range RunesFromRange(unicode.Punct) {
	fmt.Printf("%04x %s\n", c, string(c))
}

Runnable code to play with is on the Go Playground (I like the characters starting at 0x0df4 in the output).
