2015年7月6日 06:25:46go评论106阅读模式

英文:

Last rune of golang unicode/norm iterator not being read

问题

我正在使用golang.org/x/text/unicode/norm包来迭代[]byte中的符文。我选择了这种方法，因为我需要检查每个符文并保持有关符文序列的信息。最后一次调用iter.Next()没有读取最后一个符文。它在最后一个符文上返回0字节。

以下是代码：

package main
import (
  "fmt"
  "unicode/utf8"
  "golang.org/x/text/unicode/norm"
)
func main() {
  var (
    n   int
    r   rune
    it  norm.Iter
    out []byte
  )
  in := []byte(`test`)
  fmt.Printf("%s\n", in)
  fmt.Println(in)
  it.Init(norm.NFD, in)
  for !it.Done() {
    ruf := it.Next()
    r, n = utf8.DecodeRune(ruf)
    fmt.Printf("bytes read: %d. val: %q\n", n, r)
    buf := make([]byte, utf8.RuneLen(r))
    utf8.EncodeRune(buf, r)
    out = norm.NFC.Append(out, buf...)
  }
  fmt.Printf("%s\n", out)
  fmt.Println(out)
}

这将产生以下输出：

test
[116 101 115 116]
bytes read: 1. val: 't'
bytes read: 1. val: 'e'
bytes read: 1. val: 's'
bytes read: 0. val: '�'
tes�
[116 101 115 239 191 189]

英文:

I'm using the golang.org/x/text/unicode/norm package to iterate over runes in a []byte. I've chosen this approach as I need to inspect each rune and maintain information about the sequence of runes. The last call to iter.Next() does not read the last rune. It gives 0 bytes read on the last rune.

Here is the code:

package main
import (
  &quot;fmt&quot;
  &quot;unicode/utf8&quot;
  &quot;golang.org/x/text/unicode/norm&quot;
)
func main() {
  var (
    n   int
    r   rune
    it  norm.Iter
    out []byte
  )
  in := []byte(`test`)
  fmt.Printf(&quot;%s\n&quot;, in)
  fmt.Println(in)
  it.Init(norm.NFD, in)
  for !it.Done() {
    ruf := it.Next()
    r, n = utf8.DecodeRune(ruf)
    fmt.Printf(&quot;bytes read: %d. val: %q\n&quot;, n, r)
    buf := make([]byte, utf8.RuneLen(r))
    utf8.EncodeRune(buf, r)
    out = norm.NFC.Append(out, buf...)
  }
  fmt.Printf(&quot;%s\n&quot;, out)
  fmt.Println(out)
}

This produces the following output:

test
[116 101 115 116]
bytes read: 1. val: &#39;t&#39;
bytes read: 1. val: &#39;e&#39;
bytes read: 1. val: &#39;s&#39;
bytes read: 0. val: &#39;�&#39;
tes�
[116 101 115 239 191 189]

答案1

得分: 0

可能这是golang.org/x/text/unicode/norm和它的*Init()*函数中的一个错误。

在该包的测试和示例中，我看到都使用了InitString。所以，作为一种解决方法，如果你将：

 it.Init(norm.NFD, in)

改为：

 it.InitString(norm.NFD, `test`)

事情将按预期工作。

我建议提交一个错误报告，但要注意，由于这是在"/x"目录下，该包被Go开发人员视为实验性的。

（顺便说一句，我使用了我的Go调试器来帮助我追踪发生了什么，但我必须说它的使用远远不是我想看到的那种调试器。）

英文:

It is possible this is a bug in golang.org/x/text/unicode/norm and its Init() function.

In the package's test and example that I see all use InitString. So as a workaround, if you change:

 it.Init(norm.NFD, in)

to:

 it.InitString(norm.NFD, `test`)

things will work as expected.

I would suggest opening up a bug report, but beware that since this is in the "/x" directory that package is considered experimental by go developers.

(BTW, I used my the go debugger to help me track down what's going on, but I should say its use was far the kind of debugger I'd like to see.)

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

Golang Unicode/Norm 迭代器未读取最后一个符文。

问题

答案1

关于接口和回调函数的代码解释

如何阅读Go文档？

初始化 Go AppEngine 应用程序与 Cloud Datastore。

使用golang的os/exec，我如何将一个进程的stdout复制到另一个进程的stdin？

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。