Last rune of golang unicode/norm iterator not being read
Question
I'm using the golang.org/x/text/unicode/norm package to iterate over runes in a []byte. I've chosen this approach because I need to inspect each rune and maintain information about the sequence of runes. The last call to it.Next() does not read the last rune: it reports 0 bytes read for it.
Here is the code:
package main

import (
	"fmt"
	"unicode/utf8"

	"golang.org/x/text/unicode/norm"
)

func main() {
	var (
		n   int
		r   rune
		it  norm.Iter
		out []byte
	)
	in := []byte(`test`)
	fmt.Printf("%s\n", in)
	fmt.Println(in)
	it.Init(norm.NFD, in)
	for !it.Done() {
		ruf := it.Next()
		r, n = utf8.DecodeRune(ruf)
		fmt.Printf("bytes read: %d. val: %q\n", n, r)
		buf := make([]byte, utf8.RuneLen(r))
		utf8.EncodeRune(buf, r)
		out = norm.NFC.Append(out, buf...)
	}
	fmt.Printf("%s\n", out)
	fmt.Println(out)
}
This produces the following output:
test
[116 101 115 116]
bytes read: 1. val: 't'
bytes read: 1. val: 'e'
bytes read: 1. val: 's'
bytes read: 0. val: '�'
tes�
[116 101 115 239 191 189]
Answer 1
Score: 0
It is possible that this is a bug in golang.org/x/text/unicode/norm, specifically in its Init() function.
The package's tests and examples that I looked at all use InitString. So, as a workaround, if you change:
it.Init(norm.NFD, in)
to:
it.InitString(norm.NFD, `test`)
things will work as expected.
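If the per-rune inspection does not itself depend on normalization, another way to sidestep the iterator is to walk the []byte with unicode/utf8 alone. This is only a sketch under that assumption: runesOf is a hypothetical helper name, it performs no normalization, and it is not a drop-in replacement for norm.Iter.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runesOf walks in one rune at a time using only unicode/utf8 and returns
// the decoded runes together with the width in bytes of each one.
// It performs no Unicode normalization. (Hypothetical helper.)
func runesOf(in []byte) (runes []rune, widths []int) {
	// For non-empty input DecodeRune always consumes at least one byte
	// (invalid bytes decode to utf8.RuneError with width 1), so the loop
	// terminates and never sees an empty slice.
	for len(in) > 0 {
		r, n := utf8.DecodeRune(in)
		runes = append(runes, r)
		widths = append(widths, n)
		in = in[n:]
	}
	return runes, widths
}

func main() {
	runes, widths := runesOf([]byte("test"))
	for i, r := range runes {
		fmt.Printf("bytes read: %d. val: %q\n", widths[i], r)
	}
}
```

Because the loop condition checks the remaining length rather than an iterator's Done() flag, the final rune is decoded like every other one.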
I would suggest opening a bug report, but be aware that because this package lives under the "/x" directory, it is considered experimental by the Go developers.
(BTW, I used my Go debugger to help me track down what's going on, though I should say it is still far from the kind of debugger I'd like to see.)