Output unquoted Unicode in Go

Question


I'm using goyaml as a YAML beautifier. By loading and dumping a YAML file, I can source-format it. I unmarshal the data from a YAML source file into a struct, marshal that struct back into bytes, and write the bytes to an output file. But the process turns my Unicode strings into quoted strings made of literal \u escape sequences, and I don't know how to reverse it.

Example input subtitle.yaml:

line: 你好

I've stripped everything down to the smallest reproducible problem. Here's the code, using _ to discard the errors, none of which occur here:

package main

import (
    "io/ioutil"
    //"unicode/utf8"
    //"fmt"

    goyaml "gopkg.in/yaml.v1" // the package is named yaml; aliased to match the calls below
)

type Subtitle struct {
    Line string
}

func main() {
    filename := "subtitle.yaml"
    in, _ := ioutil.ReadFile(filename)
    var subtitle Subtitle
    _ = goyaml.Unmarshal(in, &subtitle)
    out, _ := goyaml.Marshal(&subtitle)

    //for dbg := out; len(dbg) > 0; { // For debugging, see what the runes are
    //  r, size := utf8.DecodeRune(dbg)
    //  fmt.Printf("%c ", r)
    //  dbg = dbg[size:] // advance a copy so out is still intact for WriteFile
    //}

    _ = ioutil.WriteFile(filename, out, 0644)
}

Actual output subtitle.yaml:

line: "\u4F60\u597D"

I want to undo this escaping after goyaml has produced the variable out.

The commented-out rune-printing code block, which adds spaces between runes for clarity, outputs the following. It shows that Unicode runes like 你 are not present in the output as characters; the bytes hold the literal backslash, u, and hex digits instead:

l i n e :   " \ u 4 F 6 0 \ u 5 9 7 D "

How can I unquote out, before writing it to the output file, so that the output looks like the input (albeit beautified)?

Desired output subtitle.yaml:

line: "你好"

Temporary Solution

I've filed https://github.com/go-yaml/yaml/issues/11. In the meantime, @bobince's tip on yaml_emitter_set_unicode was helpful in uncovering the problem. It was defined as a C binding but never called, and there is no option to set it! I changed encode.go and added yaml_emitter_set_unicode(&e.emitter, true) at line 20, and everything works as expected. It would be better to make it optional, but that would require a change in the Marshal API.

Answer 1

Score: 1


I had a similar issue and used the following to work around the bug in goyaml.Marshal(). (*Regexp) ReplaceAllFunc is your friend here: you can use it to expand the escaped Unicode runes in the byte slice. It is maybe a little too dirty for production, but it works for the example:

package main

import (
    "io/ioutil"
    "regexp"
    "strconv"
    "unicode/utf8"

    "launchpad.net/goyaml"
)

type Subtitle struct {
    Line string
}

// reFind matches a whole `key: "value"` line whose quoted value contains \u.
// (?m) makes ^ and $ match at line boundaries, so multi-line output works too.
var reFind = regexp.MustCompile(`(?m)^[ \t]*[^\s:]+:[ \t]*".*\\u.*"[ \t]*$`)

// reFindU matches a single \uXXXX escape.
var reFindU = regexp.MustCompile(`\\u[0-9a-fA-F]{4}`)

func expandUnicodeInYamlLine(line []byte) []byte {
    // TODO: restrict this to the quoted string value
    return reFindU.ReplaceAllFunc(line, expandUnicodeRune)
}

// expandUnicodeRune turns one \uXXXX escape into its UTF-8 encoding.
func expandUnicodeRune(esc []byte) []byte {
    ri, err := strconv.ParseInt(string(esc[2:]), 16, 32)
    if err != nil || !utf8.ValidRune(rune(ri)) {
        return esc // leave surrogate halves and anything unparsable escaped
    }
    r := rune(ri)
    repr := make([]byte, utf8.RuneLen(r))
    utf8.EncodeRune(repr, r)
    return repr
}

func main() {
    filename := "subtitle.yaml"
    filenameOut := "subtitleout.yaml"
    in, _ := ioutil.ReadFile(filename)
    var subtitle Subtitle
    _ = goyaml.Unmarshal(in, &subtitle)
    out, _ := goyaml.Marshal(&subtitle)

    out = reFind.ReplaceAllFunc(out, expandUnicodeInYamlLine)
    _ = ioutil.WriteFile(filenameOut, out, 0644)
}
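One weakness of a plain \uXXXX pattern is that it also rewrites an escaped backslash followed by the letter u (the six-character text \\u4F60 stands for a literal backslash plus "u4F60", not for 你). A possible refinement, sketched here with the standard library only, is to let the same regular expression consume escaped backslashes first and hand them back unchanged. The names reEscape and expandEscapes are hypothetical; the sketch assumes it is applied only to double-quoted scalars, and the \UXXXXXXXX form is included on the assumption that the emitter may use it for code points above U+FFFF.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
	"unicode"
	"unicode/utf8"
)

// reEscape matches, left to right, an escaped backslash (\\), a \uXXXX escape,
// or a \UXXXXXXXX escape. Matching \\ first keeps its second backslash from
// being mistaken for the start of a \u escape.
var reEscape = regexp.MustCompile(`\\\\|\\u[0-9a-fA-F]{4}|\\U[0-9a-fA-F]{8}`)

// expandEscapes replaces printable \u and \U escapes with their UTF-8 bytes.
// (Hypothetical helper; meant for the contents of double-quoted scalars.)
func expandEscapes(b []byte) []byte {
	return reEscape.ReplaceAllFunc(b, func(esc []byte) []byte {
		if esc[1] == '\\' {
			return esc // escaped backslash: keep as is
		}
		n, err := strconv.ParseUint(string(esc[2:]), 16, 32)
		if err != nil {
			return esc
		}
		r := rune(n)
		// Keep surrogates, out-of-range values and non-printable runes escaped.
		if !utf8.ValidRune(r) || !unicode.IsPrint(r) {
			return esc
		}
		return []byte(string(r))
	})
}

func main() {
	fmt.Printf("%s\n", expandEscapes([]byte(`line: "\u4F60\u597D"`))) // prints: line: "你好"
	fmt.Printf("%s\n", expandEscapes([]byte(`path: "C:\\users"`)))    // unchanged
}
```

In the answer's program, this would take the place of expandUnicodeInYamlLine as the function passed to reFind.ReplaceAllFunc. Non-printable runes stay escaped on purpose, since writing them raw inside a quoted scalar could change how the file is read back.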

Posted by huangapple on February 11, 2014 at 16:39:48
  • When reposting, please keep the link to this article: https://go.coder-hub.com/21696845.html