Robust way to check if the input is JSON or YAML

Question


The code below should accept a multiline string as input, in either JSON or YAML. It first attempts to read the input as JSON; if that fails, it makes a second attempt to read it as YAML; if both fail, it returns an error.

Now the problem is with yaml.Unmarshal(). As far as I can tell, it never returns an error when the input is a JSON string, whether that JSON is correct or not. The main issue is that yaml.Unmarshal never returns an error.

Initially I thought this was a bug in the yaml.Unmarshal implementation, but it looks to me like it makes a best effort when parsing the input, and as long as the structure doesn't violate YAML, it never returns an error.

// SpecsFromString parses a spec from a string that holds JSON or YAML.
func SpecsFromString(str string) (*Spec, error) {
	r := strings.NewReader(str)
	return ReadSpec(r)
}

// ReadSpec reads all of b and tries JSON first, then YAML.
func ReadSpec(b io.Reader) (*Spec, error) {

	var spec Spec

	buffer, err := ioutil.ReadAll(b)
	if err != nil {
		return nil, err
	}

	// First attempt: JSON.
	err = json.Unmarshal(buffer, &spec)
	if err == nil {
		return &spec, nil
	}

	// Second attempt: YAML. This is the call that never seems to fail.
	err = yaml.Unmarshal(buffer, &spec)
	if err == nil {
		return &spec, nil
	}

	return nil, &InvalidTenantsSpec{"unknown format"}
}

So my question is: how do I properly test whether the input is JSON or YAML?
It looks to me like the only way is to differentiate two error cases in the JSON unmarshaler, since JSON is generally stricter in structure: when the input is not JSON at all, and when the input is JSON but has an error in its structure. In the second case I would never need to call the YAML parser in the first place.

Maybe someone can come up with a neater solution?

Thank you.

Answer 1

Score: 1


json.Unmarshal returns a *json.SyntaxError on invalid JSON syntax and other, different errors (for example *json.UnmarshalTypeError) when the syntax is correct but unmarshaling fails, so you can use that to differentiate.

Concerning YAML, if you use yaml.v3, you can write a custom unmarshaler to access the Node representation of your input and check whether the root node has FlowStyle set in its Style field, which means JSON-like syntax. However, YAML is far more permissive even with this syntax (e.g. strings do not need to be quoted, and trailing commas in sequences and mappings are allowed). While you can check the quoting style of the contained scalars, the information available will not be enough to ensure that the input is JSON-parseable, because trailing commas cannot be detected via this interface.

So the proper way to check whether the input is syntactically valid JSON is to check the error returned by json.Unmarshal.
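
The error check described above could look roughly like the sketch below. It is only an illustration under some assumptions: the Spec fields are made up, and the YAML decoder is passed in as a function (yaml.Unmarshal from yaml.v3 has this same signature) so that the sketch needs nothing outside the standard library. Note that malformed JSON such as a trailing comma is also reported as a *json.SyntaxError, so it still falls through to the YAML attempt, which may accept it.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// Spec is a hypothetical stand-in for the asker's type.
type Spec struct {
	Name  string `json:"name"`
	Count int    `json:"count"`
}

// readSpec tries JSON first and falls back to YAML only on a JSON syntax error.
// yamlUnmarshal is injected (an assumption of this sketch); in real code it
// would typically be yaml.Unmarshal.
func readSpec(data []byte, yamlUnmarshal func([]byte, interface{}) error) (*Spec, error) {
	var spec Spec
	err := json.Unmarshal(data, &spec)
	if err == nil {
		return &spec, nil
	}
	var syntaxErr *json.SyntaxError
	if !errors.As(err, &syntaxErr) {
		// Syntactically valid JSON that does not fit Spec: report it, skip YAML.
		return nil, fmt.Errorf("invalid JSON spec: %w", err)
	}
	// Not valid JSON syntax: start from a clean value and try YAML.
	spec = Spec{}
	if err := yamlUnmarshal(data, &spec); err != nil {
		return nil, fmt.Errorf("input is neither JSON nor YAML: %w", err)
	}
	return &spec, nil
}

func main() {
	// Placeholder decoder so the sketch runs without a YAML library.
	noYAML := func([]byte, interface{}) error {
		return errors.New("no YAML parser wired in")
	}
	inputs := []string{
		`{"name":"a","count":1}`,   // valid JSON that fits Spec
		`{"name":"a","count":"x"}`, // valid JSON syntax, wrong type for count
		"name: a",                  // not JSON syntax
	}
	for _, in := range inputs {
		spec, err := readSpec([]byte(in), noYAML)
		fmt.Println(spec, err)
	}
}
```

With this shape, a type mismatch in otherwise valid JSON is reported as a JSON error and the YAML parser is never consulted, which is the second case the question asks about.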

Answer 2

Score: 1


This is what I was referencing in my comment on the question. This is a simplistic example of a middleware reader.

  1. This pattern allows you to avoid having to fully parse the text body, in case it's unreasonably large
  2. It ideally has no effect on downstream operations, providing a transparent API.

From your example, you'd call something like:

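d := WrapReader(b)
buffer, err := ioutil.ReadAll(d)
if err != nil {
	return nil, err
}
// Try JSON only when the line scores favor it.
if d.Writable.A > d.Writable.B {
	err = json.Unmarshal(buffer, &spec)
}
// Fall back to YAML when JSON failed or the scores favor YAML.
if err != nil || d.Writable.A <= d.Writable.B {
	err = yaml.Unmarshal(buffer, &spec)
}

Effectively, it doesn't change the interface you're dealing with, while giving you some control over how the process goes down. There's plenty of room for improvement, but the above API is offered by the code below:

// Line accumulates the bytes of the line currently being read.
type Line []byte

// Writable keeps the running scores: A counts JSON evidence, B counts YAML evidence.
type Writable struct {
	Line
	A int
	B int
}

// Decision wraps a reader and scores every line that passes through it.
type Decision struct {
	io.Reader
	Writable
}

func (d *Decision) Read(p []byte) (int, error) {
	n, err := d.Reader.Read(p)
	// Score only the n bytes that were actually read.
	for _, c := range p[:n] {
		if werr := d.Writable.WriteByte(c); werr != nil {
			return n, werr
		}
	}
	// Pass err through unchanged, including io.EOF, so ReadAll terminates.
	return n, err
}

// WriteByte buffers one byte and scores the buffered line at each newline.
// A final line without a trailing newline is not scored.
func (w *Writable) WriteByte(b byte) error {
	if b == '\n' {
		pJSON, pYAML, err := w.Score()
		if err != nil {
			return err
		}
		w.A += pJSON
		w.B += pYAML
		w.Line = w.Line[:0]
	} else {
		w.Line = append(w.Line, b)
	}
	return nil
}

func (w *Writable) Score() (int, int, error) {
	// Whatever scoring heuristics you can think of.
	return 0, 0, nil
}

// WrapReader returns *Decision rather than io.Reader so the caller can read the scores.
func WrapReader(b io.Reader) *Decision {
	return &Decision{Reader: b}
}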
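
The Score method above is left as a placeholder. As one possible illustration, the sketch below scores a single line with a few naive rules. The function names and the rules themselves are assumptions made up for this example, not something the answer specifies, and they will misjudge plenty of inputs (flow-style YAML, for instance, looks like JSON to them).

```go
package main

import (
	"bytes"
	"fmt"
)

// scoreLine is a hypothetical heuristic: it returns (jsonPoints, yamlPoints)
// for one line of input.
func scoreLine(line []byte) (int, int) {
	t := bytes.TrimSpace(line)
	if len(t) == 0 {
		return 0, 0
	}
	switch {
	case bytes.HasPrefix(t, []byte("---")), bytes.HasPrefix(t, []byte("- ")), t[0] == '#':
		return 0, 1 // document marker, block sequence item, or comment
	case t[0] == '{', t[0] == '}', t[0] == '[', t[0] == ']', t[0] == '"':
		return 1, 0 // brackets and double-quoted keys are typical of JSON
	case bytes.Contains(t, []byte(": ")), t[len(t)-1] == ':':
		return 0, 1 // unquoted "key: value" or "key:" is typical of block YAML
	}
	return 0, 0
}

// scoreText sums the per-line scores over a whole document.
func scoreText(text []byte) (jsonPts, yamlPts int) {
	for _, line := range bytes.Split(text, []byte("\n")) {
		j, y := scoreLine(line)
		jsonPts += j
		yamlPts += y
	}
	return jsonPts, yamlPts
}

func main() {
	j, y := scoreText([]byte("name: a\nitems:\n  - x\n"))
	fmt.Println("json:", j, "yaml:", y)
}
```

Inside the reader, Score could call something like scoreLine(w.Line) and return its two values with a nil error.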

Answer 3

Score: 0


I came across the same problem some days ago in Bash scripting: how can I detect whether a file contains JSON, YAML or plain text?

My solution was:

process as JSON

  • can be parsed as JSON without errors

process as text

  • can be parsed as YAML, but the type is just a YAML string

process as YAML

  • can be parsed as YAML, but is not just a YAML string
  • cannot be parsed as JSON

Bash scripting snippet

# Succeeds if the file parses as JSON.
parse_as_json() {
  jq -e '.' > /dev/null 2>&1 < "$1"
}

# Succeeds if the file is YAML that is neither JSON nor a plain string.
parse_as_yaml() {
  local FILE=$1
  parse_as_json "$FILE" && return 1
  parse_as_text "$FILE" && return 1
  yq -e > /dev/null 2>&1 < "$FILE" || return 1
}

# Succeeds if the file parses as YAML but is only a single string.
parse_as_text() {
  [[ $(yq 'type == "string"' 2>&1 < "$1") == true ]]
}

Posted by huangapple on 2021-06-28 10:03:30
When reposting, please keep the link to this article: https://go.coder-hub.com/68156758.html