How to decode flexible XML in Go?

Question

I have the following XML:

    ...
    <solution>
      <ContainerBlockElement>
        <Paragraph>
          <Paragraph>
            Foo
          </Paragraph>
          <Paragraph>
            bar
          </Paragraph>
        </Paragraph>
      </ContainerBlockElement>
    </solution>
    ...

I want to extract the text content of the <Paragraph> elements. The problem is that the server can also send me a second, flatter structure:

    ...
    <solution>
      <ContainerBlockElement>
        <Paragraph>
          baz
        </Paragraph>
      </ContainerBlockElement>
    </solution>
    ...

I tried decoding with the following struct in Go, but it doesn't work:

    type Blah struct {
        ...
        Solutions           []string `xml:"solution>ContainerBlockElement>Paragraph"`
        Solutions2Paragraph []string `xml:"solution>ContainerBlockElement>Paragraph>Paragraph"`
    }

How can I decode this?

Answer 1

Score: 4

When the structure is unpredictable, deserializing into a struct is not going to work. You are better off using the XML parser's streaming mode: call xml.Decoder.Token in a loop to read the tokens in document order and handle each one as needed.

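    // Needs these imports: "encoding/xml", "io", "log", "strings".
    decoder := xml.NewDecoder(xmlFile) // xmlFile can be any io.Reader
    var solutions []string

    for {
        t, err := decoder.Token()
        if err == io.EOF {
            break // end of input
        }
        if err != nil {
            log.Fatal(err)
        }
        se, ok := t.(xml.StartElement)
        if !ok || se.Name.Local != "Paragraph" {
            continue
        }
        // Get the next token after the Paragraph start element, which will
        // normally be the tag contents. Token returns two values, so the
        // type assertion has to happen in a separate step.
        next, err := decoder.Token()
        if err != nil {
            log.Fatal(err)
        }
        // A wrapper Paragraph only yields whitespace here, so trim it and
        // skip empty results. Note: this relies on text or whitespace
        // directly following each <Paragraph> start tag.
        if innerText, ok := next.(xml.CharData); ok {
            if s := strings.TrimSpace(string(innerText)); s != "" {
                solutions = append(solutions, s)
            }
        }
    }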

This code is untested but should provide a decent starting point.

huangapple
  • Posted on August 22, 2017, 02:34:35
  • When reposting, please keep the link to this article: https://go.coder-hub.com/45803618.html