Parsing recursive XML in Golang

Question

I am trying to parse an XML document that looks roughly like the following. (Please keep in mind that the actual documents have many other tags, which I am able to parse successfully.)

<process name="p" id="1234">
  <app>
    <element name="ele-1">
      <id>84594</id>
      <element name="ele-11">
        <id>95065</id>
        <element name="ele-111">
          <id>5065</id>
        </element> <!-- ele-111 -->
      </element> <!-- ele-11 -->
      <element name="ele-12">
        <id>4954</id>
      </element> <!-- ele-12 -->
    </element> <!-- ele-1 -->
  </app>
  <!-- many other tags here, which are working fine -->
</process>

I have added the ele-# labels so that it is easy to see where each element starts and ends. These nested elements can go to arbitrary depth once we are below the topmost one (ele-1).

That means ele-1 will have a list of nested elements, each of which may or may not have its own list of nested elements, and so on.

I want to parse this structure in Go. What is the best way?

I have defined something like this:

type ProcessDef struct {
	Process xml.Name `xml:"process"`
	Name    string   `xml:"name,attr"`
	Id      string   `xml:"id,attr"`
	App     AppDef   `xml:"app"`
}

type AppDef struct {
	App      xml.Name     `xml:"app"`
	Elements []ElementDef `xml:"element"`
}

type ElementDef struct {
	Element  xml.Name `xml:"element"`
	Name     string   `xml:"name,attr"`
	Id       string   `xml:"id"`
	Elements []ElementDef
}

Once I parse the XML, the first (top-level) element in the AppDef struct is read and populated, but I am unable to get the values of all of its children using the nested recursive definition in the ElementDef struct.

Also, when I tried to add the xml tag to the slice field in the ElementDef struct, like so: Elements []ElementDef `xml:"element"`, I got the warning: struct field Elements repeats xml tag "element"

I am a bit lost as to how best to parse such an XML document in Go.

Answer 1

Score: 4

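You can remove the xml.Name field from ElementDef and add the `xml:"element"` tag to the Elements field. (If both the xml.Name field and the Elements field are tagged "element", the two tags collide, which is what the "repeats xml tag" warning is about.)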

type ElementDef struct {
	Name     string       `xml:"name,attr"`
	Id       string       `xml:"id"`
	Elements []ElementDef `xml:"element"`
}

https://play.golang.org/p/SeQBRS_rdhf

Or, alternatively, you can keep the xml.Name field, remove the tag, and rename it to XMLName. As the docs state: "If the struct has a field named XMLName of type Name, Unmarshal records the element name in that field."

type ElementDef struct {
	XMLName  xml.Name
	Name     string       `xml:"name,attr"`
	Id       string       `xml:"id"`
	Elements []ElementDef `xml:"element"`
}

https://play.golang.org/p/WUo1GNYa2t0

huangapple
  • Posted on July 11, 2021, 18:27:14
  • When reposting, please keep the link to this article: https://go.coder-hub.com/68335342.html