How to traverse XML data in Golang?

Question

I have used the xml.Unmarshal method to get a struct object, but it has its own limitations. I need a way to get all the descendants of a particular type inside a node without specifying the exact XPath.

For example, I have an xml data of the following format:

<content>
    <p>this is content area</p>
    <animal>
        <p>This id dog</p>
        <dog>
           <p>tommy</p>
        </dog>
    </animal>
    <birds>
        <p>this is birds</p>
        <p>this is birds</p>
    </birds>
    <animal>
        <p>this is animals</p>
    </animal>
</content>

Now I want to traverse the above XML and process each node and its children in document order. The problem is that this structure is not fixed and the order of elements may change, so I need a way to traverse it like this:

While(Content.nextnode())
{
   switch(type of node)
   {
      //Process the node or traverse the child node deeper
   }
}

Answer 1

Score: 40

You can do it with plain encoding/xml by using a recursive struct and a simple walk function:

type Node struct {
	XMLName xml.Name
	Content []byte `xml:",innerxml"`
	Nodes   []Node `xml:",any"`
}

func walk(nodes []Node, f func(Node) bool) {
	for _, n := range nodes {
		if f(n) {
			walk(n.Nodes, f)
		}
	}
}

Playground example: http://play.golang.org/p/rv1LlxaHvK.
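In case the playground link is unavailable, here is a minimal self-contained sketch of the same idea applied to the XML from the question. The helper name collectInner and the sample constant are introduced here purely for illustration; the sketch assumes you want the inner XML of every descendant with a given element name, in document order.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// Node is a generic element: it records its own name, its raw inner XML,
// and all child elements regardless of their names.
type Node struct {
	XMLName xml.Name
	Content []byte `xml:",innerxml"`
	Nodes   []Node `xml:",any"`
}

// walk visits nodes depth-first in document order. It descends into a
// node's children only when f returns true for that node.
func walk(nodes []Node, f func(Node) bool) {
	for _, n := range nodes {
		if f(n) {
			walk(n.Nodes, f)
		}
	}
}

const sample = `<content>
    <p>this is content area</p>
    <animal>
        <p>This id dog</p>
        <dog>
           <p>tommy</p>
        </dog>
    </animal>
    <birds>
        <p>this is birds</p>
        <p>this is birds</p>
    </birds>
    <animal>
        <p>this is animals</p>
    </animal>
</content>`

// collectInner (hypothetical helper) returns the trimmed inner XML of every
// element called name, at any depth, in document order.
func collectInner(data []byte, name string) ([]string, error) {
	var root Node
	if err := xml.Unmarshal(data, &root); err != nil {
		return nil, err
	}
	var out []string
	walk([]Node{root}, func(n Node) bool {
		if n.XMLName.Local == name {
			out = append(out, string(bytes.TrimSpace(n.Content)))
		}
		return true // always descend, so nested matches are found too
	})
	return out, nil
}

func main() {
	ps, err := collectInner([]byte(sample), "p")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, p := range ps {
		fmt.Println(p)
	}
}
```

Returning false from the callback is how you would skip a whole subtree, for example to ignore everything under birds.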


EDIT: Here's a version with attrs:

type Node struct {
	XMLName xml.Name
	Attrs   []xml.Attr `xml:",any,attr"`
	Content []byte     `xml:",innerxml"`
	Nodes   []Node     `xml:",any"`
}

func (n *Node) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	n.Attrs = start.Attr
	type node Node

	return d.DecodeElement((*node)(n), &start)
}

Playground: https://play.golang.org/p/d9BkGclp-1.
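As a hedged variation: on Go 1.8 and later, encoding/xml documents the `,any,attr` tag as collecting every attribute not handled by another field, so the struct tag alone may be enough to capture attributes. The sketch below assumes such a Go version and leaves out the custom UnmarshalXML method; the parse helper and the tiny input document are made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Node captures the element name, all of its attributes, its raw inner XML,
// and its child elements. Assumes Go 1.8+ for the ",any,attr" tag.
type Node struct {
	XMLName xml.Name
	Attrs   []xml.Attr `xml:",any,attr"`
	Content []byte     `xml:",innerxml"`
	Nodes   []Node     `xml:",any"`
}

// parse (hypothetical helper) unmarshals a whole document into a Node tree.
func parse(data []byte) (Node, error) {
	var root Node
	err := xml.Unmarshal(data, &root)
	return root, err
}

// printTree prints each element name with its attributes, indented by depth.
func printTree(n Node, depth int) {
	fmt.Printf("%*s%s", depth*2, "", n.XMLName.Local)
	for _, a := range n.Attrs {
		fmt.Printf(" %s=%q", a.Name.Local, a.Value)
	}
	fmt.Println()
	for _, c := range n.Nodes {
		printTree(c, depth+1)
	}
}

func main() {
	// Illustrative input only; not taken from the question.
	root, err := parse([]byte(`<a id="1"><b k="v"></b></a>`))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	printTree(root, 0)
}
```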

Answer 2

Score: 4

I searched a bit for ways to handle a generic XML DOM, and the closest you can get with the standard library is decoder.Token() or decoder.RawToken().

However, if you're willing to consider a library, I found this one very easy to pick up: https://github.com/beevik/etree

doc := etree.NewDocument()
if err := doc.ReadFromFile("bookstore.xml"); err != nil {
    panic(err)
}

root := doc.SelectElement("bookstore")
fmt.Println("ROOT element:", root.Tag)

for _, book := range root.SelectElements("book") {
    fmt.Println("CHILD element:", book.Tag)
    if title := book.SelectElement("title"); title != nil {
        lang := title.SelectAttrValue("lang", "unknown")
        fmt.Printf("  TITLE: %s (%s)\n", title.Text(), lang)
    }
    for _, attr := range book.Attr {
        fmt.Printf("  ATTR: %s=%s\n", attr.Key, attr.Value)
    }
}

It uses the built-in XML parser in the manner described above.

Answer 3

Score: 1

xmlquery supports parsing an XML document into a DOM tree so that you can traverse all of its nodes, similar to Go's html package.

Answer 4

Score: 0

Since you are asking for a library, and it seems you would like to traverse the XML tree, I can recommend XMLDom-Go; I've used it on some past projects.

huangapple
  • Posted on May 15, 2015, 18:06:33
  • When reposting, please keep the link to this article: https://go.coder-hub.com/30256729.html