How to parse huge XML in Go while ignoring nested elements?

Question

I have this XML, for example:

<Report>
    ...
    <ElementOne Blah="bleh">
        <IgnoreElement>
            <Foo>
               ...
            </Foo>
        </IgnoreElement>

        <WantThisElement>
            <Bar Baz="test">
               ...
            </Bar>
            <Bar Baz="test2">
               ...
            </Bar>
        </WantThisElement>
    </ElementOne>
    ...
</Report>

I'm parsing it with encoding/xml:

...
decoder := xml.NewDecoder(resp.Body)
Mystruct := MyStruct{}
for {
    t, _ := decoder.Token()

    if t == nil {
        break
    }
    switch se := t.(type) {
    case xml.StartElement:
        if se.Name.Local == "ElementOne" {
            decoder.DecodeElement(&Mystruct, &se)
        }
    }
}
...

type MyStruct struct{
    Blah string
    Bar []Bar
}
type Bar struct{
    Baz string
    ...
}

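I'm not sure whether this is the best way to do it, and I don't know whether decoder.DecodeElement(...) ignores the nested elements that I don't want to parse. I want good performance with low memory cost. What is the best way to parse these huge XML files?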


Answer 1

Score: 4

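For large XML it is usually best to use the XML decoder (xml.Decoder), which reads from a stream instead of loading the whole document first. Combine it with selective binding in the struct tags (such as WantThisElement>Bar): the decoder follows that path, and elements that no field maps to are skipped.

Let's build an example from the XML content in your question.

XML content: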

<Report>
    <ElementOne Blah="bleh">
        <IgnoreElement>
            <Foo>
                <FooValue>example foo value</FooValue>
            </Foo>
        </IgnoreElement>

        <WantThisElement>
            <Bar Baz="test">
                <BarValue>example bar value 1</BarValue>
            </Bar>
            <Bar Baz="test2">
                <BarValue>example bar value 2</BarValue>
            </Bar>
        </WantThisElement>
    </ElementOne>
</Report>

Structs:

type Report struct {
	XMLName    xml.Name `xml:"Report"`
	ElementOne ElementOne
}

type ElementOne struct {
	XMLName xml.Name `xml:"ElementOne"`
	Blah    string   `xml:"Blah,attr"`
	Bar     []Bar    `xml:"WantThisElement>Bar"`
}

type Bar struct {
	XMLName  xml.Name `xml:"Bar"`
	Baz      string   `xml:"Baz,attr"`
	BarValue string   `xml:"BarValue"`
}
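
As a minimal sketch of how the structs above might be used, the program below decodes the sample document through xml.NewDecoder. The names sampleXML, decodeReport and decodeSample are made up for this illustration, and a strings.Reader stands in for resp.Body from the question.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"log"
	"strings"
)

// Struct definitions repeated from the answer so the program compiles on its own.
type Report struct {
	XMLName    xml.Name `xml:"Report"`
	ElementOne ElementOne
}

type ElementOne struct {
	XMLName xml.Name `xml:"ElementOne"`
	Blah    string   `xml:"Blah,attr"`
	Bar     []Bar    `xml:"WantThisElement>Bar"`
}

type Bar struct {
	XMLName  xml.Name `xml:"Bar"`
	Baz      string   `xml:"Baz,attr"`
	BarValue string   `xml:"BarValue"`
}

// sampleXML is the example document from the answer (hypothetical constant name).
const sampleXML = `<Report>
    <ElementOne Blah="bleh">
        <IgnoreElement>
            <Foo><FooValue>example foo value</FooValue></Foo>
        </IgnoreElement>
        <WantThisElement>
            <Bar Baz="test"><BarValue>example bar value 1</BarValue></Bar>
            <Bar Baz="test2"><BarValue>example bar value 2</BarValue></Bar>
        </WantThisElement>
    </ElementOne>
</Report>`

// decodeReport reads one <Report> from r. IgnoreElement has no matching
// struct field, so its contents are read past but never stored.
func decodeReport(r io.Reader) (Report, error) {
	var rep Report
	err := xml.NewDecoder(r).Decode(&rep)
	return rep, err
}

// decodeSample runs decodeReport on the in-memory sample document.
func decodeSample() (Report, error) {
	return decodeReport(strings.NewReader(sampleXML))
}

func main() {
	rep, err := decodeSample()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println("Blah:", rep.ElementOne.Blah)
	for _, b := range rep.ElementOne.Bar {
		fmt.Println(b.Baz, "->", b.BarValue)
	}
}
```

Note that the decoder still has to read through IgnoreElement to find its end tag; the saving is that nothing from it is kept in the result.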

Play link: https://play.golang.org/p/26xDkojeUp
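
If the real file holds many ElementOne elements (an assumption, since the question's sample elides what surrounds it), decoding the whole Report in one call would keep all of them in memory at once. A possible streaming variant combines the Token loop from the question with tagged structs and handles one ElementOne at a time. The function names streamElements and bazValues, the callback shape, and the two-element sample document are invented for this sketch.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"log"
	"strings"
)

type ElementOne struct {
	Blah string `xml:"Blah,attr"`
	Bar  []Bar  `xml:"WantThisElement>Bar"`
}

type Bar struct {
	Baz      string `xml:"Baz,attr"`
	BarValue string `xml:"BarValue"`
}

// streamElements walks the token stream and passes each decoded
// <ElementOne> to handle, so only one element is held at a time.
func streamElements(r io.Reader, handle func(ElementOne) error) error {
	dec := xml.NewDecoder(r)
	for {
		tok, err := dec.Token()
		if errors.Is(err, io.EOF) {
			return nil // clean end of input
		}
		if err != nil {
			return err // malformed XML or read failure
		}
		se, ok := tok.(xml.StartElement)
		if !ok || se.Name.Local != "ElementOne" {
			continue
		}
		var e ElementOne
		// DecodeElement consumes everything up to the matching end tag;
		// children without a matching field (IgnoreElement) are skipped.
		if err := dec.DecodeElement(&e, &se); err != nil {
			return err
		}
		if err := handle(e); err != nil {
			return err
		}
	}
}

// sampleXML is a hypothetical document with two ElementOne siblings.
const sampleXML = `<Report>
  <ElementOne Blah="first">
    <IgnoreElement><Foo><FooValue>skipped</FooValue></Foo></IgnoreElement>
    <WantThisElement>
      <Bar Baz="test"><BarValue>v1</BarValue></Bar>
      <Bar Baz="test2"><BarValue>v2</BarValue></Bar>
    </WantThisElement>
  </ElementOne>
  <ElementOne Blah="second">
    <WantThisElement>
      <Bar Baz="test3"><BarValue>v3</BarValue></Bar>
    </WantThisElement>
  </ElementOne>
</Report>`

// bazValues collects "Blah/Baz" pairs from doc using streamElements.
func bazValues(doc string) ([]string, error) {
	var out []string
	err := streamElements(strings.NewReader(doc), func(e ElementOne) error {
		for _, b := range e.Bar {
			out = append(out, e.Blah+"/"+b.Baz)
		}
		return nil
	})
	return out, err
}

func main() {
	vals, err := bazValues(sampleXML)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(vals)
}
```

Unlike the loop in the question, this version checks the error from Token and from DecodeElement, so a truncated or malformed file is reported instead of silently ending the loop.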


huangapple
  • Posted on August 15, 2017, 04:24:35
  • When reposting, please keep this link: https://go.coder-hub.com/45682512.html