How to parse huge XML in Go, ignoring nested elements?
Question
I have this XML, for example:
<Report>
    ...
    <ElementOne Blah="bleh">
        <IgnoreElement>
            <Foo>
                ...
            </Foo>
        </IgnoreElement>
        <WantThisElement>
            <Bar Baz="test">
                ...
            </Bar>
            <Bar Baz="test2">
                ...
            </Bar>
        </WantThisElement>
    </ElementOne>
    ...
</Report>
And I'm parsing it with encoding/xml:
...
decoder := xml.NewDecoder(resp.Body)
Mystruct := MyStruct{}
for {
    t, _ := decoder.Token()
    if t == nil {
        break
    }
    switch se := t.(type) {
    case xml.StartElement:
        if se.Name.Local == "ElementOne" {
            decoder.DecodeElement(&Mystruct, &se)
        }
    }
}
...
type MyStruct struct {
    Blah string
    Bar  []Bar
}
type Bar struct {
    Baz string
    ...
}
I'm not sure whether this is the best way to do it, and I don't know whether decoder.DecodeElement(...) ignores the nested elements that I don't want to parse. I want to improve performance while keeping memory cost low. What is the best way to parse these huge XML files?
Answer 1
Score: 4
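Typically it is best to use xml.Decoder for large XML, because it reads from a stream instead of loading the whole document first. Use selective binding in the struct tags (for example WantThisElement>Bar) and the decoder follows only that path; child elements with no matching field, such as IgnoreElement, are skipped rather than stored.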
Let's use XML content from your question to create an example.
XML Content:
<Report>
    <ElementOne Blah="bleh">
        <IgnoreElement>
            <Foo>
                <FooValue>example foo value</FooValue>
            </Foo>
        </IgnoreElement>
        <WantThisElement>
            <Bar Baz="test">
                <BarValue>example bar value 1</BarValue>
            </Bar>
            <Bar Baz="test2">
                <BarValue>example bar value 2</BarValue>
            </Bar>
        </WantThisElement>
    </ElementOne>
</Report>
Structures:
type Report struct {
    XMLName    xml.Name `xml:"Report"`
    ElementOne ElementOne
}
type ElementOne struct {
    XMLName xml.Name `xml:"ElementOne"`
    Blah    string   `xml:"Blah,attr"`
    Bar     []Bar    `xml:"WantThisElement>Bar"`
}
type Bar struct {
    XMLName  xml.Name `xml:"Bar"`
    Baz      string   `xml:"Baz,attr"`
    BarValue string   `xml:"BarValue"`
}
Play Link: https://play.golang.org/p/26xDkojeUp