General XML parser in Go
Question
Is there a general approach to reading XML documents in Go, something similar to XmlDocument or XDocument in C#?
All the examples I found show how to read XML by unmarshaling it into objects I need to define, but that is quite time-consuming because I have to define a lot of stuff I'm not going to use:
```go
xml.Unmarshal(...)
```
Another approach is forward-only reading using:
```go
xml.NewDecoder(xmlFile)
```
Described here: http://blog.davidsingleton.org/parsing-huge-xml-files-with-go/
Answer 1
Score: 6
> All the examples I found show how to read using unmarshaling functionality into the objects I need to define, but it's quite time consuming as I need to define a lot of stuff that I'm not going to use.
Then don't define what you're not going to use, define only what you're going to use. You don't have to create a Go model that perfectly covers the XML structure.
Let's assume you have an XML document like this:
```xml
<blog id="1234">
    <meta keywords="xml,parsing,partial" />
    <name>Partial XML parsing</name>
    <url>http://somehost.com/xml-blog</url>
    <entries count="2">
        <entry time="2016-01-19 08:40:00">
            <author>Bob</author>
            <content>First entry</content>
        </entry>
        <entry time="2016-01-19 08:30:00">
            <author>Alice</author>
            <content>Second entry</content>
        </entry>
    </entries>
</blog>
```
And let's assume you only need the following info out of this XML:
- id
- keywords
- blog name
- author names
You can model these wanted pieces of information with the following struct:
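```go
type Data struct {
	Id   string `xml:"id,attr"` // attribute of the root element
	Meta struct {
		Keywords string `xml:"keywords,attr"`
	} `xml:"meta"`
	Name string `xml:"name"`
	// The a>b>c path collects <author> from every <entry> under <entries>.
	Authors []string `xml:"entries>entry>author"`
}
```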
And now you can parse only this information with the following code:
```go
// s holds the XML document above as a string.
d := Data{}
if err := xml.Unmarshal([]byte(s), &d); err != nil {
	panic(err)
}
fmt.Printf("%+v", d)
```
Output (try it on the Go Playground):
```
{Id:1234 Meta:{Keywords:xml,parsing,partial} Name:Partial XML parsing Authors:[Bob Alice]}
```
Answer 2
Score: 3
Well, two things.
First, you are not obliged to define Go types that map to complex elements in order to parse XML with nothing but `encoding/xml`. On the contrary, you can parse XML documents purely procedurally and call `xml.Unmarshal()` (or, on a streaming `xml.Decoder`, its counterpart `DecodeElement()`) only on primitive (non-nested) elements, to parse them as values of "primitive" types such as `string`, `int32` or `time.Time`.
That would be a lot of code, for sure, but it is just approaching the same problem from a more dynamic angle. To understand what I mean, consider your fully parsed XML document in the form of a DOM object. To extract useful data from it, you have to query that object somehow or iterate over the tree. With the approach presented in the blog post you referred to, you traverse your XML document as you parse it, essentially combining parsing with querying/traversing.
This may or may not work for you, as the suitability of a particular approach to parsing XML-formatted data depends heavily on the data's structure and on what you intend to get out of it. For instance, if you need to perform several queries over the document, with later queries depending on earlier ones, the procedural decoding from that blog post hardly works.
Second, alternative libraries exist. For instance, look at xmltree and xmlpath.
While these two are written in pure Go, there also exist a couple of packages wrapping libxml, for instance goxml. With them, you can have DOM-oriented parsing if you like.
Yet another approach is to parse XML into a set of nested key/value maps using mxj.