2016-01-19 00:01:40 · go

General XML parser in Go

Question

Is there a general approach to reading an XML document in Go? Something similar to XmlDocument or XDocument in C#?

All the examples I found show how to read XML by unmarshaling it into objects I need to define, but that is quite time-consuming because I have to define a lot of stuff that I'm not going to use.

xml.Unmarshal(...)

Another approach is forward-only reading using:

xml.NewDecoder(xmlFile)

Described here: http://blog.davidsingleton.org/parsing-huge-xml-files-with-go/

Answer 1

Score: 6

> All the examples I found show how to read XML by unmarshaling it into objects I need to define, but that is quite time-consuming because I have to define a lot of stuff that I'm not going to use.

Then don't define what you're not going to use; define only what you are going to use. You don't have to create a Go model that perfectly covers the XML structure.

Let's assume you have an XML document like this:

<blog id="1234">
	<meta keywords="xml,parsing,partial" />
	<name>Partial XML parsing</name>
	<url>http://somehost.com/xml-blog</url>
	<entries count="2">
		<entry time="2016-01-19 08:40:00">
			<author>Bob</author>
			<content>First entry</content>
		</entry>
		<entry time="2016-01-19 08:30:00">
			<author>Alice</author>
			<content>Second entry</content>
		</entry>
	</entries>
</blog>

And let's assume you only need the following info out of this XML:

id
keywords
blog name
author names

You can model these wanted pieces of information with the following struct:

type Data struct {
	Id   string `xml:"id,attr"`
	Meta struct {
		Keywords string `xml:"keywords,attr"`
	} `xml:"meta"`
	Name    string   `xml:"name"`
	Authors []string `xml:"entries>entry>author"`
}

And now you can parse only this information with the following code:

// s holds the XML document shown above.
d := Data{}
if err := xml.Unmarshal([]byte(s), &d); err != nil {
	panic(err)
}
fmt.Printf("%+v", d)

Output (try it on the Go Playground):

{Id:1234 Meta:{Keywords:xml,parsing,partial} Name:Partial XML parsing Authors:[Bob Alice]}
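The same "model only what you need" idea also works for repeated elements with attributes. Here is a minimal sketch, assuming you want each entry's `time` attribute together with its author; the names `Blog`, `Entry`, `parseBlog`, and `sampleXML` are made up for illustration and are not part of the answer above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sampleXML is the blog document used in the answer above.
const sampleXML = `<blog id="1234">
	<meta keywords="xml,parsing,partial" />
	<name>Partial XML parsing</name>
	<url>http://somehost.com/xml-blog</url>
	<entries count="2">
		<entry time="2016-01-19 08:40:00">
			<author>Bob</author>
			<content>First entry</content>
		</entry>
		<entry time="2016-01-19 08:30:00">
			<author>Alice</author>
			<content>Second entry</content>
		</entry>
	</entries>
</blog>`

// Entry (hypothetical name) keeps only the time attribute and the author;
// the <content> element is not declared and is therefore skipped.
type Entry struct {
	Time   string `xml:"time,attr"`
	Author string `xml:"author"`
}

// Blog (hypothetical name) models just the id attribute and the entries.
type Blog struct {
	ID      string  `xml:"id,attr"`
	Entries []Entry `xml:"entries>entry"`
}

// parseBlog unmarshals only the fields declared on Blog.
func parseBlog(data []byte) (Blog, error) {
	var b Blog
	err := xml.Unmarshal(data, &b)
	return b, err
}

func main() {
	b, err := parseBlog([]byte(sampleXML))
	if err != nil {
		panic(err)
	}
	for _, e := range b.Entries {
		fmt.Printf("%s wrote at %s\n", e.Author, e.Time)
	}
}
```

Elements and attributes that have no matching field (`meta`, `name`, `url`, `content`, `count`) are ignored by `xml.Unmarshal`, so the struct can stay as small as the data you care about.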

Answer 2

Score: 3

Well, two things.

First, you are not obliged to define Go types that map to complex elements in order to parse XML with nothing but encoding/xml.
On the contrary, you can parse XML documents purely procedurally, calling xml.Unmarshal() only on primitive (non-nested) elements to parse them as values of "primitive" types (such as string, int32, or time.Time).

That would be a lot of code, for sure, but it is just approaching the same problem from a more dynamic angle. To understand what I mean, consider your fully parsed XML document in the form of a DOM object. To extract useful data from it, you have to query that object somehow or iterate over the tree. With the approach presented in the blog post you referred to, you traverse the XML document as you parse it, essentially combining parsing with querying/traversing.

This may or may not work for you, because the applicability of a particular approach to parsing XML-formatted data depends heavily on the data's structure and on what you intend to get out of it. For instance, if you need to perform several queries over the document, with the later queries depending on the results of the earlier ones, the procedural decoding from that blog post hardly works.

Second, alternative libraries exist. For instance, look at xmltree and xmlpath.
While these two are written in pure Go, there are also a couple of packages wrapping libxml, for instance goxml. With them, you can have DOM-oriented parsing if you like.

Yet another approach is to parse XML into a set of nested key/value maps using mxj.
