September 15, 2010 08:09:52 · go

What does a basic DOM XML parser need?

Question

I've started programming in Google's Go language, and the package I'm attempting to write is an API for processing and creating DOCX files (I'm familiar with this topic and thought it would be a good way to learn Go). As a DOCX file is primarily a ZIP archive with various XML files inside it, I need a DOM XML parser. However, I was unable to find any native Go DOM XML parsers; the only ones I saw seemed to be very limited, and were probably SAX parsers (anyone who uses Go, correct me if I'm wrong).

So this past weekend I wrote a very basic DOM XML parser that was able to parse one of the simpler XML files within the DOCX package and output it back intact. At the moment I'm not going to bother with namespace, XSLT, or schema-validation support, as those aren't useful for manipulating DOCX files. My question is: what other XML standards and functionality would be important to incorporate into the parser?

At the moment, it only creates a tree of elements and attributes, which I can modify and save. I'm not currently handling CDATA sections or XML escape characters (though those would be easy to do and I'll get to them this weekend).
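
For the escape characters mentioned above, here is a minimal sketch of escaping text when saving the tree. It assumes the standard library's encoding/xml package is acceptable for this step; the helper name escape is hypothetical and only for illustration.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
)

// escape is a hypothetical helper: it returns s with XML-significant
// characters (such as &, < and >) replaced by entity references.
func escape(s string) (string, error) {
	var buf bytes.Buffer
	if err := xml.EscapeText(&buf, []byte(s)); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	out, err := escape("Tom & Jerry <cartoon>")
	if err != nil {
		fmt.Println("escape error:", err)
		return
	}
	fmt.Println(out) // Tom &amp; Jerry &lt;cartoon&gt;
}
```

A hand-written parser would apply the reverse mapping when reading text and attribute values.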

Answer 1

Score: 3

First of all: if you specifically want a DOM parser, you need to implement the DOM API. But I am not sure that is what you actually mean; perhaps you just mean an XML parser that produces an XML tree model (a "dom"), or just an XML parser? DOM is hardly the only way.
Also note that building a DOM tree model on top of a SAX parser is the most common approach; few if any DOM packages have embedded parsers, and the parser is commonly exposed separately.

As for XML parser features, these are MUSTs in my opinion:

Handling of character references (ampersand-number forms such as &#38; and &#x26;) and the predefined general entities (amp, lt, gt, apos, quot)
Handling of the xml declaration (<?xml ... ?>)
Handling of various input encodings, whether declared by the xml declaration or externally -- too many parsers skimp on this, but it is very important, since the encoding of an XML document can be reliably detected from the document itself.
Checking for uniqueness of attribute names within an element
Checking for proper nesting of elements
Skipping of comments
Skipping (if not handling) of processing instructions
CDATA handling -- it's simple to do
Keeping track of line numbers for error reporting
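
Several items on this list can be sketched on top of Go's encoding/xml tokenizer, assuming it is used as the underlying parser. The scan function below is a hypothetical helper: it skips comments and processing instructions, receives entity references and CDATA as plain character data, rejects repeated attribute names, and reports the line number carried by syntax errors.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// scan is a hypothetical helper that returns the character data found
// inside elements, or an error describing the first problem it meets.
func scan(input string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(input))
	var sb strings.Builder
	depth := 0
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return sb.String(), nil
		}
		if err != nil {
			// Syntax errors (bad nesting, bad entities) carry a line number.
			var se *xml.SyntaxError
			if errors.As(err, &se) {
				return "", fmt.Errorf("line %d: %s", se.Line, se.Msg)
			}
			return "", err
		}
		switch t := tok.(type) {
		case xml.Comment, xml.ProcInst, xml.Directive:
			// Skipped: comments, processing instructions, <!DOCTYPE ...>.
		case xml.CharData:
			// Entities and CDATA sections arrive already decoded.
			if depth > 0 {
				sb.Write(t)
			}
		case xml.StartElement:
			depth++
			seen := map[xml.Name]bool{}
			for _, a := range t.Attr {
				if seen[a.Name] {
					return "", fmt.Errorf("duplicate attribute %q on <%s>", a.Name.Local, t.Name.Local)
				}
				seen[a.Name] = true
			}
		case xml.EndElement:
			depth--
		}
	}
}

func main() {
	text, err := scan(`<?xml version="1.0" encoding="UTF-8"?>
<!-- note -->
<p a="1">x &lt; y &#38; z <![CDATA[<raw>]]></p>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(text) // x < y & z <raw>
}
```

Input encodings other than UTF-8 are not covered here; with this package they would need a CharsetReader set on the decoder.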

Other things that may eventually be useful:

Namespace handling
Checking of character validity, both in content and in names
Normalization of linefeeds as per the XML specification
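
Two of these checks are small enough to sketch directly. The function names are hypothetical, and the character ranges are the ones from the Char production of XML 1.0; name characters follow stricter rules that are not shown here.

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeNewlines maps "\r\n" and any lone "\r" to "\n", which is the
// end-of-line handling XML 1.0 asks a processor to apply before parsing.
func normalizeNewlines(s string) string {
	s = strings.ReplaceAll(s, "\r\n", "\n")
	return strings.ReplaceAll(s, "\r", "\n")
}

// isXMLChar reports whether r is allowed in XML 1.0 content
// (tab, LF, CR, and the legal Unicode ranges).
func isXMLChar(r rune) bool {
	switch {
	case r == 0x9, r == 0xA, r == 0xD:
		return true
	case r >= 0x20 && r <= 0xD7FF:
		return true
	case r >= 0xE000 && r <= 0xFFFD:
		return true
	case r >= 0x10000 && r <= 0x10FFFF:
		return true
	}
	return false
}

func main() {
	fmt.Printf("%q\n", normalizeNewlines("a\r\nb\rc\n")) // "a\nb\nc\n"
	fmt.Println(isXMLChar('A'), isXMLChar(0x0B))          // true false
}
```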

Answer 2

Score: 1

Have you looked at Go's XML parser? http://golang.org/pkg/xml/

If it is missing functionality you need, it's probably still easier to add it there than to roll your own.
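
As a rough illustration of what the standard parser already offers, here is a sketch that pulls the text runs out of a simplified WordprocessingML fragment. It assumes the package's current import path, encoding/xml; the document struct, the extractText helper, and the sample input are hypothetical and far smaller than a real DOCX part.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// document is a hypothetical mapping of just the parts we care about:
// every <t> text run nested under body > p > r.
type document struct {
	Texts []string `xml:"body>p>r>t"`
}

// extractText unmarshals the XML and returns the collected text runs.
func extractText(data []byte) ([]string, error) {
	var d document
	if err := xml.Unmarshal(data, &d); err != nil {
		return nil, err
	}
	return d.Texts, nil
}

func main() {
	sample := []byte(`<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
  <w:body>
    <w:p><w:r><w:t>Hello</w:t></w:r></w:p>
    <w:p><w:r><w:t>World</w:t></w:r></w:p>
  </w:body>
</w:document>`)
	texts, err := extractText(sample)
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	fmt.Printf("%q\n", texts) // ["Hello" "World"]
}
```

Struct mapping suits reading known structures; for modifying and saving arbitrary documents, a generic tree built from the package's token stream may fit better.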
