Unmarshal namespaced XML tags - Golang


Question

I'm trying to extract metadata from an .odt document.

The file containing the XML is called meta.xml.

It has this structure:

    <?xml version="1.0" encoding="UTF-8"?>
    <office:document-meta xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" office:version="1.2">
      <office:meta>
        <meta:creation-date>2016-10-18T15:05:19.137453293</meta:creation-date>
        <dc:title>Document</dc:title>
        <dc:date>2017-01-17T00:59:04.731054728</dc:date>
        <meta:document-statistic meta:page-count="1" meta:word-count="0" meta:character-count="0"/>
      </office:meta>
    </office:document-meta>

I'm using xml.Unmarshal() from the encoding/xml package to parse the XML and place the values inside struct fields.

This is the struct I'm using:

    type Query struct {
        CreationDate string `xml:"meta:creation-date"`
        PageCount    int    `xml:"meta:document-statistic>meta:page-count,attr"`
        WordCount    int    `xml:"meta:document-statistic>meta:word-count,attr"`
        CharCount    int    `xml:"meta:document-statistic>meta:character-count,attr"`
    }

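What I understood from the available documentation is that I can use a field tag of the form xml:"tag-name" to get the value of an XML element, and xml:"name,attr" to get the value of an attribute.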

However, since the XML file has namespaced tags (e.g. meta:creation-date), everything I've tried so far leaves the struct fields empty. I might be doing something wrong, though.

This is the entire program in the Go Playground:
https://play.golang.org/p/n7C50l1gsV


Answer 1

Score: 3

You will find that things go more smoothly if your structs map fairly directly to the XML. This code should get you started:

    package main

    import (
        "encoding/xml"
        "fmt"
        "log"
    )

    // Stats maps <meta:document-statistic>; attributes are matched by local name.
    type Stats struct {
        XMLName   xml.Name
        PageCount int `xml:"page-count,attr"`
    }

    // Meta maps <office:meta> and its child elements.
    type Meta struct {
        XMLName xml.Name
        Date    string `xml:"creation-date"`
        Title   string `xml:"title"`
        Stats   Stats  `xml:"document-statistic"`
    }

    // DocumentMeta is the root <office:document-meta> element.
    type DocumentMeta struct {
        XMLName xml.Name
        Meta    Meta `xml:"meta"`
    }

    var data = []byte(`<?xml version="1.0" encoding="UTF-8"?>
    <office:document-meta>
    <office:meta>
    <meta:creation-date>2016-10-18T15:05:19.137453293</meta:creation-date>
    <dc:title>Document</dc:title>
    <dc:date>2017-01-17T00:59:04.731054728</dc:date>
    <meta:document-statistic meta:page-count="1" meta:word-count="0" meta:character-count="0"/>
    </office:meta>
    </office:document-meta>`)

    func main() {
        var dm DocumentMeta
        // Check the error: bad struct tags or malformed XML are reported here.
        if err := xml.Unmarshal(data, &dm); err != nil {
            log.Fatal(err)
        }
        fmt.Println(dm)
    }
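The structs above match on local names only, which is why they work even though the sample data drops the xmlns declarations. To pin each field to its namespace and also pick up the two attributes the question's Query struct asked for, encoding/xml accepts a tag of the form "namespace-URL local-name" (a space, not a colon, separates the two). The sketch below runs on the question's full meta.xml with its declarations; the URLs are copied from those xmlns attributes, and the helper parseMeta is a hypothetical name. Note that a ">" path cannot carry ",attr", so the attributes sit on a nested struct.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Stats maps the attributes of <meta:document-statistic>.
// Tag form: "namespace-URL local-name,attr".
type Stats struct {
	PageCount int `xml:"urn:oasis:names:tc:opendocument:xmlns:meta:1.0 page-count,attr"`
	WordCount int `xml:"urn:oasis:names:tc:opendocument:xmlns:meta:1.0 word-count,attr"`
	CharCount int `xml:"urn:oasis:names:tc:opendocument:xmlns:meta:1.0 character-count,attr"`
}

// Meta maps the children of <office:meta>.
type Meta struct {
	CreationDate string `xml:"urn:oasis:names:tc:opendocument:xmlns:meta:1.0 creation-date"`
	Title        string `xml:"http://purl.org/dc/elements/1.1/ title"`
	Date         string `xml:"http://purl.org/dc/elements/1.1/ date"`
	Stats        Stats  `xml:"urn:oasis:names:tc:opendocument:xmlns:meta:1.0 document-statistic"`
}

// DocumentMeta is the root <office:document-meta> element.
type DocumentMeta struct {
	XMLName xml.Name `xml:"urn:oasis:names:tc:opendocument:xmlns:office:1.0 document-meta"`
	Meta    Meta     `xml:"urn:oasis:names:tc:opendocument:xmlns:office:1.0 meta"`
}

// metaXML is the question's meta.xml, xmlns declarations included.
var metaXML = []byte(`<?xml version="1.0" encoding="UTF-8"?>
<office:document-meta xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" office:version="1.2">
<office:meta>
<meta:creation-date>2016-10-18T15:05:19.137453293</meta:creation-date>
<dc:title>Document</dc:title>
<dc:date>2017-01-17T00:59:04.731054728</dc:date>
<meta:document-statistic meta:page-count="1" meta:word-count="0" meta:character-count="0"/>
</office:meta>
</office:document-meta>`)

// parseMeta (hypothetical helper) unmarshals meta.xml and returns <office:meta>.
func parseMeta(data []byte) (Meta, error) {
	var dm DocumentMeta
	if err := xml.Unmarshal(data, &dm); err != nil {
		return Meta{}, err
	}
	return dm.Meta, nil
}

func main() {
	m, err := parseMeta(metaXML)
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", m)
}
```

Because matching is done on the URL, a document that binds the same namespaces to different prefixes (say o: and m:) should fill these structs just as well.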


  • Posted by huangapple on February 14, 2017, 00:57:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/42209427.html