Unmarshal namespaced XML tags - Golang
Question
I'm trying to extract metadata from an .odt document. The file containing the XML is called meta.xml. It has this structure:
<?xml version="1.0" encoding="UTF-8"?>
<office:document-meta xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" office:version="1.2">
<office:meta>
<meta:creation-date>2016-10-18T15:05:19.137453293</meta:creation-date>
<dc:title>Document</dc:title>
<dc:date>2017-01-17T00:59:04.731054728</dc:date>
<meta:document-statistic meta:page-count="1" meta:word-count="0" meta:character-count="0"/>
</office:meta>
</office:document-meta>
I'm using xml.Unmarshal() from the encoding/xml package to parse the XML and place the values inside struct fields. This is the struct I'm using:
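type Query struct {
	CreationDate string `xml:"meta:creation-date"`
	PageCount    int    `xml:"meta:document-statistic>meta:page-count,attr"`
	WordCount    int    `xml:"meta:document-statistic>meta:word-count,attr"`
	CharCount    int    `xml:"meta:document-statistic>meta:character-count,attr"`
}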
What I understood from the available documentation is that I can use field tags of the form xml:"tag-name" to get the value of an XML tag, and xml:"name,attr" to get an attribute value.
However, since the XML file has namespaced tags (e.g. meta:creation-date), everything I've tried so far leaves the struct fields empty. I might be doing something wrong, though.
This is the entire program inside Go playground:
https://play.golang.org/p/n7C50l1gsV
Answer 1
Score: 3
You will find that things go more smoothly if your structs map fairly directly to the XML. This code should get you started:
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Stats maps the <meta:document-statistic> element; its values are attributes.
type Stats struct {
	XMLName   xml.Name
	PageCount int `xml:"page-count,attr"`
}

// Meta maps the <office:meta> element. Tags without a namespace match on the local name only.
type Meta struct {
	XMLName xml.Name
	Date    string `xml:"creation-date"`
	Title   string `xml:"title"`
	Stats   Stats  `xml:"document-statistic"`
}

// DocumentMeta maps the root <office:document-meta> element.
type DocumentMeta struct {
	XMLName xml.Name
	Meta    Meta `xml:"meta"`
}

var data = []byte(`<?xml version="1.0" encoding="UTF-8"?>
<office:document-meta>
<office:meta>
<meta:creation-date>2016-10-18T15:05:19.137453293</meta:creation-date>
<dc:title>Document</dc:title>
<dc:date>2017-01-17T00:59:04.731054728</dc:date>
<meta:document-statistic meta:page-count="1" meta:word-count="0" meta:character-count="0"/>
</office:meta>
</office:document-meta>`)

func main() {
	var dm DocumentMeta
	// Check the error: Unmarshal reports invalid struct tags and malformed input here.
	if err := xml.Unmarshal(data, &dm); err != nil {
		log.Fatal(err)
	}
	fmt.Println(dm)
}
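If you want the match to depend on the namespace as well as the local name, encoding/xml also accepts tags of the form "namespace-URL name". The following is a minimal sketch of that variant, not the only way to do it. It assumes the namespace URIs from the xmlns declarations in the question's meta.xml, and the names parseMeta and sample are hypothetical helpers introduced just for this illustration. Note that the prefix (meta:, dc:) never appears in the tag, and that each nesting level gets its own struct, since an a>b path cannot, as far as I know, be combined with ,attr.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Stats reads the attributes of <meta:document-statistic>.
// Each tag is "namespace-URL local-name,attr".
type Stats struct {
	PageCount int `xml:"urn:oasis:names:tc:opendocument:xmlns:meta:1.0 page-count,attr"`
	WordCount int `xml:"urn:oasis:names:tc:opendocument:xmlns:meta:1.0 word-count,attr"`
	CharCount int `xml:"urn:oasis:names:tc:opendocument:xmlns:meta:1.0 character-count,attr"`
}

// Meta maps <office:meta>; child elements are matched by namespace URI and local name.
type Meta struct {
	CreationDate string `xml:"urn:oasis:names:tc:opendocument:xmlns:meta:1.0 creation-date"`
	Title        string `xml:"http://purl.org/dc/elements/1.1/ title"`
	Stats        Stats  `xml:"urn:oasis:names:tc:opendocument:xmlns:meta:1.0 document-statistic"`
}

// DocumentMeta maps the root <office:document-meta> element.
type DocumentMeta struct {
	XMLName xml.Name `xml:"urn:oasis:names:tc:opendocument:xmlns:office:1.0 document-meta"`
	Meta    Meta     `xml:"urn:oasis:names:tc:opendocument:xmlns:office:1.0 meta"`
}

// sample is the meta.xml from the question, including its xmlns declarations.
const sample = `<?xml version="1.0" encoding="UTF-8"?>
<office:document-meta xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:meta="urn:oasis:names:tc:opendocument:xmlns:meta:1.0" office:version="1.2">
<office:meta>
<meta:creation-date>2016-10-18T15:05:19.137453293</meta:creation-date>
<dc:title>Document</dc:title>
<dc:date>2017-01-17T00:59:04.731054728</dc:date>
<meta:document-statistic meta:page-count="1" meta:word-count="0" meta:character-count="0"/>
</office:meta>
</office:document-meta>`

// parseMeta (hypothetical helper) decodes a meta.xml document and returns any decoding error.
func parseMeta(data []byte) (DocumentMeta, error) {
	var dm DocumentMeta
	err := xml.Unmarshal(data, &dm)
	return dm, err
}

func main() {
	dm, err := parseMeta([]byte(sample))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%+v\n", dm.Meta)
}
```

Matching on the namespace URI keeps the structs working if a document happens to bind the same namespaces to different prefixes. The shorter local-name tags in the code above are fine when that is not a concern.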