Unmarshal with global namespace
Question
I have the following XML:
<rss version="2.0">
  <channel>
    ...
    <item>
      <link>http://stackoverflow.com</link>
      <atom:link xmlns:atom="http://www.w3.org/2005/Atom" href="http://stackoverflow.com"/>
      ...
    </item>
  </channel>
</rss>
I want to extract the link element, and I have the following struct:
type Item struct {
	Link string `xml:"http://www.w3.org/2005/Atom link"`
}
I know that I need a namespace prefix to select the Link, but the plain <link> element has no namespace declared (in the form of an xmlns attribute), so I don't know how to refer to it.
I could, of course, collect all *:link elements into a slice, but I'm sure there is a better solution.
Thanks in advance!
Answer 1
Score: 1
The namespace handling in the standard library encoding/xml package seems to be a bit ad hoc, and having elements with the same name in different namespaces seems to be what triggers the problems.
Ideally you'd be able to decode the given XML into the following structures:
type Rss struct {
	Items []Item `xml:"channel>item"`
}
type Item struct {
	Link     string   `xml:"link"`
	AtomLink AtomLink `xml:"http://www.w3.org/2005/Atom link"`
}
type AtomLink struct {
	Href string `xml:"href,attr"`
}
But this results in the following error (as seen in http://play.golang.org/p/LgW-vm4euL):
main.Item field "Link" with tag "link" conflicts with field "AtomLink" with tag "http://www.w3.org/2005/Atom link"
However, if we decide to ignore the <atom:link> element by commenting out the Item.AtomLink field, we end up decoding an empty string. The tag xml:"link" matches <link> elements in any namespace rather than just the blank namespace, so both elements are decoded into the same field. The <atom:link> element comes last and has no content, so its empty value is what remains in the field.
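The behavior described above can be reproduced with a short program. This is only a minimal sketch: the sampleFeed constant and the decodeItems helper are names made up for illustration, and the elided parts of the feed are left out.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sampleFeed mirrors the XML from the question, without the elided parts.
const sampleFeed = `<rss version="2.0">
<channel>
<item>
<link>http://stackoverflow.com</link>
<atom:link xmlns:atom="http://www.w3.org/2005/Atom" href="http://stackoverflow.com"/>
</item>
</channel>
</rss>`

type Rss struct {
	Items []Item `xml:"channel>item"`
}

// Item keeps only the tag without a namespace; the AtomLink field is left out.
type Item struct {
	Link string `xml:"link"`
}

// decodeItems (hypothetical helper) parses the feed and returns its items.
func decodeItems(data string) ([]Item, error) {
	var rss Rss
	if err := xml.Unmarshal([]byte(data), &rss); err != nil {
		return nil, err
	}
	return rss.Items, nil
}

func main() {
	items, err := decodeItems(sampleFeed)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	for _, it := range items {
		// %q makes an empty string visible in the output.
		fmt.Printf("Link = %q\n", it.Link)
	}
}
```

If the description above holds for your Go version, the printed Link is empty even though the plain <link> element has content.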
A couple of possible workarounds:
- Only try to decode the <atom:link> element, since it can be selected uniquely. This may not be useful if you're also processing RSS feeds without Atom namespace elements.
- Collect the contents of all <link> elements by modifying the Item struct to use:
  Links []string `xml:"link"`
  and then discard any empty strings in the slice.
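Both workarounds can be sketched in one small program. The type names (AtomRss, AllRss and so on), the helper functions and the sampleFeed constant are assumptions made for this illustration; only the struct tags come from the answer.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// sampleFeed mirrors the XML from the question, without the elided parts.
const sampleFeed = `<rss version="2.0">
<channel>
<item>
<link>http://stackoverflow.com</link>
<atom:link xmlns:atom="http://www.w3.org/2005/Atom" href="http://stackoverflow.com"/>
</item>
</channel>
</rss>`

// Workaround 1: select only the Atom element through its namespace URL.
type AtomRss struct {
	Items []AtomItem `xml:"channel>item"`
}
type AtomItem struct {
	AtomLink AtomLink `xml:"http://www.w3.org/2005/Atom link"`
}
type AtomLink struct {
	Href string `xml:"href,attr"`
}

// Workaround 2: collect every <link>, whatever its namespace.
type AllRss struct {
	Items []AllItem `xml:"channel>item"`
}
type AllItem struct {
	Links []string `xml:"link"`
}

// atomHrefs returns the href of the <atom:link> element of each item.
func atomHrefs(data string) ([]string, error) {
	var feed AtomRss
	if err := xml.Unmarshal([]byte(data), &feed); err != nil {
		return nil, err
	}
	var hrefs []string
	for _, it := range feed.Items {
		hrefs = append(hrefs, it.AtomLink.Href)
	}
	return hrefs, nil
}

// nonEmptyLinks returns the contents of all <link> elements, minus empty strings.
func nonEmptyLinks(data string) ([]string, error) {
	var feed AllRss
	if err := xml.Unmarshal([]byte(data), &feed); err != nil {
		return nil, err
	}
	var links []string
	for _, it := range feed.Items {
		for _, l := range it.Links {
			if l != "" {
				links = append(links, l)
			}
		}
	}
	return links, nil
}

func main() {
	hrefs, err := atomHrefs(sampleFeed)
	fmt.Println(hrefs, err)
	links, err := nonEmptyLinks(sampleFeed)
	fmt.Println(links, err)
}
```

Note that the second variant cannot tell which namespace a non-empty value came from, so it only helps when the elements in other namespaces carry no text.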
At the end of the day, the package will need some way to refer to the blank namespace. That may require new syntax in order to keep existing programs functioning though.