Unmarshal with global namespace

Question

I have the following XML:

<rss version="2.0">
  <channel>
    ...
    <item>
      <link>http://stackoverflow.com</link>
      <atom:link xmlns:atom="http://www.w3.org/2005/Atom" href="http://stackoverflow.com"/>
      ...
    </item>
  </channel>
</rss>

I want to extract the link attribute, and I have the following struct:

type Item struct {
  Link string `xml:"http://www.w3.org/2005/Atom link"`
}

I know that I need a namespace prefix to get the Link, but because no namespace is given (in the form of an xmlns attribute), I don't know how to do it.

I could, of course, save all *:link elements to a slice, but I'm sure there is a better solution.

Thanks in advance!

Answer 1

Score: 1

The namespace handling in the standard library's encoding/xml package seems to be a bit ad hoc, and having elements with the same name in different namespaces seems to trigger problems.

Ideally you'd be able to decode the given XML into the following structures:

type Rss struct {
	Items []Item `xml:"channel>item"`
}

type Item struct {
	Link     string   `xml:"link"`
	AtomLink AtomLink `xml:"http://www.w3.org/2005/Atom link"`
}

type AtomLink struct {
	Href string `xml:"href,attr"`
}

But this results in the error: main.Item field "Link" with tag "link" conflicts with field "AtomLink" with tag "http://www.w3.org/2005/Atom link" (as seen in http://play.golang.org/p/LgW-vm4euL).

However, if we decide to ignore the <atom:link> element by commenting out the Item.AtomLink field, we end up decoding an empty string, since xml:"link" matches <link> elements in any namespace rather than just the blank namespace. The last match, <atom:link>, has no text content, so the field ends up empty.
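To see that the decoder itself does distinguish the two elements, it can help to look at the resolved names it reports token by token. The following is only a minimal sketch: the helper name linkNames and the trimmed sample feed are illustrative and not part of the question.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// feed is a trimmed copy of the XML from the question.
const feed = `<rss version="2.0">
  <channel>
    <item>
      <link>http://stackoverflow.com</link>
      <atom:link xmlns:atom="http://www.w3.org/2005/Atom" href="http://stackoverflow.com"/>
    </item>
  </channel>
</rss>`

// linkNames (hypothetical helper) returns the resolved name of every
// element whose local name is "link", in document order.
func linkNames(data string) ([]xml.Name, error) {
	d := xml.NewDecoder(strings.NewReader(data))
	var names []xml.Name
	for {
		tok, err := d.Token() // Token resolves prefixes to namespace URIs
		if err == io.EOF {
			return names, nil
		}
		if err != nil {
			return nil, err
		}
		if se, ok := tok.(xml.StartElement); ok && se.Name.Local == "link" {
			names = append(names, se.Name)
		}
	}
}

func main() {
	names, err := linkNames(feed)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range names {
		fmt.Printf("space=%q local=%q\n", n.Space, n.Local)
	}
}
```

Both elements share the local name "link" and differ only in Name.Space, which is why a tag without a namespace cannot tell them apart.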

A couple of possible workarounds:

  1. Only try to decode the <atom:link> element, since it can be selected uniquely. This may not be useful if you're also processing RSS feeds without Atom namespace elements.

  2. Collect the contents of all <link> elements by modifying the Item struct to use:

    Links []string `xml:"link"`

    And then discard any empty strings in the slice.
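Both workarounds can be sketched roughly as follows. The type names AtomItem, AtomRss, LinksItem and LinksRss and the helper functions are made up for this illustration; the two variants are kept in separate structs because of the tag conflict described above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// feed is a trimmed copy of the XML from the question.
const feed = `<rss version="2.0">
  <channel>
    <item>
      <link>http://stackoverflow.com</link>
      <atom:link xmlns:atom="http://www.w3.org/2005/Atom" href="http://stackoverflow.com"/>
    </item>
  </channel>
</rss>`

// AtomLink holds the href attribute of an <atom:link> element.
type AtomLink struct {
	Href string `xml:"href,attr"`
}

// AtomItem is workaround 1: select only the namespace-qualified element.
type AtomItem struct {
	AtomLink AtomLink `xml:"http://www.w3.org/2005/Atom link"`
}

type AtomRss struct {
	Items []AtomItem `xml:"channel>item"`
}

// LinksItem is workaround 2: collect every <link>, whatever its namespace.
type LinksItem struct {
	Links []string `xml:"link"`
}

type LinksRss struct {
	Items []LinksItem `xml:"channel>item"`
}

// atomHrefs returns the href of the <atom:link> in each item.
func atomHrefs(data string) ([]string, error) {
	var r AtomRss
	if err := xml.Unmarshal([]byte(data), &r); err != nil {
		return nil, err
	}
	var hrefs []string
	for _, it := range r.Items {
		hrefs = append(hrefs, it.AtomLink.Href)
	}
	return hrefs, nil
}

// nonEmptyLinks returns the text of all <link> elements, dropping
// empty strings such as the one contributed by an empty <atom:link/>.
func nonEmptyLinks(data string) ([]string, error) {
	var r LinksRss
	if err := xml.Unmarshal([]byte(data), &r); err != nil {
		return nil, err
	}
	var links []string
	for _, it := range r.Items {
		for _, l := range it.Links {
			if l != "" {
				links = append(links, l)
			}
		}
	}
	return links, nil
}

func main() {
	hrefs, err := atomHrefs(feed)
	fmt.Println(hrefs, err)
	links, err := nonEmptyLinks(feed)
	fmt.Println(links, err)
}
```

Note that the second variant flattens the links of all items into one slice; keep the per-item slices if you need to know which item a link came from.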

At the end of the day, the package will need some way to refer to the blank namespace. That may require new syntax in order to keep existing programs functioning though.
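Until struct tags can express the blank namespace, one further option (not part of the original answer) is to bypass the tags for this element and compare Name.Space by hand in a custom UnmarshalXML method. This is only a sketch under that assumption; the constant atomNS and the helper decodeFeed are names invented for the example.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// atomNS is the Atom namespace URI used in the question's feed.
const atomNS = "http://www.w3.org/2005/Atom"

// feed is a trimmed copy of the XML from the question.
const feed = `<rss version="2.0">
  <channel>
    <item>
      <link>http://stackoverflow.com</link>
      <atom:link xmlns:atom="http://www.w3.org/2005/Atom" href="http://stackoverflow.com"/>
    </item>
  </channel>
</rss>`

type Rss struct {
	Items []Item `xml:"channel>item"`
}

type AtomLink struct {
	Href string `xml:"href,attr"`
}

// Item carries no xml tags; its fields are filled by UnmarshalXML.
type Item struct {
	Link     string
	AtomLink AtomLink
}

// UnmarshalXML walks the children of <item> and dispatches on the
// resolved namespace, so the blank namespace can be matched explicitly.
func (it *Item) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	for {
		tok, err := d.Token()
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			switch {
			case t.Name.Local == "link" && t.Name.Space == "":
				err = d.DecodeElement(&it.Link, &t)
			case t.Name.Local == "link" && t.Name.Space == atomNS:
				err = d.DecodeElement(&it.AtomLink, &t)
			default:
				err = d.Skip() // ignore any other child element
			}
			if err != nil {
				return err
			}
		case xml.EndElement:
			return nil // closing </item>
		}
	}
}

// decodeFeed (hypothetical helper) unmarshals a feed into Rss.
func decodeFeed(data string) (Rss, error) {
	var r Rss
	err := xml.Unmarshal([]byte(data), &r)
	return r, err
}

func main() {
	r, err := decodeFeed(feed)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, it := range r.Items {
		fmt.Printf("link=%q atom href=%q\n", it.Link, it.AtomLink.Href)
	}
}
```

The trade-off is that every field of Item now has to be handled manually in the method, so this is mainly worthwhile when both links are needed from the same item.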

Posted by huangapple on April 11, 2014 at 12:26:49