Dealing with namespaces while parsing XML in Go

Question

I am trying to parse a piece of XML in Go:

package main

import (
	"encoding/xml"
	"fmt"
)

type XML struct {
	Foo string `xml:"foo"`
}

func main() {
	rawXML := []byte(`
<xml>
  <foo>A</foo>
  <ns:foo>B</ns:foo>
</xml>`)

	x := new(XML)
	xml.Unmarshal(rawXML, x)
	fmt.Printf("foo: %s\n", x.Foo)
}

This outputs:

foo: B

while I expected it to produce:

foo: A

How do I get the content of the first foo tag (i.e. the one without a namespace)?


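Answer 1

Score: 8

I don't think the XML decoder lets you specify with struct tags that an element must have no namespace. It can, however, record each element's namespace for you, so you can post-process the decoded data to get the same result: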

package main

import (
	"encoding/xml"
	"fmt"
)

type Foo struct {
	XMLName xml.Name
	Data    string `xml:",chardata"`
}

type XML struct {
	Foo []Foo `xml:"foo"`
}

func main() {
	rawXML := []byte(`
<xml>
  <foo>A</foo>
  <ns:foo>B</ns:foo>
</xml>`)

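	x := new(XML)
	if err := xml.Unmarshal(rawXML, x); err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	// Keep only the elements whose name carries no namespace.
	for _, el := range x.Foo {
		if el.XMLName.Space == "" {
			fmt.Printf("non namespaced foo %q\n", el.Data)
		}
	}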
}

http://play.golang.org/p/aDEFPmHPc0
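
If you need this in more than one place, the post-processing step can be wrapped in a small function. The following is only a sketch: firstPlainFoo is a hypothetical helper name that does not appear in the answer, and it assumes the same Foo and XML types as above.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// Foo records the element name (including its namespace) and its text.
type Foo struct {
	XMLName xml.Name
	Data    string `xml:",chardata"`
}

// XML collects every <foo> child, whatever its namespace.
type XML struct {
	Foo []Foo `xml:"foo"`
}

// firstPlainFoo is a hypothetical helper: it returns the text of the first
// <foo> element that has no namespace, and false if there is none.
func firstPlainFoo(raw []byte) (string, bool, error) {
	var x XML
	if err := xml.Unmarshal(raw, &x); err != nil {
		return "", false, err
	}
	for _, el := range x.Foo {
		if el.XMLName.Space == "" {
			return el.Data, true, nil
		}
	}
	return "", false, nil
}

func main() {
	// The namespaced element comes first here on purpose.
	raw := []byte(`<xml><ns:foo>B</ns:foo><foo>A</foo></xml>`)
	value, found, err := firstPlainFoo(raw)
	if err != nil {
		fmt.Println("unmarshal failed:", err)
		return
	}
	fmt.Printf("found=%v value=%q\n", found, value) // found=true value="A"
}
```

Because the check is on XMLName.Space rather than on position, the result does not depend on the order of the elements in the document.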


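Answer 2

Score: 4

You have two values in series in your XML document, but your struct only has room for one. The XML parser parses the first value and then overwrites it with the second.

Change Foo to a slice in the struct and you'll get both values.

http://play.golang.org/p/BRgsuMQ7rK
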

package main

import (
    "encoding/xml"
    "fmt"
)

type XML struct {
    Foo []string `xml:"foo"`
}

func main() {
    rawXML := []byte(`
<xml>
  <foo>A</foo>
  <ns:foo>B</ns:foo>
</xml>`)

    x := new(XML)
    xml.Unmarshal(rawXML, x)
    fmt.Printf("foo: %s\n", x.Foo[0])
    fmt.Printf("both: %v\n", x.Foo)
}
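
One caveat worth keeping in mind: a []string keeps the values in document order and drops the namespace, so index 0 is only the non-namespaced element when that element happens to come first. A rough sketch of that behaviour follows; decodeFoos is a hypothetical helper name, and the second input is made up for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// XML collects the text of every <foo> child in document order.
type XML struct {
	Foo []string `xml:"foo"`
}

// decodeFoos is a hypothetical helper that returns the collected values.
func decodeFoos(raw string) ([]string, error) {
	var x XML
	if err := xml.Unmarshal([]byte(raw), &x); err != nil {
		return nil, err
	}
	return x.Foo, nil
}

func main() {
	docs := []string{
		`<xml><foo>A</foo><ns:foo>B</ns:foo></xml>`, // plain element first
		`<xml><ns:foo>B</ns:foo><foo>A</foo></xml>`, // namespaced element first
	}
	for _, doc := range docs {
		foos, err := decodeFoos(doc)
		if err != nil {
			fmt.Println("unmarshal failed:", err)
			continue
		}
		fmt.Println(foos) // [A B] for the first input, [B A] for the second
	}
}
```

If the order of the elements is not guaranteed, you probably need the namespace information as well, not just a slice of strings.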

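Answer 3

Score: 1

The xml:"foo" selector syntax takes an optional namespace, as in xml:"ns foo", but the problem is that it offers no way to select elements with no namespace.

One fix is to set xml.Decoder.DefaultSpace, which assigns a namespace to non-namespaced tags so that you can select them with the xml:"<ns> <tag>" syntax:

https://play.golang.org/p/1UggvqLFT9x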

package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

type Doc struct {
	Foo   string `xml:"_ foo"` // <-- <foo> will now be <_:foo>
	NsFoo string `xml:"ns foo"`
}

var input = `<xml>
  <foo>A</foo>
  <ns:foo>B</ns:foo>
</xml>`

func main() {
	decoder := xml.NewDecoder(strings.NewReader(input))
	// Elements without a namespace are treated as if they were in "_".
	decoder.DefaultSpace = "_"

	doc := &Doc{}
	if err := decoder.Decode(doc); err != nil {
		fmt.Println("decode failed:", err)
		return
	}

	fmt.Printf("<foo>: %#v\n", doc.Foo)
	fmt.Printf("<ns:foo>: %#v\n", doc.NsFoo)
}

Prints:

<foo>: "A"
<ns:foo>: "B"
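
Note that the example input never declares the ns prefix, so the decoder keeps the bare prefix as the namespace. When the prefix is declared with xmlns:ns, encoding/xml resolves it to the namespace URI, and the struct tag has to name that URI instead. A minimal sketch, assuming a made-up URI http://example.com/ns and a hypothetical decodeDoc helper:

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Doc selects <foo> by namespace. The URI is an assumption for this sketch.
type Doc struct {
	Foo   string `xml:"_ foo"`                     // non-namespaced <foo>, via DefaultSpace
	NsFoo string `xml:"http://example.com/ns foo"` // <ns:foo> with a declared prefix
}

// decodeDoc is a hypothetical helper that decodes with DefaultSpace set to "_".
func decodeDoc(input string) (Doc, error) {
	decoder := xml.NewDecoder(strings.NewReader(input))
	decoder.DefaultSpace = "_"
	var doc Doc
	err := decoder.Decode(&doc)
	return doc, err
}

func main() {
	input := `<xml xmlns:ns="http://example.com/ns">
  <foo>A</foo>
  <ns:foo>B</ns:foo>
</xml>`

	doc, err := decodeDoc(input)
	if err != nil {
		fmt.Println("decode failed:", err)
		return
	}
	fmt.Printf("<foo>: %s\n", doc.Foo)      // <foo>: A
	fmt.Printf("<ns:foo>: %s\n", doc.NsFoo) // <ns:foo>: B
}
```

The value chosen for DefaultSpace is arbitrary; it only has to differ from every namespace that really occurs in the document.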

Posted by huangapple on 2013-01-04 at 03:33:37
Source: https://go.coder-hub.com/14145864.html
