encoding/xml: handling unmapped elements

huangapple

Question

From the encoding/xml Unmarshal documentation (http://golang.org/pkg/encoding/xml/#Unmarshal):

> * If the XML element contains a sub-element that hasn't matched any of the above rules and the struct has a field with tag ",any",
> unmarshal maps the sub-element to that struct field.

I'm having trouble getting the remainder of an XML envelope into my struct (to show that I have an incomplete mapping):

http://play.golang.org/p/mnFqAcguJQ

I know you can do exactly this with bson.M from the mgo package using ,inline, but it looks like map[string]interface{} isn't the answer here.

EDIT:
After some more experimenting, I've found what I believe is further unexpected behavior.

Switching the field type to []string starts to accept input, but without key/value pairs: http://play.golang.org/p/wCAJeeQa4m

I had also planned on adapting encoding/xml to parse HTML. The documentation does not mention that if an element occurs more than once, Unmarshal keeps the last instance rather than returning an error: http://play.golang.org/p/0MY__R-Xi3
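
The two behaviors described in the edit can be reproduced with a small sketch. The type and function names here (single, multi, loose, lastValue, allValues, anyStrings) are hypothetical and are not taken from the linked playground snippets; the sketch only assumes the standard encoding/xml package.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// single maps the repeated <a> element to a scalar field.
type single struct {
	A int `xml:"a"`
}

// multi maps the same element to a slice field.
type multi struct {
	A []int `xml:"a"`
}

// loose collects every otherwise unmapped child element as a plain string,
// so the element names are not preserved.
type loose struct {
	Rest []string `xml:",any"`
}

// lastValue decodes into the scalar field; a later <a> overwrites an earlier one.
func lastValue(data string) (int, error) {
	var s single
	err := xml.Unmarshal([]byte(data), &s)
	return s.A, err
}

// allValues decodes into the slice field; each <a> is appended.
func allValues(data string) ([]int, error) {
	var m multi
	err := xml.Unmarshal([]byte(data), &m)
	return m.A, err
}

// anyStrings decodes unmapped children into []string (character data only).
func anyStrings(data string) ([]string, error) {
	var l loose
	err := xml.Unmarshal([]byte(data), &l)
	return l.Rest, err
}

func main() {
	const repeated = `<foo><a>1</a><a>2</a></foo>`

	last, err := lastValue(repeated)
	fmt.Println("scalar field:", last, err)

	all, err := allValues(repeated)
	fmt.Println("slice field:", all, err)

	rest, err := anyStrings(`<foo><a>1</a><b>2</b></foo>`)
	fmt.Println(",any as []string:", rest, err)
}
```

Declaring the field as a slice is one way to keep every occurrence of a repeated element instead of only the last one.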

Answer 1

Score: 5

Here: http://play.golang.org/p/iY8YlxYym0

Since c is something concrete, it shouldn't use ",any"; it should have its own struct definition. C itself contains a list of arbitrary tags, so it should contain a field of type []Tag tagged `xml:",any"`. To capture each Tag itself, you need an XMLName field of type xml.Name to get the tag name, and a field tagged ",innerxml" to get its content.

The final result is:

const xmlString = `<foo><a>1</a><b>2</b><c><c1>3</c1><c2>4</c2></c></foo>`

type Foo struct {
	A int   `xml:"a"`
	B int   `xml:"b"`
	C Extra `xml:"c"`
}

type Extra struct {
	Items []Tag `xml:",any"`
}

type Tag struct {
	XMLName xml.Name
	Content string `xml:",innerxml"`
}
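
For completeness, the types above can be exercised with a short program. This is a minimal sketch rather than the code from the linked playground; parseFoo is a hypothetical helper introduced only for illustration.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

const xmlString = `<foo><a>1</a><b>2</b><c><c1>3</c1><c2>4</c2></c></foo>`

type Foo struct {
	A int   `xml:"a"`
	B int   `xml:"b"`
	C Extra `xml:"c"`
}

// Extra collects every child of <c> that has no dedicated field.
type Extra struct {
	Items []Tag `xml:",any"`
}

// Tag records the element name and its raw inner XML.
type Tag struct {
	XMLName xml.Name
	Content string `xml:",innerxml"`
}

// parseFoo (hypothetical helper) unmarshals the document into a Foo.
func parseFoo(data string) (Foo, error) {
	var f Foo
	err := xml.Unmarshal([]byte(data), &f)
	return f, err
}

func main() {
	f, err := parseFoo(xmlString)
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	fmt.Println("a:", f.A, "b:", f.B)
	// Each arbitrary child of <c> shows up as one Tag.
	for _, t := range f.C.Items {
		fmt.Printf("%s = %s\n", t.XMLName.Local, t.Content)
	}
}
```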

Or the shorter version:

type Foo struct {
	A int   `xml:"a"`
	B int   `xml:"b"`
	C struct {
		Items []struct {
			XMLName xml.Name
			Content string `xml:",innerxml"`
		} `xml:",any"`
	} `xml:"c"`
}
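
The same pattern can also be applied one level up, which is closer to the original question of catching whatever the struct does not map. The sketch below is an assumption about how that might look; Envelope, Node, and parseEnvelope are hypothetical names that do not appear in the answer.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

const xmlString = `<foo><a>1</a><b>2</b><c><c1>3</c1><c2>4</c2></c></foo>`

// Envelope maps only <a>; every other child lands in Rest.
type Envelope struct {
	A    int    `xml:"a"`
	Rest []Node `xml:",any"`
}

// Node records the name and raw inner XML of an unmapped element.
type Node struct {
	XMLName xml.Name
	Content string `xml:",innerxml"`
}

// parseEnvelope (hypothetical helper) unmarshals the document into an Envelope.
func parseEnvelope(data string) (Envelope, error) {
	var e Envelope
	err := xml.Unmarshal([]byte(data), &e)
	return e, err
}

func main() {
	e, err := parseEnvelope(xmlString)
	if err != nil {
		fmt.Println("unmarshal error:", err)
		return
	}
	fmt.Println("a:", e.A)
	// A non-empty Rest signals that the mapping is incomplete.
	for _, n := range e.Rest {
		fmt.Printf("unmapped <%s>: %s\n", n.XMLName.Local, n.Content)
	}
}
```

Checking whether Rest is empty after unmarshalling gives a simple way to detect an incomplete mapping.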

For HTML there is go.net/html (now published as golang.org/x/net/html). Using an XML parser for HTML will be complicated.

huangapple
  • Posted on June 5, 2014, 03:01:18
  • Please keep this link when reposting: https://go.coder-hub.com/24045497.html