Why does Go's encoding/xml.Decoder.Token() not produce xml.Attr tokens as it should?

Question

Using encoding/xml.Decoder, I'm attempting to manually parse an XML file loaded from http://www.khronos.org/files/collada_schema_1_4

For test purposes, I'm just iterating over the document printing out whatever token type is encountered:

func Test(r io.Reader) {
	var t xml.Token
	var err error
	var pa *xml.Attr
	var a xml.Attr
	var co xml.Comment
	var cd xml.CharData
	var se xml.StartElement
	var pi xml.ProcInst
	var ee xml.EndElement
	var is bool
	var xd = xml.NewDecoder(r)
	for i := 0; i < 24; i++ {
		if t, err = xd.Token(); (err == nil) && (t != nil) {
			if a, is = t.(xml.Attr); is { print("ATTR\t"); println(a.Name.Local) }
			if pa, is = t.(*xml.Attr); is { print("*ATTR\t"); println(pa) }
			if co, is = t.(xml.Comment); is { print("COMNT\t"); println(co) }
			if cd, is = t.(xml.CharData); is { print("CDATA\t"); println(cd) }
			if pi, is = t.(xml.ProcInst); is { print("PROCI\t"); println(pi.Target) }
			if se, is = t.(xml.StartElement); is { print("START\t"); println(se.Name.Local) }
			if ee, is = t.(xml.EndElement); is { print("END\t\t"); println(ee.Name.Local) }
		}
	}
}

Now here's the output:

PROCI	xml
CDATA	[1/64]0xf84004e050
START	schema
CDATA	[2/129]0xf84004d090
COMNT	[29/129]0xf84004d090
CDATA	[2/129]0xf84004d090
START	annotation
CDATA	[3/129]0xf84004d090
START	documentation
CDATA	[641/1039]0xf840061000
END		documentation
CDATA	[2/1039]0xf840061000
END		annotation
CDATA	[2/1039]0xf840061000
COMNT	[37/1039]0xf840061000
CDATA	[2/1039]0xf840061000
START	import
END		import
CDATA	[2/1039]0xf840061000
COMNT	[14/1039]0xf840061000
CDATA	[2/1039]0xf840061000
START	element
CDATA	[3/1039]0xf840061000
START	annotation

Notice that no ATTR or *ATTR lines are output, even though by the last (24th) line many attributes have been passed, both in the root xs:schema element and in the xs:import and xs:element elements.

This is in Go 1.0.3 64-bit under Windows 7 64-bit. Am I doing something wrong or should I file a Go package bug report?

[Side note: when doing a normal xml.Unmarshal into properly prepared structs, known-named-and-mapped attributes are captured and mapped by the xml package just fine. But I also need to collect "unknown" attributes in the root element (to collect namespace information for this use-case, the use-case being http://github.com/metaleap/go-xsd ), hence my attempts to use Decoder.Token().]

Answer 1

Score: 5

Yes, this behavior is expected. The attributes are parsed, but they are
not returned as an xml.Token. Attributes simply aren't tokens.
See: http://golang.org/pkg/encoding/xml/#Token

The attributes are accessible through the Attr field of
the StartElement token.
See: http://golang.org/pkg/encoding/xml/#StartElement

(Some general hints:

a) Do not use print or println.

b) The a, ok := t.(SomeType) idiom is called "comma okay", because the boolean is normally named "ok", not "is". Please stick to these conventions.

c) Idiomatic would be something like

switch t := t.(type) {
case xml.StartElement: ...
case xml.EndElement: ...
}

instead of your list of "if a, is = t.(xml.Attr) ..." checks.

d) All these "var se xml.StartElement" declarations are noise (clutter). Use

if se, ok := t.(xml.StartElement); ok { ... }

This would make your code much more readable.)

  • Posted on November 7, 2012 at 16:49:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/13266090.html