Mixed XML decoding in Go, preserving order

Question

I need to extract offers from an XML document while preserving the order of the nodes:

<items>
  <offer/>
  <product>
    <offer/>
    <offer/>
  </product>
  <offer/>
  <offer/>
</items>

The following struct decodes the values, but into two separate slices, so the original order is lost:

type Offers struct {
	Offers   []offer `xml:"items>offer"`
	Products []offer `xml:"items>product>offer"`
}

Any ideas?

Answer 1

Score: 8

One way is to implement a custom UnmarshalXML method. Say our input looks like this:

<doc>
	<head>My Title</head>
	<p>A first paragraph.</p>
	<p>A second one.</p>
</doc>

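We want to deserialize the document and preserve the order of the head and the paragraphs. To keep the order, we need a slice. To accommodate both head and p, we need an interface. We could define our document like this: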

type Document struct {
	XMLName  xml.Name `xml:"doc"`
	Contents []Mixed  `xml:",any"`
}

The ,any option collects every otherwise unmatched child element into Contents. Its element type is Mixed, which we still need to define:

type Mixed struct {
	Type  string      // just keep "head" or "p" in here
	Value interface{} // keep the value; we could use string here, too
}

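We need more control over the deserialization process, so we turn Mixed into an xml.Unmarshaler by implementing UnmarshalXML. We choose the code path based on the name of the start element, e.g. head or p. Here we only populate our Mixed struct with some values, but you can do essentially anything at this point: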

func (m *Mixed) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	switch start.Name.Local {
	case "head", "p":
		var e string
		if err := d.DecodeElement(&e, &start); err != nil {
			return err
		}
		m.Value = e
		m.Type = start.Name.Local
	default:
		return fmt.Errorf("unknown element: %s", start.Name.Local)
	}
	return nil
}

Putting it all together, usage of the above structs could look like this:

func main() {
	s := `
	<doc>
		<head>My Title</head>
		<p>A first paragraph.</p>
		<p>A second one.</p>
	</doc>
	`

	var doc Document
	if err := xml.Unmarshal([]byte(s), &doc); err != nil {
		log.Fatal(err)
	}
	fmt.Printf("#%v", doc)
}

This prints:

#{{ doc} [{head My Title} {p A first paragraph.} {p A second one.}]}

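We preserved the order and kept some type information. Instead of a single type like Mixed, you could use many different types for the deserialization. The cost of this approach is that the elements of your container (here the Contents field of the document) carry their value as an interface{}. To do anything element-specific, you'll need a type assertion or some helper method.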

Complete code on the playground: https://play.golang.org/p/fzsUPPS7py
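The same technique might be applied to the XML from the question roughly as follows. This is only a sketch under a few assumptions: <items> is taken to be the document root, the Offer type and its id attribute are invented here to make the order visible, and Entry, Items and orderedOffers are hypothetical names.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Offer stands in for the question's offer type; the id attribute is an
// assumption added so the resulting order can be checked.
type Offer struct {
	ID string `xml:"id,attr"`
}

// Entry is one direct child of <items>: either a single <offer> or a
// <product> wrapping several offers. Both end up as a slice of offers.
type Entry struct {
	Offers []Offer
}

func (e *Entry) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	switch start.Name.Local {
	case "offer":
		var o Offer
		if err := d.DecodeElement(&o, &start); err != nil {
			return err
		}
		e.Offers = []Offer{o}
		return nil
	case "product":
		var p struct {
			Offers []Offer `xml:"offer"`
		}
		if err := d.DecodeElement(&p, &start); err != nil {
			return err
		}
		e.Offers = p.Offers
		return nil
	default:
		// Consume and ignore elements this sketch does not know about.
		return d.Skip()
	}
}

type Items struct {
	XMLName xml.Name `xml:"items"`
	Entries []Entry  `xml:",any"`
}

// orderedOffers flattens all offers into one slice in document order.
func orderedOffers(data []byte) ([]Offer, error) {
	var items Items
	if err := xml.Unmarshal(data, &items); err != nil {
		return nil, err
	}
	var out []Offer
	for _, e := range items.Entries {
		out = append(out, e.Offers...)
	}
	return out, nil
}

func main() {
	s := `<items>
  <offer id="1"/>
  <product>
    <offer id="2"/>
    <offer id="3"/>
  </product>
  <offer id="4"/>
  <offer id="5"/>
</items>`
	offers, err := orderedOffers([]byte(s))
	if err != nil {
		log.Fatal(err)
	}
	for _, o := range offers {
		fmt.Println(o.ID)
	}
}
```

Because each Entry is appended to the slice as the decoder encounters it, the flattened result follows the order of the source document rather than grouping top-level and nested offers separately.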

Posted by huangapple on 2015-08-25 00:19:47. Source: https://go.coder-hub.com/32187067.html