Unmarshal flat XML to a Go data structure


Question

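I have a flat XML structure that I am trying to unmarshal into a Go data structure. I want to get the list of items (item-name) in each bucket from the XML below, i.e. bucket1 = [apple, orange, grapes], bucket2 = [apple, mangoes].

When I unmarshal the XML into the Go data structure below, I get the list of bucket names and the list of items, but I cannot map the items to their respective buckets, because each bucket can contain many items.
Is there a way to achieve this with this XML by changing the Go data structure? I don't control the structure of the XML, so I can't change it to suit my needs.
I am new to Go and would appreciate any input.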

type buckets struct {
	XMLName    xml.Name `xml:"buckets"`
	BucketName []string `xml:"bucket-name"`
	ItemName   []string `xml:"item-name"`
	Weight     []string `xml:"weight"`
	Quantity   []string `xml:"quantity"`
}

<?xml version="1.0" encoding="UTF-8"?>
<buckets>
   <bucket-name>bucket1</bucket-name>
   <item-name>apple</item-name>
   <weight>500</weight>
   <quantity>3</quantity>
   <item-name>orange</item-name>
   <weight>500</weight>
   <quantity>2</quantity>
   <item-name>grapes</item-name>
   <weight>800</weight>
   <quantity>1</quantity>
   <bucket-name>bucket2</bucket-name>
   <item-name>apple</item-name>
   <weight>500</weight>
   <quantity>3</quantity>
   <item-name>mangoes</item-name>
   <weight>400</weight>
   <quantity>2</quantity>
</buckets>
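To make the problem concrete: decoding the sample into the struct above appends each repeated element to its own slice in document order, and nothing records which bucket-name an item followed. The sketch below assumes the sample is inlined (condensed to one item per line) as sampleXML; decodeFlat is an illustrative helper, not part of the question.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// buckets is the struct from the question: every repeated element lands
// in its own flat slice, so the bucket boundaries are lost.
type buckets struct {
	XMLName    xml.Name `xml:"buckets"`
	BucketName []string `xml:"bucket-name"`
	ItemName   []string `xml:"item-name"`
	Weight     []string `xml:"weight"`
	Quantity   []string `xml:"quantity"`
}

// sampleXML is the question's document, condensed to one item per line.
const sampleXML = `<buckets>
  <bucket-name>bucket1</bucket-name>
  <item-name>apple</item-name><weight>500</weight><quantity>3</quantity>
  <item-name>orange</item-name><weight>500</weight><quantity>2</quantity>
  <item-name>grapes</item-name><weight>800</weight><quantity>1</quantity>
  <bucket-name>bucket2</bucket-name>
  <item-name>apple</item-name><weight>500</weight><quantity>3</quantity>
  <item-name>mangoes</item-name><weight>400</weight><quantity>2</quantity>
</buckets>`

// decodeFlat (illustrative) unmarshals data into the question's struct.
func decodeFlat(data []byte) (buckets, error) {
	var b buckets
	err := xml.Unmarshal(data, &b)
	return b, err
}

func main() {
	b, err := decodeFlat([]byte(sampleXML))
	if err != nil {
		panic(err)
	}
	fmt.Println(b.BucketName) // [bucket1 bucket2]
	fmt.Println(b.ItemName)   // [apple orange grapes apple mangoes]
}
```

Five items and two bucket names come back, but the slices alone cannot say where bucket1 ends.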

Answer 1

Score: 1

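What you are trying to do can be achieved by implementing a custom UnmarshalXML method (the xml.Unmarshaler interface) and mapping the buckets to Go structs by hand.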

The code below assumes that the XML elements appear in the same order as in the example provided.

First, define the following structs:

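import "encoding/xml"

// Buckets is the root <buckets> element; the custom UnmarshalXML method
// fills in Buckets by hand.
type Buckets struct {
	XMLName xml.Name `xml:"buckets"`
	Buckets []*Bucket
}

// Bucket holds a bucket name and the items listed after it.
// The tag is lowercase to match <bucket-name> in the XML (names are case-sensitive).
type Bucket struct {
	BucketName string `xml:"bucket-name"`
	Items      []*Item
}

// Item holds one item-name/weight/quantity group.
type Item struct {
	Name     string `xml:"item-name"`
	Weight   int    `xml:"weight"`
	Quantity int    `xml:"quantity"`
}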

Next, we implement the xml.Unmarshaler interface by defining an UnmarshalXML method on the Buckets struct. xml.Unmarshal calls this method when we pass a *Buckets as the destination.

// Uses io.EOF, so "io" must be imported alongside "encoding/xml".
func (b *Buckets) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	b.XMLName = start.Name

	var currentBucket *Bucket
	var currentItem *Item
	for {
		t, err := d.Token()
		if err == io.EOF {
			// Token reports io.EOF (with t == nil) once </buckets> has been
			// consumed; append the last bucket before exiting.
			if currentBucket != nil {
				b.Buckets = append(b.Buckets, currentBucket)
			}
			break
		}
		if err != nil {
			return err
		}
		switch se := t.(type) {
		case xml.StartElement:
			switch se.Name.Local {
			case "bucket-name":
				// currentBucket is nil the first time through; otherwise
				// append the finished bucket to the slice and start a new one.
				if currentBucket != nil {
					b.Buckets = append(b.Buckets, currentBucket)
				}
				currentBucket = &Bucket{}

				if err := d.DecodeElement(&currentBucket.BucketName, &se); err != nil {
					return err
				}
			case "item-name":
				currentItem = &Item{}
				if err := d.DecodeElement(&currentItem.Name, &se); err != nil {
					return err
				}
			case "weight":
				if err := d.DecodeElement(&currentItem.Weight, &se); err != nil {
					return err
				}
			case "quantity":
				if err := d.DecodeElement(&currentItem.Quantity, &se); err != nil {
					return err
				}

				// quantity is the last element of an item, so append the item
				// to the current bucket and reset it.
				currentBucket.Items = append(currentBucket.Items, currentItem)
				currentItem = &Item{}
			}
		}
	}

	return nil
}

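Essentially, we loop over the XML tokens and map them into our structs with custom logic. I won't go into detail about d.Token() and xml.StartElement; the encoding/xml documentation covers them.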
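For a quick feel of what d.Token() yields: it hands back start tags, end tags and character data one token at a time, and reports io.EOF (with a nil token) once the input is exhausted. The sketch below is only an illustration; describeTokens is a hypothetical helper, not part of the answer.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// describeTokens lists the tokens Decoder.Token returns for doc, in order.
// End tags and character data (including whitespace between elements)
// arrive as separate tokens alongside the start tags.
func describeTokens(doc string) ([]string, error) {
	d := xml.NewDecoder(strings.NewReader(doc))
	var out []string
	for {
		t, err := d.Token()
		if err == io.EOF {
			return out, nil // t is nil here: the input is exhausted
		}
		if err != nil {
			return nil, err
		}
		switch tok := t.(type) {
		case xml.StartElement:
			out = append(out, "start:"+tok.Name.Local)
		case xml.EndElement:
			out = append(out, "end:"+tok.Name.Local)
		case xml.CharData:
			out = append(out, fmt.Sprintf("text:%q", string(tok)))
		}
	}
}

func main() {
	toks, err := describeTokens("<buckets><bucket-name>bucket1</bucket-name></buckets>")
	if err != nil {
		panic(err)
	}
	fmt.Println(toks)
	// [start:buckets start:bucket-name text:"bucket1" end:bucket-name end:buckets]
}
```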

Now let's break down the method above:

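  • When we meet an element named bucket-name, we know a new bucket follows, so we append the bucket processed so far to the slice (checking for nil first, because on the first pass there is no processed bucket yet) and set currentBucket to a new Bucket, the one we are about to fill.
  • When we meet an element named item-name, we know a new item follows, so we set currentItem to a new Item.
  • When we meet an element named quantity, we know it is the last element belonging to currentItem, so we append currentItem to currentBucket.Items.
  • When d.Token() finally returns io.EOF (t is nil at that point), the <buckets> element has been fully read. Because a bucket is only appended when the next bucket-name shows up, the last bucket (or the only one, if there is just one) has not been appended yet, so it has to be appended before we break out of the loop.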

Notes:

  • You could avoid the Buckets struct entirely and write a function that handles the unmarshaling with an xml.Decoder, like this:
func UnmarshalBuckets(rawXML []byte) []*Bucket {
    // or any io.Reader that points to the XML data
    d := xml.NewDecoder(bytes.NewReader(rawXML))
    ...
}
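The body of that function is elided in the note. One way it might look is sketched below; decodeBuckets is a hypothetical name and, unlike the note's signature, it also returns an error. It keeps the same ordering assumption, but appends each bucket as soon as its bucket-name is seen, so there is no separate "append the last bucket" step, and it returns an error instead of panicking when an item, weight or quantity appears out of place.

```go
package main

import (
	"bytes"
	"encoding/xml"
	"fmt"
	"io"
)

type Item struct {
	Name     string
	Weight   int
	Quantity int
}

type Bucket struct {
	BucketName string
	Items      []*Item
}

// decodeBuckets (hypothetical) drives an xml.Decoder directly, so no
// Buckets wrapper or UnmarshalXML method is needed. It assumes the order
// bucket-name, then item-name/weight/quantity groups.
func decodeBuckets(rawXML []byte) ([]*Bucket, error) {
	d := xml.NewDecoder(bytes.NewReader(rawXML))
	var out []*Bucket
	var item *Item // item currently being filled; nil between items
	for {
		t, err := d.Token()
		if err == io.EOF {
			return out, nil
		}
		if err != nil {
			return nil, err
		}
		se, ok := t.(xml.StartElement)
		if !ok {
			continue // whitespace, end tags, comments
		}
		switch se.Name.Local {
		case "bucket-name":
			// The bucket is appended up front and filled in place.
			out = append(out, &Bucket{})
			err = d.DecodeElement(&out[len(out)-1].BucketName, &se)
		case "item-name":
			if len(out) == 0 {
				return nil, fmt.Errorf("<item-name> before any <bucket-name>")
			}
			item = &Item{}
			err = d.DecodeElement(&item.Name, &se)
		case "weight", "quantity":
			if item == nil {
				return nil, fmt.Errorf("<%s> without a preceding <item-name>", se.Name.Local)
			}
			if se.Name.Local == "weight" {
				err = d.DecodeElement(&item.Weight, &se)
			} else {
				err = d.DecodeElement(&item.Quantity, &se)
				last := out[len(out)-1]
				last.Items = append(last.Items, item)
				item = nil // quantity closes the item
			}
		}
		if err != nil {
			return nil, err
		}
	}
}

const sampleXML = `<buckets>
  <bucket-name>bucket1</bucket-name>
  <item-name>apple</item-name><weight>500</weight><quantity>3</quantity>
  <item-name>orange</item-name><weight>500</weight><quantity>2</quantity>
  <item-name>grapes</item-name><weight>800</weight><quantity>1</quantity>
  <bucket-name>bucket2</bucket-name>
  <item-name>apple</item-name><weight>500</weight><quantity>3</quantity>
  <item-name>mangoes</item-name><weight>400</weight><quantity>2</quantity>
</buckets>`

func main() {
	buckets, err := decodeBuckets([]byte(sampleXML))
	if err != nil {
		panic(err)
	}
	for _, b := range buckets {
		fmt.Printf("%s: %d items\n", b.BucketName, len(b.Items))
	}
}
```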

Disclaimers:

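  • I know the code above is a bit rough, and I'm sure you can improve it. Feel free to experiment with it and implement the custom logic in a more readable way.
  • There are probably edge cases that I haven't covered or that don't appear in the example. You should analyze your XML and try to cover them where possible.
  • As mentioned, the code relies heavily on the order of the XML elements.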

Go Playground上查看工作示例


Answer 2

Score: 0

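I agree with mkopriva. Go's annotations are optimized for XML used for identically structured data records. Using them for mixed content is like putting a saddle on a cow. Plug: I have written code for handling mixed content that is on GitHub, and I'd welcome feedback.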


huangapple
  • Posted on April 12, 2022 at 20:42:41
  • If you repost this article, please keep its link: https://go.coder-hub.com/71842954.html