April 12, 2022 20:42:41 · go


Unmarshal flat XML to go data structure

Question

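I have a flat XML structure that I am trying to unmarshal into a Go data structure. I am trying to find a way to get the list of items (item-name) in each bucket from the XML below, i.e. bucket1 = [apple, orange, grapes], bucket2 = [apple, mangoes].

When I unmarshal the XML into the Go data structure below, I get the list of bucket names and the list of items, but I can't map the items to their respective buckets, because each bucket can have many items.
Is there a way to achieve this by changing the Go data structure? I don't control the structure of the XML, so I can't change it to suit my requirement.
I am new to Go and I'd appreciate any input here.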

type buckets struct {
	XMLName    xml.Name `xml:"buckets"`
	BucketName []string `xml:"bucket-name"`
	ItemName   []string `xml:"item-name"`
	Weight     []string `xml:"weight"`
	Quantity   []string `xml:"quantity"`
}

<?xml version="1.0" encoding="UTF-8"?>
<buckets>
   <bucket-name>bucket1</bucket-name>
   <item-name>apple</item-name>
   <weight>500</weight>
   <quantity>3</quantity>
   <item-name>orange</item-name>
   <weight>500</weight>
   <quantity>2</quantity>
   <item-name>grapes</item-name>
   <weight>800</weight>
   <quantity>1</quantity>
   <bucket-name>bucket2</bucket-name>
   <item-name>apple</item-name>
   <weight>500</weight>
   <quantity>3</quantity>
   <item-name>mangoes</item-name>
   <weight>400</weight>
   <quantity>2</quantity>
</buckets>


Answer 1

Score: 1

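What you are trying to do can be achieved by implementing a custom UnmarshalXML method (the xml.Unmarshaler interface) and manually mapping the buckets to Go structs.

The code below assumes that the XML elements come in the same order as in the example provided.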

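First, define the following structs:

// Imports used by these types and by the UnmarshalXML method below.
import (
	"encoding/xml"
	"errors"
	"io"
)

type Buckets struct {
	XMLName xml.Name `xml:"buckets"`
	Buckets []*Bucket
}

type Bucket struct {
	BucketName string `xml:"bucket-name"`
	Items      []*Item
}

type Item struct {
	Name     string `xml:"item-name"`
	Weight   int    `xml:"weight"`
	Quantity int    `xml:"quantity"`
}

Next, implement the xml.Unmarshaler interface by adding an UnmarshalXML method to the Buckets struct. xml.Unmarshal calls this method when we pass a *Buckets as the destination.

func (b *Buckets) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	b.XMLName = start.Name

	var currentBucket *Bucket
	var currentItem *Item
	for {
		t, err := d.Token()
		if err == io.EOF {
			// Inside UnmarshalXML, Token returns io.EOF once the closing
			// </buckets> tag has been consumed. Append the last bucket before exiting.
			if currentBucket != nil {
				b.Buckets = append(b.Buckets, currentBucket)
			}
			break
		}
		if err != nil {
			return err
		}
		switch se := t.(type) {
		case xml.StartElement:
			switch se.Name.Local {
			case "bucket-name":
				// currentBucket is nil on the first run; otherwise the previous
				// bucket is complete, so append it to the slice and start a new one.
				if currentBucket != nil {
					b.Buckets = append(b.Buckets, currentBucket)
				}
				currentBucket = &Bucket{}
				currentItem = nil

				if err := d.DecodeElement(&currentBucket.BucketName, &se); err != nil {
					return err
				}
			case "item-name":
				if currentBucket == nil {
					return errors.New("item-name found before any bucket-name")
				}
				currentItem = &Item{}
				if err := d.DecodeElement(&currentItem.Name, &se); err != nil {
					return err
				}
			case "weight":
				if currentItem == nil {
					return errors.New("weight found before any item-name")
				}
				if err := d.DecodeElement(&currentItem.Weight, &se); err != nil {
					return err
				}
			case "quantity":
				if currentItem == nil {
					return errors.New("quantity found before any item-name")
				}
				if err := d.DecodeElement(&currentItem.Quantity, &se); err != nil {
					return err
				}

				// Since quantity is the last element of an item, append the item
				// to the current bucket and reset currentItem.
				currentBucket.Items = append(currentBucket.Items, currentItem)
				currentItem = nil
			}
		}
	}

	return nil
}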

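What we are essentially doing is looping over the XML tokens and mapping them to our structs with custom logic. I won't go into detail about d.Token() and xml.StartElement; see the encoding/xml documentation for more.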

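Now let's break down the method above:

When we meet an element named bucket-name, we know a new bucket follows, so we append the already processed bucket to the slice (we have to check for nil, since the first time there is no processed bucket yet) and set currentBucket to a new Bucket (the one we are about to process).
When we meet an element named item-name, we know a new item follows, so we set currentItem to a new Item.
When we meet an element named quantity, we know it is the last element belonging to currentItem, so we append currentItem to currentBucket.Items.
When d.Token() finally returns a nil token with io.EOF (inside UnmarshalXML this happens once the closing </buckets> tag has been consumed), we have reached the end of the element. Because a bucket is only appended when the next one starts, the last bucket (or the only one, if there is just one) has not been appended yet, so we append it before breaking out of the loop.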

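Notes:

You could avoid the Buckets struct entirely and write a function that handles the unmarshaling with an xml.Decoder, like this:

func UnmarshalBuckets(rawXML []byte) []*Bucket {
	// or any io.Reader that points to the XML data
	d := xml.NewDecoder(bytes.NewReader(rawXML))
	...
}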

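Disclaimers:

I know the code above feels a bit rough, and I am sure you can improve it. Feel free to play with it and implement the custom logic in a more readable way.
There are probably edge cases that I didn't cover or that are not present in the example provided. You should analyze your XML and try to cover them where possible.
As already mentioned, the code depends heavily on the order of the XML elements.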
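As one possibly more readable variation (a different technique from the answer's custom UnmarshalXML), the sketch below decodes every child of `<buckets>` in document order through an `,any` field, then groups them in a second plain-Go pass. The names element, flatBuckets, group, and groupItems are illustrative. Only item names are grouped here; weight and quantity could be attached to the last item the same way. It still relies on element order.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// element captures one child of <buckets>: its name and its text.
type element struct {
	XMLName xml.Name
	Value   string `xml:",chardata"`
}

// flatBuckets keeps every child element, in document order.
type flatBuckets struct {
	Elems []element `xml:",any"`
}

// group is one bucket with the names of the items that follow it.
type group struct {
	Bucket string
	Items  []string
}

// groupItems unmarshals the flat document, then assigns each item-name
// to the most recent bucket-name.
func groupItems(data []byte) ([]group, error) {
	var f flatBuckets
	if err := xml.Unmarshal(data, &f); err != nil {
		return nil, err
	}
	var groups []group
	for _, e := range f.Elems {
		switch e.XMLName.Local {
		case "bucket-name":
			groups = append(groups, group{Bucket: e.Value})
		case "item-name":
			if len(groups) == 0 {
				return nil, fmt.Errorf("item %q appears before any bucket-name", e.Value)
			}
			last := &groups[len(groups)-1]
			last.Items = append(last.Items, e.Value)
		}
	}
	return groups, nil
}

func main() {
	doc := []byte(`<buckets>
  <bucket-name>bucket1</bucket-name>
  <item-name>apple</item-name><weight>500</weight><quantity>3</quantity>
  <item-name>orange</item-name><weight>500</weight><quantity>2</quantity>
  <item-name>grapes</item-name><weight>800</weight><quantity>1</quantity>
  <bucket-name>bucket2</bucket-name>
  <item-name>apple</item-name><weight>500</weight><quantity>3</quantity>
  <item-name>mangoes</item-name><weight>400</weight><quantity>2</quantity>
</buckets>`)
	groups, err := groupItems(doc)
	if err != nil {
		panic(err)
	}
	for _, g := range groups {
		fmt.Println(g.Bucket, g.Items)
	}
	// Output:
	// bucket1 [apple orange grapes]
	// bucket2 [apple mangoes]
}
```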

Working example at Go Playground


Answer 2

Score: 0

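I agree with mkopriva. Go's struct-tag annotations are optimized for XML used for identically-structured data records. Using them for mixed content is like putting a saddle on a cow. Plug: I have written code for handling mixed content that is on GitHub, and I'd welcome feedback.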

