In Go, how to parse XML with mixed elements/chardata/elements/chardata content?

Question

Let's say I have a structure that can reference its own elements multiple times:

    <?xml version="1.0" encoding="UTF-8"?>
    <book category="cooking">
      <title lang="en">Everyday Italian</title>
      <author>Giada De Laurentiis</author>
      <year>2005</year>
      <price>30.00</price>
      Blah Blah Blah Bleh Blah of <year/> written by <author/>
    </book>

How can I parse this XML (or, better said, how can I describe the structure) so that these internal references are preserved?

    type Book struct{
       t string `xml:"book>title"`
       p string `xml:"book>price"`
       y string `xml:"book>year"`
       a string `xml:"book>author"`
       blah string ???????
    }

The naive approach (https://go.dev/play/p/JVM98pCcI0D), describing blah as plain character data (cdata), is obviously wrong, because the references <year/> and <author/> get lost.

What is the right way to define blah here so that its internal structure is still available after parsing?

Answer 1

Score: 2

A solution based on icza's comment:

// UnmarshalXML decodes <book> token by token so that the mixed content can be
// rebuilt. It needs the "encoding/xml", "io", and "strings" imports and assumes
// that Book has the string fields Title, Author, Year, Price, and Text.
func (b *Book) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	for {
		t, err := d.Token()
		if err != nil {
			if err != io.EOF {
				return err
			}
			return nil // io.EOF: no more tokens, decoding is done
		}

		switch t := t.(type) {
		case xml.StartElement:
			var f interface{} // field to decode the element into
			var r string      // replacement text for a reference
			switch t.Name.Local {
			case "title":
				f = &b.Title
			case "author":
				if len(b.Author) > 0 { // if "author" was already decoded, assume this is the element in the "blah chardata"
					r = b.Author // if you want <author/> to appear in Text, use `r = "<author/>"` instead
				} else {
					f = &b.Author
				}
			case "year":
				if len(b.Year) > 0 { // same logic as for author above
					r = b.Year
				} else {
					f = &b.Year
				}
			case "price":
				f = &b.Price
			}
			if f != nil {
				if err := d.DecodeElement(f, &t); err != nil {
					return err
				}
			}
			if len(r) > 0 {
				b.Text += " " + r + " " // pad the replacement string with spaces
			}
		case xml.CharData:
			s := strings.TrimSpace(string(t))
			if len(s) > 0 {
				b.Text += s
			}
		}
	}
}

Posted by huangapple on February 16, 2022 at 22:00:04
When reposting, please keep the link to this article: https://go.coder-hub.com/71143243.html