In Go, how to parse XML with mixed elements/chardata/elements/chardata content?

Question

Let's say I have a structure that can reference elements multiple times:

    <?xml version="1.0" encoding="UTF-8"?>
    <book category="cooking">
    <title lang="en">Everyday Italian</title>
    <author>Giada De Laurentiis</author>
    <year>2005</year>
    <price>30.00</price>
    Blah Blah Blah Bleh Blah of <year/> written by <author/>
    </book>

How can I parse this XML (or rather, how can I describe its structure) so that I can keep these internal references?

    type Book struct {
        t    string `xml:"book>title"`
        p    string `xml:"book>price"`
        y    string `xml:"book>year"`
        a    string `xml:"book>author"`
        blah string ???????
    }

The naïve approach (https://go.dev/play/p/JVM98pCcI0D) of simply describing blah as cdata is obviously wrong, because the <year/> and <author/> references get lost.

What is the right way to define blah here, so that its internal structure is still available after parsing?

Answer 1

Score: 2

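A solution based on icza's comment: implement xml.Unmarshaler for Book and walk the tokens yourself. Whenever <author> or <year> appears a second time, substitute the value that was already decoded.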

    import (
        "encoding/xml"
        "io"
        "strings"
    )

    // Book holds the element values; Text holds the mixed character data,
    // with each <author/> or <year/> reference replaced by the decoded value.
    type Book struct {
        Title  string
        Author string
        Year   string
        Price  string
        Text   string
    }

    func (b *Book) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
        for {
            t, err := d.Token()
            if err != nil {
                if err != io.EOF {
                    return err
                }
                // Inside UnmarshalXML the decoder returns io.EOF once the
                // closing </book> tag has been consumed.
                b.Text = strings.TrimSpace(b.Text) // drop padding at both ends
                return nil
            }
            switch t := t.(type) {
            case xml.StartElement:
                var f interface{} // field to decode the element into
                var r string      // replacement text for a reference
                switch t.Name.Local {
                case "title":
                    f = &b.Title
                case "author":
                    if len(b.Author) > 0 { // if "author" was already decoded, assume this is the element in the "blah chardata"
                        r = b.Author // if you want <author/> to appear in Text, use `r = "<author/>"` instead
                    } else {
                        f = &b.Author
                    }
                case "year":
                    if len(b.Year) > 0 { // same logic as for author above
                        r = b.Year
                    } else {
                        f = &b.Year
                    }
                case "price":
                    f = &b.Price
                }
                if f != nil {
                    if err := d.DecodeElement(f, &t); err != nil {
                        return err
                    }
                } else if err := d.Skip(); err != nil {
                    // Consume the rest of a reference or unknown element so
                    // its content does not leak into Text.
                    return err
                }
                if len(r) > 0 {
                    b.Text += " " + r + " " // pad the replacement string with spaces
                }
            case xml.CharData:
                s := strings.TrimSpace(string(t))
                if len(s) > 0 {
                    b.Text += s
                }
            }
        }
    }

huangapple
  • Posted by huangapple on February 16, 2022 22:00:04
  • Please keep this link when reposting: https://go.coder-hub.com/71143243.html