In Go, how to parse XML with mixed elements/chardata/elements/chardata content?
Question
Let's say I have a structure whose free text can reference its own elements, possibly multiple times:

    <?xml version="1.0" encoding="UTF-8"?>
    <book category="cooking">
      <title lang="en">Everyday Italian</title>
      <author>Giada De Laurentiis</author>
      <year>2005</year>
      <price>30.00</price>
      Blah Blah Blah Bleh Blah of <year/> written by <author/>
    </book>

How can I parse this XML (or rather, how should I describe the structure) so that these internal references are preserved?

    type Book struct {
        t    string `xml:"book>title"`
        p    string `xml:"book>price"`
        y    string `xml:"book>year"`
        a    string `xml:"book>author"`
        blah string ???????
    }

The naive approach (https://go.dev/play/p/JVM98pCcI0D), describing blah as character data, is obviously wrong, because the references <year/> and <author/> get lost.

What is the right way to define blah here so that its internal structure is still available after parsing?
Answer 1
Score: 2
A solution based on icza's comment:

    // Assumes Book has exported string fields Title, Author, Year, Price and Text,
    // and that encoding/xml, io and strings are imported.
    func (b *Book) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
        for {
            t, err := d.Token()
            if err != nil {
                if err == io.EOF {
                    return nil // end of the <book> element
                }
                return err
            }
            switch t := t.(type) {
            case xml.StartElement:
                var f interface{} // field to decode into
                var r string      // replacement text for a reference
                switch t.Name.Local {
                case "title":
                    f = &b.Title
                case "author":
                    if len(b.Author) > 0 { // if "author" was already decoded then assume this is the element in the "blah chardata"
                        r = b.Author // if you want <author/> to appear in Text then do `r = "<author/>"` instead
                    } else {
                        f = &b.Author
                    }
                case "year":
                    if len(b.Year) > 0 { // same logic as for author above
                        r = b.Year
                    } else {
                        f = &b.Year
                    }
                case "price":
                    f = &b.Price
                }
                if f != nil {
                    if err := d.DecodeElement(f, &t); err != nil {
                        return err
                    }
                }
                if len(r) > 0 {
                    b.Text += " " + r + " " // pad the replacement string with spaces
                }
            case xml.CharData:
                // Whitespace-only chardata between elements is dropped.
                s := strings.TrimSpace(string(t))
                if len(s) > 0 {
                    b.Text += s
                }
            }
        }
    }
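
If the goal is to keep the references themselves rather than substitute them during decoding, one option is to store the mixed content as an ordered list of segments and resolve them later. The following is only a minimal sketch under that assumption: the names Segment, Blah, Render and parseBook are hypothetical and do not come from the answer above, and any repeated or unknown child element is simply treated as a reference.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"strings"
)

// Segment is one piece of the mixed content (hypothetical type):
// either literal text or the name of a referenced element.
type Segment struct {
	Text string // literal character data; empty for a reference
	Ref  string // referenced element name, e.g. "year"; empty for text
}

// Book keeps the plain fields plus the mixed content in document order.
type Book struct {
	Title, Author, Year, Price string
	Blah                       []Segment
}

// UnmarshalXML consumes exactly the <book> element token by token.
func (b *Book) UnmarshalXML(d *xml.Decoder, start xml.StartElement) error {
	fields := map[string]*string{
		"title": &b.Title, "author": &b.Author,
		"year": &b.Year, "price": &b.Price,
	}
	seen := map[string]bool{}
	for {
		tok, err := d.Token()
		if err != nil {
			return err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			name := t.Name.Local
			if f, ok := fields[name]; ok && !seen[name] {
				// First occurrence of a known element: decode its value.
				seen[name] = true
				if err := d.DecodeElement(f, &t); err != nil {
					return err
				}
				continue
			}
			// Repeated or unknown element: record it as a reference
			// and skip to its end tag.
			b.Blah = append(b.Blah, Segment{Ref: name})
			if err := d.Skip(); err != nil {
				return err
			}
		case xml.CharData:
			// Ignore whitespace-only chardata between elements.
			if s := strings.TrimSpace(string(t)); s != "" {
				b.Blah = append(b.Blah, Segment{Text: s})
			}
		case xml.EndElement:
			// Child end tags are consumed by DecodeElement or Skip,
			// so this can only be the closing </book>.
			return nil
		}
	}
}

// Render resolves each reference against the decoded fields.
func (b *Book) Render() string {
	vals := map[string]string{
		"title": b.Title, "author": b.Author,
		"year": b.Year, "price": b.Price,
	}
	parts := make([]string, 0, len(b.Blah))
	for _, s := range b.Blah {
		if s.Ref != "" {
			parts = append(parts, vals[s.Ref])
		} else {
			parts = append(parts, s.Text)
		}
	}
	return strings.Join(parts, " ")
}

const sample = `<?xml version="1.0" encoding="UTF-8"?>
<book category="cooking">
  <title lang="en">Everyday Italian</title>
  <author>Giada De Laurentiis</author>
  <year>2005</year>
  <price>30.00</price>
  Blah Blah Blah Bleh Blah of <year/> written by <author/>
</book>`

// parseBook is a small helper around xml.Unmarshal.
func parseBook(s string) (*Book, error) {
	var b Book
	if err := xml.Unmarshal([]byte(s), &b); err != nil {
		return nil, err
	}
	return &b, nil
}

func main() {
	b, err := parseBook(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(b.Render())
	// prints: Blah Blah Blah Bleh Blah of 2005 written by Giada De Laurentiis
}
```

Because Blah keeps text and references apart, the caller can decide later whether to substitute values, emit the original tags, or inspect which elements were referenced. Like the answer's code, this relies on the referenced elements appearing before the text that mentions them.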