Go xml unmarshalling
Question
Is there a way to extract the source of an image in an HTML file using only one struct (with encoding/xml)? Right now I have something like this:
<!-- language: lang-go -->
    type XML struct {
        A Image `xml:"div>img"`
    }

    type Image struct {
        I string `xml:"src,attr"`
    }
It would be great to declare only something like this:
<!-- language: lang-go -->
    type Image struct {
        I string `xml:"div>img,src,attr"`
    }
This is the HTML:
    <div><div><img src="hello.png"/></div></div>
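For reference, the two-struct version from the question can be exercised against this HTML as follows. This is only a minimal sketch: the helper name `imageSrc` is hypothetical, and it assumes the input is well-formed XML (which this snippet is, but arbitrary HTML often is not).

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// XML maps the outer <div>; the "div>img" path descends through the
// inner <div> to the <img> element.
type XML struct {
	A Image `xml:"div>img"`
}

// Image captures the src attribute of the <img> element.
type Image struct {
	I string `xml:"src,attr"`
}

// imageSrc is a hypothetical helper used only for this sketch.
// It returns an empty string if no matching <img> is found.
func imageSrc(doc string) (string, error) {
	var v XML
	if err := xml.Unmarshal([]byte(doc), &v); err != nil {
		return "", err
	}
	return v.A.I, nil
}

func main() {
	src, err := imageSrc(`<div><div><img src="hello.png"/></div></div>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(src)
}
```

The root element (the outer `div`) is matched by the `XML` struct itself, which is why the tag path is `div>img` rather than `div>div>img`.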
Answer 1
Score: 1
It seems that a good way is to use the exp/html package (since moved to golang.org/x/net/html), like this:
<!-- language: lang-go -->
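    package main

    import (
        "strings"

        "golang.org/x/net/html" // successor to the experimental exp/html package
    )

    func main() {
        a, err := html.Parse(strings.NewReader(testString))
        if err != nil {
            panic(err)
        }
        // Document -> <html> -> <head> -> (next sibling) <body> -> <div> -> <div> -> <img>
        println(a.FirstChild.FirstChild.NextSibling.FirstChild.FirstChild.FirstChild.Attr[0].Val)
    }

    var testString = `<div><div><img src="hello.png"/></div></div>`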
All of these FirstChild and NextSibling steps are needed because exp/html constructs a "correct" HTML5 tree, so this code is actually working with the following document:
    <html>
      <head></head>
      <body>
        <div>
          <div>
            <img src="hello.png"/>
          </div>
        </div>
      </body>
    </html>
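If the input happens to be well-formed XML, as the snippet in the question is, another option that avoids hard-coding the nesting depth is to scan tokens with `encoding/xml` and stop at the first `img` element. The following is a rough sketch under that assumption, not a replacement for a real HTML parser; `firstImgSrc` is a hypothetical helper name.

```go
package main

import (
	"encoding/xml"
	"errors"
	"fmt"
	"io"
	"strings"
)

// firstImgSrc is a hypothetical helper for this sketch. It scans the
// document token by token and returns the src attribute of the first
// <img> element that has one, regardless of how deeply it is nested.
func firstImgSrc(doc string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			return "", errors.New("no <img> element with a src attribute found")
		}
		if err != nil {
			return "", err
		}
		start, ok := tok.(xml.StartElement)
		if !ok || start.Name.Local != "img" {
			continue
		}
		for _, attr := range start.Attr {
			if attr.Name.Local == "src" {
				return attr.Value, nil
			}
		}
	}
}

func main() {
	src, err := firstImgSrc(`<div><div><img src="hello.png"/></div></div>`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(src)
}
```

This needs no struct at all, but it will fail on HTML that is not valid XML (unclosed tags, unquoted attributes), which is where the HTML parser shown in the answer is the safer choice.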