encoding/xml Unmarshal on dynamically structured elements
Question
I'm working with EPUBs in Go. I need to fetch the cover image from the cover.xhtml file (or whichever file the .opf file names as the cover).
My problem is the dynamic structure of the elements in the cover.xhtml files: each EPUB structures its cover.xhtml differently. For example:
<body>
<figure id="cover-image">
<img src="covers/9781449328030_lrg.jpg" alt="First Edition" />
</figure>
</body>
Another EPUB's cover.xhtml file:
<body>
<div>
<img src="@public@vhost@g@gutenberg@html@files@54869@54869-h@images@cover.jpg" alt="Cover" />
</div>
</body>
I need to fetch the src attribute of the img tag from this file, but I couldn't get it to work.
Here is the part of my code that deals with unmarshalling the cover.xhtml file:
type CPSRCS struct {
	Src string `xml:"src,attr"`
}
type CPIMGS struct {
	Image CPSRCS `xml:"img"`
}
XMLContent, err = ioutil.ReadFile("./uploads/moby-dick/OPS/cover.xhtml")
CheckError(err)
coverFile := CPIMGS{}
err = xml.Unmarshal(XMLContent, &coverFile)
CheckError(err)
fmt.Println(coverFile)
The output is:
{{}}
The output I'm expecting is:
{{covers/9781449328030_lrg.jpg}}
Thanks in advance!
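A likely reason for the empty {{}}: encoding/xml matches a field tag such as `xml:"img"` only against direct children of the element being unmarshalled (the root `<html>` of the file), so an `<img>` nested inside `<body>` and then `<figure>` or `<div>` is silently skipped rather than reported as an error. A path tag such as `xml:"body>figure>img"` does reach it, but only for one fixed layout, which is exactly why a single struct cannot cover both files. The sketch below illustrates this under the assumption that the full files wrap the snippets in `<html><body>`; the names `coverImg`, `figureCover` and `srcViaFigurePath` are hypothetical.

```go
package main

import (
	"encoding/xml"
	"fmt"
)

// coverImg captures only the src attribute of an <img> element.
type coverImg struct {
	Src string `xml:"src,attr"`
}

// figureCover hard-codes the html>body>figure>img layout of the first sample.
// The tag path is relative to the root element (<html>), which is also why a
// bare `xml:"img"` tag only matches an <img> that is a direct child of the root.
type figureCover struct {
	Img coverImg `xml:"body>figure>img"`
}

// srcViaFigurePath unmarshals doc and returns whatever the fixed path found
// (an empty string if the document uses a different layout).
func srcViaFigurePath(doc string) (string, error) {
	var c figureCover
	if err := xml.Unmarshal([]byte(doc), &c); err != nil {
		return "", err
	}
	return c.Img.Src, nil
}

func main() {
	figureDoc := `<html><body><figure id="cover-image"><img src="covers/9781449328030_lrg.jpg" alt="First Edition" /></figure></body></html>`
	divDoc := `<html><body><div><img src="@public@vhost@g@gutenberg@html@files@54869@54869-h@images@cover.jpg" alt="Cover" /></div></body></html>`

	for _, doc := range []string{figureDoc, divDoc} {
		src, err := srcViaFigurePath(doc)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		// The figure layout yields the path; the div layout yields "".
		fmt.Printf("%q\n", src)
	}
}
```

Hard-coding one path per layout does not scale to arbitrary EPUBs, which is why the approaches below avoid struct paths altogether.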
Answer 1
Score: 1
This pulls the img element out of the file that was read in and then unmarshals the src attribute from that element. It assumes that you only ever need the first img element in the file and that the element is self-closing (ends with />).
XMLContent, err = ioutil.ReadFile("./uploads/moby-dick/OPS/cover.xhtml")
CheckError(err)
// Parse the XMLContent to grab just the first img element
strContent := string(XMLContent)
imgLoc := strings.Index(strContent, "<img")
if imgLoc == -1 {
	panic("no <img> element found in cover file")
}
prefixRem := strContent[imgLoc:]
endImgLoc := strings.Index(prefixRem, "/>")
if endImgLoc == -1 {
	panic("<img> element is not self-closing")
}
// Move over by 2 to keep the '/>' so the fragment is well-formed XML
trimmed := prefixRem[:endImgLoc+2]
var coverFile CPSRCS
err = xml.Unmarshal([]byte(trimmed), &coverFile)
CheckError(err)
fmt.Println(coverFile)
This produces {covers/9781449328030_lrg.jpg} for the first input file and {@public@vhost@g@gutenberg@html@files@54869@54869-h@images@cover.jpg} for the second input file you provided.