2017-06-15 19:25:38 · go

encoding/xml Unmarshal on dynamically structured elements

Question

I'm working with EPUB files in Go, and I have to fetch the cover image from the cover.xhtml file (or whichever file is named in the .opf file).

My problem is the dynamic structure of the elements in the cover.xhtml files.

Each EPUB has a different structure in its cover.xhtml file. For example:

<body>
    <figure id="cover-image">
        <img src="covers/9781449328030_lrg.jpg" alt="First Edition" />
    </figure>
</body>

Another EPUB's cover.xhtml file:

<body>
    <div>
        <img src="@public@vhost@g@gutenberg@html@files@54869@54869-h@images@cover.jpg" alt="Cover" />
    </div>
</body>

I need to fetch the img tag's src attribute from this file, but I couldn't get it to work.

Here is the part of my code that unmarshals the cover.xhtml file:

type CPSRCS struct {
    Src string `xml:"src,attr"`
}
type CPIMGS struct {
    Image CPSRCS `xml:"img"`
}
XMLContent, err = ioutil.ReadFile("./uploads/moby-dick/OPS/cover.xhtml")
CheckError(err)
coverFile := CPIMGS{}
err = xml.Unmarshal(XMLContent, &coverFile)
CheckError(err)
fmt.Println(coverFile)

The output is:

{{}}

The output I'm expecting is:

{{covers/9781449328030_lrg.jpg}}

Thanks in advance!

Answer 1

Score: 1

This pulls the img element out of the file contents and then unmarshals the src attribute from that element. It assumes that you only ever need the first img element in the file and that the element is written in self-closing form (ending in />).

XMLContent, err = ioutil.ReadFile("./uploads/moby-dick/OPS/cover.xhtml")
CheckError(err)
// Parse the XMLContent to grab just the img element
strContent := string(XMLContent)
imgLoc := strings.Index(strContent, "<img")
if imgLoc < 0 {
    // Without this check, slicing with -1 would panic
    log.Fatal("no img element found in cover file")
}
prefixRem := strContent[imgLoc:]
endImgLoc := strings.Index(prefixRem, "/>")
if endImgLoc < 0 {
    log.Fatal("img element is not self-closing")
}
// Move over by 2 to include the closing "/>"
trimmed := prefixRem[:endImgLoc+2]
var coverFile CPSRCS
err = xml.Unmarshal([]byte(trimmed), &coverFile)
CheckError(err)
fmt.Println(coverFile)

This produces {covers/9781449328030_lrg.jpg} for the first input file and {@public@vhost@g@gutenberg@html@files@54869@54869-h@images@cover.jpg} for the second input file you provided.
