How can I convert a text node to a string in Go with Gokogiri?


Question


For my first programming attempt with Go I'm trying to automate the downloading of the lovely wallpapers from Psiu Puxa, saving the images with filenames based on titles in the posts in the HTML.

However, I haven't found how to get the value of a text node as a string.

Example HTML, simplified:

<div class="post">
	<a class="w-inline-block post-name-link" href="/posts/mars-30">
		<h4>#80 Martian Landscape</h4>
	</a>
</div>
<div class="post">
	<a class="w-inline-block post-name-link" href="#">
		<h4><strong>#79 MARTIAN terrain</strong></h4>
	</a>
</div>

My Go package:

package main

import (
	"fmt"
	"net/http"
	"io/ioutil"
	"github.com/moovweb/gokogiri"
)

func main() {
	resp, _ := http.Get("http://psiupuxa3.webflow.io/")
	page, _ := ioutil.ReadAll(resp.Body)
	resp.Body.Close()

	doc, _ := gokogiri.ParseHtml(page)
	res, _ := doc.Search("//div[@class='post']")
	defer doc.Free()

	for i := range res {
		postTitleRes, _ := res[i].Search("a[contains(@class,'post-name-link')]//text()")
		fmt.Printf("%T: %v\n", postTitleRes, postTitleRes)
	}

}

Result:

[]xml.Node: [#80 Martian Landscape]
[]xml.Node: [#79 MARTIAN terrain]
[]xml.Node: [#78 MARTIAN TERRAIN]

How can I obtain #79 MARTIAN terrain, etc., as strings for later use when saving files?

I've tried postTitle := postTitleRes.String() but the method apparently isn't available for xml.Node. I've spent some time looking through Gokogiri's source code and have found methods/instructions on coercing to strings, but I'm quite lost and would appreciate any pointers.

Answer 1

Score: 0


What you have there is a slice of xml.Node values ([]xml.Node), not a single node. You need to access the individual nodes in that slice.

If you're sure the slice holds exactly one element, you can index it directly:

postTitleRes[0].Content()

Or, to handle every node in the result:

for _, node := range postTitleRes {
	fmt.Printf("%T: %v\n", node, node.Content())
}

The Content method, which returns a string, is available once you have a single xml.Node; see the definition of the Node interface in Gokogiri's xml package.

Posted by huangapple on 2015-09-04 03:11:13
Link to this article: https://go.coder-hub.com/32383662.html