Get text from div without child elements
Question
I'm trying to parse this HTML with the help of goquery. I can't figure out how to extract the string "The string I need" while throwing away everything else.
<div class="outter-class">
<h1 class="inner-class">
The string I need
<span class="other-class" >Some value I don't need</span>
<span class="other-class2" title="sometitle"></span>
</h1>
<div class="other-class3">
<h3>Some heading i don't need</h3>
</div>
</div>
I tried to adapt the approach from https://stackoverflow.com/a/8851526/989919 to goquery like this:
test := s.Clone().Children().Empty().End().Text()
fmt.Println(test) // Text() already returns a string
But that doesn't work. I tried a lot of different variants from the API but I can't figure it out.
Answer 1
Score: 4
The way I got it to work was with:
// End() lets us jump back to the h1 selection to get its text
text := doc.Find("h1").Children().Remove().End().Text()
text = strings.TrimSpace(text)
fmt.Println(text)
Output:
The string I need
The code removes the h1's child elements (the two spans) so that only the h1's own text remains. There may be an easier way of doing this, but it works.
Answer 2
Score: 0
Best way I've found to accomplish this:
text := dom.Find(".inner-class").Nodes[0].FirstChild.Data
I've spent quite a bit of time using the HTML parsing library under goquery so this doesn't really seem hacky to me, but it might to some.
Answer 3
Score: -2
How about:
doc.Find(".outter-class .inner-class").Children().First().Text()