Get text from div without child elements

Question

I'm trying to parse this HTML with goquery. I can't figure out how to extract the string "The string I need" while discarding everything else.

<div class="outter-class">
    <h1 class="inner-class">
        The string I need
        <span class="other-class">Some value I don't need</span>
        <span class="other-class2" title="sometitle"></span>
    </h1>
    <div class="other-class3">
        <h3>Some heading i don't need</h3>
    </div>
</div>

I tried to use something like https://stackoverflow.com/a/8851526/989919, adapting it to goquery like this:

test := s.Clone().Children().Empty().End().Text()
fmt.Println(test.Text()) 

But that doesn't work. I tried a lot of different variants from the API but I can't figure it out.

Answer 1

Score: 4

The way I got it to work was with:

// End() lets us jump back to the h1 selection to get its text
text := doc.Find("h1").Children().Remove().End().Text()
text = strings.TrimSpace(text)
fmt.Println(text)

Output:

> The string I need

The code removes the child elements (the span elements) from the h1 element, so only the h1's own text is left. There may be an easier way of doing this, but it works.

Answer 2

Score: 0

Best way I've found to accomplish this:

text := dom.Find(".inner-class").Nodes[0].FirstChild.Data

I've spent quite a bit of time using the HTML parsing library under goquery so this doesn't really seem hacky to me, but it might to some.

Answer 3

Score: -2

How about:

doc.Find(".outter-class .inner-class").Children().First().Text()

Posted by huangapple on September 18, 2015 at 00:56:36