2015-09-18 00:56:36 · go

Get text from div without child elements

Question

I'm trying to parse this HTML with goquery. I can't figure out how to extract the string "The string I need" while discarding everything else.

<div class="outter-class">
    <h1 class="inner-class">
        The string I need
        <span class="other-class">Some value I don't need</span>
        <span class="other-class2" title="sometitle"></span>
    </h1>
    <div class="other-class3">
        <h3>Some heading i don't need</h3>
    </div>
</div>

I tried something like https://stackoverflow.com/a/8851526/989919, adapted to goquery like this:

test := s.Clone().Children().Empty().End().Text()
fmt.Println(test.Text())

But that doesn't work. I tried a lot of different variants from the API but I can't figure it out.

Answer 1

Score: 4

The way I got it to work was with:

// End() lets us jump back to the h1 selection to get its text
text := doc.Find("h1").Children().Remove().End().Text()
text = strings.TrimSpace(text)
fmt.Println(text)

Output:

> The string I need

The code removes the child nodes (the span elements) from the h1 element to get the correct output. There may be an easier way of doing this, but it works.
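
To illustrate the underlying idea (keep only the text that sits directly inside the h1 and skip text inside its child elements), here is a minimal sketch that leaves the parsed input untouched. Note the assumptions: it uses the standard library's encoding/xml as a stand-in for goquery so it runs without third-party packages, which only works because the snippet above happens to be well-formed XML; directText and page are hypothetical names introduced for this example, not part of goquery.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"io"
	"strings"
)

// page is the HTML from the question. It is well-formed XML, which is the
// only reason encoding/xml can stand in for an HTML parser here.
const page = `<div class="outter-class">
    <h1 class="inner-class">
        The string I need
        <span class="other-class">Some value I don't need</span>
        <span class="other-class2" title="sometitle"></span>
    </h1>
    <div class="other-class3">
        <h3>Some heading i don't need</h3>
    </div>
</div>`

// directText (hypothetical helper) returns the text that belongs directly
// to the first element named tag, skipping text inside its child elements.
func directText(doc, tag string) (string, error) {
	dec := xml.NewDecoder(strings.NewReader(doc))
	depth := 0 // 0: outside the target, 1: directly inside it, >1: inside a child
	var sb strings.Builder
	for {
		tok, err := dec.Token()
		if err == io.EOF {
			break
		}
		if err != nil {
			return "", err
		}
		switch t := tok.(type) {
		case xml.StartElement:
			if depth > 0 || t.Name.Local == tag {
				depth++
			}
		case xml.EndElement:
			if depth > 0 {
				depth--
				if depth == 0 {
					// The target element has closed; its own text is complete.
					return strings.TrimSpace(sb.String()), nil
				}
			}
		case xml.CharData:
			if depth == 1 {
				sb.Write(t) // text node that is a direct child of the target
			}
		}
	}
	return strings.TrimSpace(sb.String()), nil
}

func main() {
	text, err := directText(page, "h1")
	if err != nil {
		fmt.Println("parse error:", err)
		return
	}
	fmt.Println(text)
}
```

Unlike the Children().Remove() chain, this sketch does not modify what it reads, which can matter if the same document is queried again afterwards. For real-world HTML that is not well-formed XML, an HTML parser such as the one goquery builds on would still be needed.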

Answer 2

Score: 0

Best way I've found to accomplish this:

text := dom.Find(".inner-class").Nodes[0].FirstChild.Data

I've spent quite a bit of time using the HTML parsing library under goquery so this doesn't really seem hacky to me, but it might to some.

Answer 3

Score: -2

How about:

doc.Find(".outter-class .inner-class").Children().First().Text()
