How to get simple text from an HTML page with goquery?

Question

I am new to Go and am using goquery to extract data from an HTML page.
The problem is that the data I am looking for is not wrapped in an HTML tag of its own. It is plain text, typically right after a <br> tag. How can I extract it?

Edit: Here is the HTML code.

<div class="container">
    <div class="row">
        <div class="col-lg-8">
            <p align="justify"><b>Name</b>Priyanka</p>
            <p align="justify"><b>Surname</b>Patil</p>
            <p align="justify"><b>Adress</b><br>India,Kolhapur</p>
            <p align="justify"><b>Hobbies </b><br>Playing</p>
            <p align="justify"><b>Eduction</b><br>12th</p>
            <p align="justify"><b>School</b><br>New Highschool</p>
        </div>
    </div>
</div>

From this, I want to extract "Priyanka" and "12th".

Answer 1

Score: 3

The following does what you want:

// The <b> element holds the label; the rest of the paragraph text is the value.
doc.Find(".container").Find("[align=\"justify\"]").Each(func(_ int, s *goquery.Selection) {
    prefix := s.Find("b").Text()                   // e.g. "Eduction"
    result := strings.TrimPrefix(s.Text(), prefix) // e.g. "12th"
    fmt.Println(result)
})

Import fmt and strings at the top of your file. If you need a complete code example, check here.

Answer 2

Score: 0

Try querying for the <br> element and getting its siblings:

http://godoc.org/github.com/PuerkitoBio/goquery#Selection.Siblings

Posted on July 20, 2015 at 18:40:37