Go - Getting the text of a single particular HTML element from a document with a known structure
Question
In a little script I'm writing, I make a POST to a web service and receive an HTML document in response. This document is largely irrelevant to my needs, with the exception of the contents of a single textarea. This textarea is the only textarea in the page and it has a particular name that I know ahead of time. I want to grab that text without worrying about anything else in the document. Currently I'm using regex to get the correct line and then to delete the tags, but I feel like there's probably a better way.
Here's what the document looks like:
<html><body>
<form name="query" action="http://www.example.net/action.php" method="post">
<textarea type="text" name="nameiknow"/>The text I want</textarea>
<div id="button">
<input type="submit" value="Submit" />
</div>
</form>
</body></html>
And here's how I'm currently getting the text:
s := string(body)
// Gets the line I want
r, _ := regexp.Compile("<textarea.*name=(\"|')nameiknow(\"|').*textarea>")
s = r.FindString(s)
// Deletes the tags
r, _ = regexp.Compile("<[^>]*>")
s = r.ReplaceAllString(s, "")
I think using a full HTML parser might be a bit too much in this case, which is why I went in this direction, though for all I know there's something much better out there.
I appreciate any advice you may have.
Answer 1
Score: 4
Take a look at this package: https://github.com/PuerkitoBio/goquery. It's like jQuery, but for Go. It lets you do things like:
text := doc.Find("strong").Text()
Full working example:
package main

import (
	"fmt"
	"log"
	"strings"

	"github.com/PuerkitoBio/goquery"
)

// s is the sample response document from the question.
var s = `<html><body>
<form name="query" action="http://www.example.net/action.php" method="post">
<textarea type="text" name="nameiknow">The text I want</textarea>
<div id="button">
<input type="submit" value="Submit" />
</div>
</form>
</body></html>`

func main() {
	// Parse the HTML; any io.Reader works here, including an HTTP response body.
	doc, err := goquery.NewDocumentFromReader(strings.NewReader(s))
	if err != nil {
		log.Fatal(err)
	}

	// Select the only textarea on the page and take its text content.
	text := doc.Find("textarea").Text()
	fmt.Println(text)
}
Prints: "The text I want".
Answer 2
Score: 2
Parsing HTML with a regex is not best practice, but since you asked, here it is:
(<textarea\b[^>]*\bname\s*=\s*(?:\"|')\s*nameiknow\s*(?:\"|')[^<]*<\/textarea>)
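For illustration, here is a minimal sketch of how a pattern along these lines might be used from Go with only the standard library. It is not the exact expression above: this variant adds a capture group around the element body, so the text comes back without a second tag-stripping pass, and it decodes HTML entities with html.UnescapeString. The helper name extractTextarea is hypothetical, and the sketch assumes the page contains a single, non-nested textarea with that name.

```go
package main

import (
	"fmt"
	"html"
	"regexp"
)

// textareaRe matches the textarea named "nameiknow" and captures its body.
// (?is) makes the match case-insensitive and lets "." span newlines.
var textareaRe = regexp.MustCompile(
	`(?is)<textarea\b[^>]*\bname\s*=\s*["']nameiknow["'][^>]*>(.*?)</textarea>`)

// extractTextarea (hypothetical helper) returns the decoded textarea body
// and reports whether a matching element was found.
func extractTextarea(doc string) (string, bool) {
	m := textareaRe.FindStringSubmatch(doc)
	if m == nil {
		return "", false
	}
	// The captured body may contain entities such as &amp;, so decode them.
	return html.UnescapeString(m[1]), true
}

func main() {
	doc := `<html><body>
<form name="query" action="http://www.example.net/action.php" method="post">
<textarea type="text" name="nameiknow"/>The text I want</textarea>
</form>
</body></html>`

	text, ok := extractTextarea(doc)
	if !ok {
		fmt.Println("textarea not found")
		return
	}
	fmt.Println(text) // prints: The text I want
}
```

Compiling the pattern once with regexp.MustCompile at package level avoids recompiling it on every call and surfaces a malformed pattern at startup instead of silently discarding the error.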