Equivalent to Python's HTML parsing function/module in Go?
Question
I'm learning Go and am stuck on fetching and parsing HTML/XML. In Python, I usually write the following when I do web scraping:

from urllib.request import urlopen, Request
url = "http://stackoverflow.com/"
req = Request(url)
html = urlopen(req).read()

That gives me the raw HTML/XML as either a string or bytes, and I can go on to work with it. How do I do the same in Go? What I want is the raw HTML data stored as either a string or a []byte (the two are easy to convert between, so I don't mind which I get). I'm considering the gokogiri package for web scraping in Go (not sure I'll actually end up using it), but it looks like it needs the raw HTML text before it can do any work...
So how can I acquire such an object?
Or is there a better way to do web scraping in Go?
Thanks.
Answer 1
Score: 2
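From the Go http.Get example:

package main

import (
	"fmt"
	"io"
	"log"
	"net/http"
)

func main() {
	// Send the GET request; res.Body streams the response body.
	res, err := http.Get("http://www.google.com/robots.txt")
	if err != nil {
		log.Fatal(err)
	}
	// Read the whole body into a []byte, then close it.
	// io.ReadAll supersedes ioutil.ReadAll, which is deprecated since Go 1.16.
	robots, err := io.ReadAll(res.Body)
	res.Body.Close()
	if err != nil {
		log.Fatal(err)
	}
	fmt.Printf("%s", robots)
}

This reads the contents of http://www.google.com/robots.txt into the variable robots. Note that robots is a []byte, not a string; use string(robots) if you need a string.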
For XML parsing, look into Go's encoding/xml package.
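As a rough illustration of encoding/xml, here is a minimal sketch that decodes a []byte (such as the one returned by ReadAll) into structs. The Feed and Item types, the parseFeed helper, and the sample document are all made up for this example; your own struct tags would have to match the XML you actually fetch.

```go
package main

import (
	"encoding/xml"
	"fmt"
	"log"
)

// Item and Feed are hypothetical types that mirror the sample XML below.
type Item struct {
	Title string `xml:"title"`
	Link  string `xml:"link"`
}

type Feed struct {
	XMLName xml.Name `xml:"feed"`
	Items   []Item   `xml:"item"`
}

// parseFeed decodes raw XML bytes into a Feed value.
func parseFeed(data []byte) (Feed, error) {
	var f Feed
	err := xml.Unmarshal(data, &f)
	return f, err
}

func main() {
	// Made-up sample data; in practice this would be the fetched body.
	raw := []byte(`<feed>
  <item><title>First</title><link>http://example.com/1</link></item>
  <item><title>Second</title><link>http://example.com/2</link></item>
</feed>`)

	f, err := parseFeed(raw)
	if err != nil {
		log.Fatal(err)
	}
	for _, it := range f.Items {
		fmt.Printf("%s -> %s\n", it.Title, it.Link)
	}
}
```

Keep in mind that encoding/xml expects well-formed XML, so it may reject real-world HTML pages that are not.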