How to efficiently store html response to a file in golang

Question

I'm trying to build a crawler in Go. I'm using the net/http library to download an HTML page from a URL, and I want to save the http.Response and its http.Header to a file.

How can I convert these two values into strings so that they can be written to a text file?

I also saw an earlier question about parsing a stored HTTP response file: https://stackoverflow.com/questions/33963467/parse-http-requests-and-responses-from-text-file-in-go?rq=1. Is there any way to save the URL's response in that format?

Answer 1

Score: 5

Go's net/http/httputil package provides DumpResponse:
https://golang.org/pkg/net/http/httputil/#DumpResponse
Its second argument is a bool that controls whether the body is included in the dump. To save only the status line and headers to a file, set it to false.

An example function that dumps the response to a file could be:

import (
    "net/http"
    "net/http/httputil"
    "os"
)

// dumpResponse writes resp (status line, headers, and body) to filename.
func dumpResponse(resp *http.Response, filename string) error {
    dump, err := httputil.DumpResponse(resp, true)
    if err != nil {
        return err
    }

    // os.WriteFile replaces the deprecated ioutil.WriteFile (Go 1.16+).
    return os.WriteFile(filename, dump, 0644)
}

Answer 2

Score: 4

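Edit: Thanks to @JimB for pointing out the http.Response.Write method, which makes this a lot easier than what I originally proposed:

resp, err := http.Get("http://google.com/")
if err != nil {
	log.Panic(err)
}
defer resp.Body.Close()

f, err := os.Create("output.txt")
if err != nil {
	log.Panic(err)
}
defer f.Close()

// Write emits the status line, headers, and body in HTTP/1.x wire format.
if err := resp.Write(f); err != nil {
	log.Panic(err)
}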

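This was my first answer:

You could do something like this:

resp, err := http.Get("http://google.com/")
if err != nil {
	panic(err)
}
defer resp.Body.Close()

// io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+).
body, err := io.ReadAll(resp.Body)
if err != nil {
	panic(err)
}

// write the whole body
err = os.WriteFile("body.txt", body, 0644)
if err != nil {
	panic(err)
}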

This was the edit to my first answer:

Thanks to @Hector Correa, who added the header part. Here is a more comprehensive snippet that targets your whole question. It writes the headers, followed by the body of the response, to output.txt:

// get the response
resp, err := http.Get("http://google.com/")
if err != nil {
	panic(err)
}
defer resp.Body.Close()

// body
body, err := io.ReadAll(resp.Body)
if err != nil {
	panic(err)
}

// header: one "Name: value" line per header value
var header string
for name, values := range resp.Header {
	for _, value := range values {
		header += fmt.Sprintf("%s: %s\n", name, value)
	}
}
// blank line separating the headers from the body
header += "\n"

// append everything to one slice
var write []byte
write = append(write, []byte(header)...)
write = append(write, body...)

// write it to a file
err = os.WriteFile("output.txt", write, 0644)
if err != nil {
	panic(err)
}

Answer 3

Score: 2

Following on from the answer by @Riscie, you could also read the headers from the response with something like this:

// log each header value on its own line as "Name: value"
for header, values := range resp.Header {
	for _, value := range values {
		log.Printf("\t\t %s: %s", header, value)
	}
}

Posted by huangapple on 2016-01-25 22:31:52
Please keep this link when reposting: https://go.coder-hub.com/34995071.html