How to get the HTTP response body using chromedp?

Question

Using github.com/knq/chromedp, a Go package that drives web browsers via the Chrome Debugging Protocol, I can navigate to web pages, update forms and submit forms, but I need to retrieve an HTTP response body and haven't figured out how to yet. I'd like to be able to retrieve the HTTP response body for a JSON response (not HTML).

From looking at the code, it seems the HTTP response body is in the CachedResponse.Body property:

https://github.com/knq/chromedp/blob/b9e4c14157325be092c1c1137edbd584648d8c72/cdp/cachestorage/types.go#L30

And that it should be accessible using:

func (p *RequestCachedResponseParams) Do(ctxt context.Context, h cdp.Handler) (response *CachedResponse, err error)

https://github.com/knq/chromedp/blob/b9e4c14157325be092c1c1137edbd584648d8c72/cdp/cachestorage/cachestorage.go#L168

The examples use cdp.Tasks, such as the following from the simple example:

func googleSearch(q, text string, site, res *string) cdp.Tasks {
	var buf []byte
	sel := fmt.Sprintf(`//a[text()[contains(., '%s')]]`, text)
	return cdp.Tasks{
		cdp.Navigate(`https://www.google.com`),
		cdp.Sleep(2 * time.Second),
		cdp.WaitVisible(`#hplogo`, cdp.ByID),
		cdp.SendKeys(`#lst-ib`, q+"\n", cdp.ByID),
		cdp.WaitVisible(`#res`, cdp.ByID),
		cdp.Text(sel, res),
		cdp.Click(sel),
		cdp.Sleep(2 * time.Second),
		cdp.WaitVisible(`#footer`, cdp.ByQuery),
		cdp.WaitNotVisible(`div.v-middle > div.la-ball-clip-rotate`, cdp.ByQuery),
		cdp.Location(site),
		cdp.Screenshot(`#testimonials`, &buf, cdp.ByID),
		cdp.ActionFunc(func(context.Context, cdptypes.Handler) error {
			return ioutil.WriteFile("testimonials.png", buf, 0644)
		}),
	}
}

https://github.com/knq/chromedp/blob/b9e4c14157325be092c1c1137edbd584648d8c72/examples/simple/main.go

It seems that CachedResponse.Body can be accessed by calling RequestCachedResponseParams.Do() and referencing RequestCachedResponseParams.CacheID, but two things are still needed:

  1. How to call RequestCachedResponseParams.Do() in cdp.Tasks; this seems possible using cdp.ActionFunc().
  2. How to get access to RequestCachedResponseParams.CacheID.

Answer 1

Score: 5

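If you want to get the response to a request, this is how I managed to do it.

This sample navigates to http://www.google.com and listens for network.EventResponseReceived, whose Response field carries metadata such as the headers. The sample logs those headers; it does not fetch the response body itself.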

package main

import (
	"context"
	"log"
	"os"
	"time"

	"github.com/chromedp/cdproto/network"
	"github.com/chromedp/chromedp"
)

func main() {
	// temporary profile directory, removed on exit
	dir, err := os.MkdirTemp("", "chromedp-example")
	if err != nil {
		panic(err)
	}
	defer os.RemoveAll(dir)

	opts := append(chromedp.DefaultExecAllocatorOptions[:],
		chromedp.DisableGPU,
		chromedp.NoDefaultBrowserCheck,
		chromedp.Flag("headless", false),
		chromedp.Flag("ignore-certificate-errors", true),
		chromedp.Flag("window-size", "50,400"),
		chromedp.UserDataDir(dir),
	)

	allocCtx, cancel := chromedp.NewExecAllocator(context.Background(), opts...)
	defer cancel()

	// also set up a custom logger
	taskCtx, cancel := chromedp.NewContext(allocCtx, chromedp.WithLogf(log.Printf))
	defer cancel()

	// create a timeout
	taskCtx, cancel = context.WithTimeout(taskCtx, 10*time.Second)
	defer cancel()

	// ensure that the browser process is started
	if err := chromedp.Run(taskCtx); err != nil {
		panic(err)
	}

	// listen for network events (registered before navigation starts)
	listenForNetworkEvent(taskCtx)

	if err := chromedp.Run(taskCtx,
		network.Enable(),
		chromedp.Navigate(`http://www.google.com`),
		chromedp.WaitVisible(`body`, chromedp.BySearch),
	); err != nil {
		panic(err)
	}
}

func listenForNetworkEvent(ctx context.Context) {
	chromedp.ListenTarget(ctx, func(ev interface{}) {
		switch ev := ev.(type) {
		case *network.EventResponseReceived:
			resp := ev.Response
			if len(resp.Headers) != 0 {
				log.Printf("received headers: %v", resp.Headers)
			}
		// handle other network events here as needed
		}
	})
}
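Building on this answer, the body itself can be collected with a two-event pattern: note the requestId of JSON responses on EventResponseReceived, then fetch the body once EventLoadingFinished reports that request as complete, since the body may not be available before then. The sketch below is standard-library only and runs without Chrome: responseReceived and loadingFinished are hypothetical stand-ins for *network.EventResponseReceived and *network.EventLoadingFinished, and the injected fetch function stands in for network.GetResponseBody. Identifying JSON by an application/json MIME-type prefix is an assumption.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// Hypothetical stand-ins for *network.EventResponseReceived and
// *network.EventLoadingFinished; only the fields used here are modeled.
type responseReceived struct {
	RequestID string
	URL       string
	MimeType  string
}

type loadingFinished struct {
	RequestID string
}

// jsonCollector remembers JSON responses when headers arrive and fetches
// their bodies only once loading has finished.
type jsonCollector struct {
	pending map[string]string                      // request ID -> URL
	fetch   func(requestID string) ([]byte, error) // injected body fetcher
	Bodies  map[string]any                         // URL -> decoded JSON
}

func newJSONCollector(fetch func(string) ([]byte, error)) *jsonCollector {
	return &jsonCollector{pending: map[string]string{}, fetch: fetch, Bodies: map[string]any{}}
}

// handle is shaped like a ListenTarget callback: a type switch on the event.
func (c *jsonCollector) handle(ev interface{}) error {
	switch ev := ev.(type) {
	case responseReceived:
		if strings.HasPrefix(ev.MimeType, "application/json") {
			c.pending[ev.RequestID] = ev.URL
		}
	case loadingFinished:
		url, ok := c.pending[ev.RequestID]
		if !ok {
			return nil // not a JSON response we are tracking
		}
		delete(c.pending, ev.RequestID)
		raw, err := c.fetch(ev.RequestID)
		if err != nil {
			return err
		}
		var v any
		if err := json.Unmarshal(raw, &v); err != nil {
			return fmt.Errorf("%s: %w", url, err)
		}
		c.Bodies[url] = v
	}
	return nil
}

func main() {
	// Fake fetcher standing in for network.GetResponseBody.
	fake := map[string][]byte{"1": []byte(`{"ok":true}`)}
	c := newJSONCollector(func(id string) ([]byte, error) { return fake[id], nil })
	events := []interface{}{
		responseReceived{RequestID: "1", URL: "https://example.com/api", MimeType: "application/json"},
		responseReceived{RequestID: "2", URL: "https://example.com/", MimeType: "text/html"},
		loadingFinished{RequestID: "2"},
		loadingFinished{RequestID: "1"},
	}
	for _, ev := range events {
		if err := c.handle(ev); err != nil {
			fmt.Println("error:", err)
		}
	}
	fmt.Println(c.Bodies)
}
```

In real chromedp code the ListenTarget callback must not block, so the fetch would typically run in its own goroutine, for example calling network.GetResponseBody(ev.RequestID).Do(cdp.WithExecutor(ctx, chromedp.FromContext(ctx).Target)); if several fetches then run concurrently, the Bodies map needs a mutex.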

  • Posted by huangapple on August 22, 2017 at 12:13:03
  • Please keep the link to this article when reposting: https://go.coder-hub.com/45808799.html