Query URL without redirect in Go

Question

I am writing a benchmark test for a redirect script.

I want my program to query a URL that redirects to the App Store, but I do not want to download the App Store page. I just want to log the redirect URL or the error.

How do I tell Go to query the URL without making the second request that follows the redirect?


UPDATE

Both answers are correct BUT:

I tried both solutions. I am doing benchmarking.
I run one or more Go processes, each with 10 to 500 goroutines. They query the URL in a loop.
My server is also written in Go. It reports the number of requests every second.

  • First solution: http.DefaultTransport.RoundTrip - slow, and produces errors.
    It works fine for the first 4 seconds, making 300-500 queries; then performance drops to 80 queries per second.

Then it drops to 0-5 queries per second, and the querying script starts getting errors like:

dial tcp IP:80: A connection attempt failed because the connected 
party did not properly respond after a period of time, or established 
connection failed because connected host has failed to respond.

I guess it reuses a connection that has already been closed.

  • Second solution: the CheckRedirect field works with constant performance. I am not sure whether it reuses connections or opens a new connection for every request. I create a client for every request in the loop. That is how it will behave in real life (every request is a new connection). Is there a way to ensure that connections are closed after each query and not reused?

That is why I am going to mark the second solution as the answer to my question. But for my research it is very important that each query uses a new connection. How can I ensure that with the second solution?

Answer 1

Score: 20

You need to use an http.Transport instead of an http.Client. Transport is lower-level and does not follow redirects.

req, err := http.NewRequest("GET", "http://example.com/redirectToAppStore", nil)
// ...
resp, err := http.DefaultTransport.RoundTrip(req)

Answer 2

Score: 14

For completeness' sake, you can use an http.Client and not follow redirects. http.Client has a CheckRedirect field which is a function. It is called before following any redirection.

If this function returns an error, then httpClient.Do(...) will not follow the redirect (see doFollowingRedirects() function in Go's source code) and instead will return an error (its concrete type will be url.Error, and its URL field will be the redirect-to URL, aka the Location header value, see this code).

You can see my gocrawl library for a concrete example of this use.

huangapple
  • Posted on January 20, 2013, 07:55:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/14420222.html