Selectively Follow Redirects in Go
Question
I'm trying to write a Twitter reader that resolves the final URLs of link shorteners etc., but stops at an earlier URL in the chain for a list of manually defined host patterns. The reason is that I don't want to end up with the paywall URL but with the one before it.
As far as I can tell, the way to do this is to write my own client based on the default RoundTripper, because returning an error from a custom CheckRedirect function aborts the client without yielding a response.
Is there a way to use the default client and record a list of URLs, or one specific URL, from a custom CheckRedirect function?
Answer 1
Score: 3
The client request will actually still return the last valid Response when your custom CheckRedirect yields an error (as mentioned in the comments).
http://golang.org/pkg/net/http/#Client
> If CheckRedirect returns an error, the Client's Get method returns both the previous Response and CheckRedirect's error (wrapped in a url.Error) instead of issuing the Request req.
If you maintain a list of "known" paywall hosts, you can abort the paywall redirect in your CheckRedirect with a custom error value (Paywalled in the example below).
Your error handling code later has to treat that error as a special (non-erroneous) case.
Example:
package main

import (
	"errors"
	"fmt"
	"net/http"
)

// Paywalled signals that the next redirect would land on a paywall host.
var Paywalled = errors.New("next redirect would hit a paywall")

// badHosts maps hosts that must not be followed to the error that stops the redirect.
var badHosts = map[string]error{
	"registration.ft.com": Paywalled,
}

var client = &http.Client{
	CheckRedirect: func(req *http.Request, via []*http.Request) error {
		// N.B.: a custom CheckRedirect replaces the default policy (stop after
		// 10 redirects), so in production also check for redirect loops.
		return badHosts[req.URL.Host] // nil for unknown hosts, so the redirect is followed
	},
}

func main() {
	resp, err := client.Get("http://on.ft.com/14pQBYE")
	// Ignore a non-nil err if it is Paywalled wrapped in a *url.Error;
	// errors.Is (Go 1.13+) unwraps it.
	if err != nil && !errors.Is(err, Paywalled) {
		fmt.Println("error:", err)
		return
	}
	resp.Body.Close()
	fmt.Println(resp.Request.URL)
}
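
The question also asks about recording the URLs visited along the way. A minimal sketch of that, assuming Go 1.7 or later: CheckRedirect can append each followed URL to a slice, and returning http.ErrUseLastResponse makes the client hand back the previous response with a nil error, so no special-case error handling is needed. The names resolve, blocked and demo are hypothetical, and the two httptest servers only stand in for a shortener and a paywall host so the example runs without touching real sites.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
)

// resolve follows redirects from start but stops in front of any host listed
// in blocked. It returns the last URL that was actually fetched and every URL
// requested along the way. (resolve and blocked are illustrative names.)
func resolve(start string, blocked map[string]bool) (*url.URL, []string, error) {
	chain := []string{start}
	client := &http.Client{
		CheckRedirect: func(req *http.Request, via []*http.Request) error {
			// req.URL.Host includes the port when one is present in the URL.
			if blocked[req.URL.Host] {
				// Do not follow; return the previous response with a nil error.
				return http.ErrUseLastResponse
			}
			// Re-create the limit that the default policy would enforce.
			if len(via) >= 10 {
				return fmt.Errorf("stopped after %d redirects", len(via))
			}
			chain = append(chain, req.URL.String())
			return nil
		},
	}
	resp, err := client.Get(start)
	if err != nil {
		return nil, chain, err
	}
	defer resp.Body.Close()
	return resp.Request.URL, chain, nil
}

// demo builds two throwaway local servers: a "shortener" that redirects
// /a -> /b -> paywall, and a stand-in "paywall" host that must not be visited.
// It returns the path of the last fetched URL and the number of requests made.
func demo() (string, int, error) {
	paywall := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, "please subscribe")
	}))
	defer paywall.Close()

	mux := http.NewServeMux()
	mux.Handle("/a", http.RedirectHandler("/b", http.StatusFound))
	mux.Handle("/b", http.RedirectHandler(paywall.URL+"/wall", http.StatusFound))
	shortener := httptest.NewServer(mux)
	defer shortener.Close()

	pu, err := url.Parse(paywall.URL)
	if err != nil {
		return "", 0, err
	}
	last, chain, err := resolve(shortener.URL+"/a", map[string]bool{pu.Host: true})
	if err != nil {
		return "", 0, err
	}
	return last.Path, len(chain), nil
}

func main() {
	last, requests, err := demo()
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("stopped at:", last)
	fmt.Println("requests made:", requests)
}
```

Compared with the sentinel-error approach above, the calling code here only has to check err for real failures. The trade-off is that the returned response is the redirect response itself (a 3xx status), so its body is the redirect stub rather than page content.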