Selectively Follow Redirects in Go

Question

I'm trying to write a Twitter reader that resolves the final URLs of link shorteners etc., but stops at an earlier URL in the redirect chain when the next hop matches a list of manually defined host patterns. The reason is that I don't want to end up with the paywall URL but with the one before it.

As far as I can tell, the way to do this is to write my own client based on the default RoundTripper, because returning an error from a custom CheckRedirect function aborts the client without yielding a response.

Is there a way to use the default client and record a list of URLs, or one specific URL, from a custom CheckRedirect function?

Answer 1

Score: 3

The client request will actually still return the last valid Response when your custom CheckRedirect returns an error (as mentioned in the comments).

http://golang.org/pkg/net/http/#Client
> If CheckRedirect returns an error, the Client's Get method returns both the previous Response and CheckRedirect's error (wrapped in a url.Error) instead of issuing the Request req.

If you maintain a list of "known" paywall hosts, you can abort the paywall redirect in your CheckRedirect by returning a dedicated sentinel error value (Paywalled in the example below).
Your error handling code later has to treat that error as a special (non-erroneous) case.

Example:

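package main

import (
    "errors"
    "fmt"
    "net/http"
)

// Paywalled is a sentinel error: the next redirect would hit a paywall.
var Paywalled = errors.New("next redirect would hit a paywall")

// badHosts maps paywall hosts to the error that aborts the redirect.
var badHosts = map[string]error{
    "registration.ft.com": Paywalled,
}

var client = &http.Client{
    CheckRedirect: func(req *http.Request, via []*http.Request) error {
        // A custom CheckRedirect replaces the default limit of 10 redirects,
        // so guard against redirect loops here.
        if len(via) >= 10 {
            return errors.New("stopped after 10 redirects")
        }
        // A missing map entry yields nil, which lets the redirect proceed.
        return badHosts[req.URL.Hostname()]
    },
}

func main() {
    resp, err := client.Get("http://on.ft.com/14pQBYE")
    // Paywalled (wrapped in a *url.Error) is the expected, non-erroneous case;
    // any other non-nil error is a real failure.
    if err != nil && !errors.Is(err, Paywalled) {
        fmt.Println("error:", err)
        return
    }
    resp.Body.Close()
    // URL of the last request made before the blocked redirect (or the final URL).
    fmt.Println(resp.Request.URL)
}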

Posted by huangapple on January 7, 2015, 16:09:22. Source: https://go.coder-hub.com/27814942.html