Redigo: Fail fast when redis server is down
Question
I'm struggling to get Go to fail fast when the Redis server I'm connected to goes down, and I'd like a robust solution.
I'm using redigo and I'm setting up a connection pool like so:
// This has other stuff in it in the code, use it as a
// central repository for things we want in memory
type State struct {
	redisPool *redis.Pool
}

func (state *State) GetRedisConn() redis.Conn {
	return state.redisPool.Get()
}

func main() {
	state.redisPool = &redis.Pool{
		MaxIdle:     200,
		MaxActive:   9000,
		IdleTimeout: time.Minute,
		Dial: func() (redis.Conn, error) {
			return redis.Dial("tcp", *redisAddress,
				redis.DialConnectTimeout(1*time.Second),
				redis.DialReadTimeout(100*time.Millisecond),
				redis.DialWriteTimeout(100*time.Millisecond),
			)
		},
	}
}
I then request connections and use them like so:
t0 := time.Now()
conn := state.GetRedisConn()
if conn != nil && conn.Err() == nil {
	defer conn.Close()
	// Do stuff
} else {
	log.Printf("no redis probably")
}
// Log the elapsed time in seconds.
log.Println(time.Since(t0).Seconds())
While Redis is up, this works well and operations complete in milliseconds. As soon as I take Redis down, my 75th percentile rises above 7 seconds and my 99th percentile reaches 10 seconds (as reported by Prometheus).
What am I doing wrong? Why doesn't this time out faster?
I was under the impression that redis.DialConnectTimeout(1*time.Second) would cap the delay at 1 second, but that doesn't seem to be the case.
EDIT: It turns out this was caused by a mistake in my Prometheus setup: my histogram buckets were too coarse. Redis was timing out correctly after one second, but I had only a 1s bucket and a 10s bucket, so requests that took just over 1s landed in the 10s bucket and skewed the percentiles. I'm sure this discussion will be useful to someone at some point, though.
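To illustrate the bucket effect described in the EDIT, here is a minimal standard-library sketch. It does not use the Prometheus client; `bucketFor` is a hypothetical helper that only mimics how an observation is attributed to the first bucket whose upper bound is at least as large as the value. The 1.05s latency and the finer bucket layout are assumed values chosen for illustration.

```go
package main

import (
	"fmt"
	"sort"
)

// bucketFor returns the upper bound of the first bucket whose bound is >= v,
// mimicking how a histogram with "less than or equal" buckets attributes an
// observation. ok is false when v exceeds every bound (the implicit +Inf bucket).
// bounds must be sorted in ascending order.
func bucketFor(bounds []float64, v float64) (upper float64, ok bool) {
	i := sort.SearchFloat64s(bounds, v)
	if i == len(bounds) {
		return 0, false
	}
	return bounds[i], true
}

func main() {
	// Assumed latency: a request that hit the 1s connect timeout plus overhead.
	latency := 1.05

	coarse := []float64{1, 10}                         // layout described in the EDIT
	fine := []float64{0.1, 0.5, 1, 1.25, 1.5, 2, 5, 10} // hypothetical finer layout

	c, _ := bucketFor(coarse, latency)
	f, _ := bucketFor(fine, latency)
	fmt.Printf("coarse buckets: observation counted under le=%g\n", c)
	fmt.Printf("fine buckets:   observation counted under le=%g\n", f)
}
```

With only the 1s and 10s bounds, every request slightly over one second is indistinguishable from a ten-second one, so any percentile estimated from those buckets can land anywhere between 1s and 10s. Adding bounds just above the configured timeout keeps the estimate close to the real latency.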
Answer 1
Score: 2
Rate limit dial attempts after a failure:
func main() {
	var (
		nextDial time.Time
		mu       sync.Mutex
	)
	state.redisPool = &redis.Pool{
		MaxIdle:     200,
		MaxActive:   9000,
		IdleTimeout: time.Minute,
		Dial: func() (redis.Conn, error) {
			mu.Lock() // Dial can be called concurrently
			defer mu.Unlock()
			if time.Now().Before(nextDial) {
				return nil, errors.New("waiting for dial")
			}
			c, err := redis.Dial("tcp", *redisAddress,
				redis.DialConnectTimeout(1*time.Second),
				redis.DialReadTimeout(100*time.Millisecond),
				redis.DialWriteTimeout(100*time.Millisecond),
			)
			if err == nil {
				nextDial = time.Time{}
			} else {
				nextDial = time.Now().Add(time.Second) // don't attempt dial for one second
			}
			return c, err
		},
	}
}