英文:
golang request header concurrent read and write
问题
Go env:
> GOARCH="amd64"
>
>GOBIN="/root/"
>
>GOEXE=""
>
>GOHOSTARCH="amd64"
>
>GOHOSTOS="linux"
>
>GOOS="linux"
>
>GOPATH="/data/workspace/kubernetes"
>
>GORACE=""
>
>GOROOT="/usr/local/go"
>
>GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
>
>GO15VENDOREXPERIMENT="1"
>
>CC="gcc"
>
>GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
>
>CXX="g++"
>
>CGO_ENABLED="1"
Go版本:
>go version go1.6.3 linux/amd64
这个问题发生在一个负载较高的“性能测试环境”kube-apiserver上。kube-apiserver发生了恐慌并退出:
fatal error: concurrent map read and map write
goroutine 77930636 [running]:
runtime.throw(0x2f4c4c0, 0x21)
/root/.gvm/gos/go1.6.3/src/runtime/panic.go:547 +0x90 fp=0xca67b477f0 sp=0xca67b477d8
runtime.mapaccess1_faststr(0x2a8e520, 0xc9e29000f0, 0x2c11220, 0xa, 0x433e360)
/root/.gvm/gos/go1.6.3/src/runtime/hashmap_fast.go:202 +0x5b fp=0xca67b47850 sp=0xca67b477f0
k8s.io/kubernetes/pkg/httplog.(*respLogger).Log(0xcbddf2ae70)
/data/gerrit/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/httplog/log.go:180 +0x43d fp=0xca67b47af8 sp=0xca67b47850
k8s.io/kubernetes/pkg/apiserver.RecoverPanics.func1(0x7f099f157090, 0xcbddf2ae70, 0xcd7569e380)
/data/gerrit/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:174 +0x15d fp=0xca67b47b50 sp=0xca67b47af8
net/http.HandlerFunc.ServeHTTP(0xc821a4eac0, 0x7f099f157058, 0xca0f4eb450, 0xcd7569e380)
/root/.gvm/gos/go1.6.3/src/net/http/server.go:1618 +0x3a fp=0xca67b47b70 sp=0xca67b47b50
net/http.serverHandler.ServeHTTP(0xc8215a7b80, 0x7f099f157058, 0xca0f4eb450, 0xcd7569e380)
/root/.gvm/gos/go1.6.3/src/net/http/server.go:2081 +0x19e fp=0xca67b47bd0 sp=0xca67b47b70
net/http.(*conn).serve(0xc8b5d6b980)
/root/.gvm/gos/go1.6.3/src/net/http/server.go:1472 +0xf2e fp=0xca67b47f98 sp=0xca67b47bd0
runtime.goexit()
/root/.gvm/gos/go1.6.3/src/runtime/asm_amd64.s:1998 +0x1 fp=0xca67b47fa0 sp=0xca67b47f98
created by net/http.(*Server).Serve
/root/.gvm/gos/go1.6.3/src/net/http/server.go:2137 +0x44e
对应的源代码:
pkg/apiserver/handlers.go
145 func RecoverPanics(handler http.Handler) http.Handler {
146 return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
147 defer func() {
148 if x := recover(); x != nil {
149 http.Error(w, "apis panic. Look in log for details.", http.StatusInternalServerError)
150 glog.Errorf("APIServer panic'd on %v %v: %v\n%s\n", req.Method, req.RequestURI, x, debug.Stack())
151 }
152 }()
153 defer httplog.NewLogged(req, &w).StacktraceWhen(
httplog.StatusIsNot(
http.StatusOK,
http.StatusCreated,
http.StatusAccepted,
http.StatusBadRequest,
http.StatusMovedPermanently,
http.StatusTemporaryRedirect,
http.StatusConflict,
http.StatusNotFound,
http.StatusUnauthorized,
http.StatusForbidden,
errors.StatusUnprocessableEntity,
http.StatusSwitchingProtocols,
http.StatusRequestTimeout,
errors.StatusTooManyRequests,
),
170 ).Log()
// Dispatch to the internal handler
handler.ServeHTTP(w, req)
174 })
}
pkg/httplog/log.go:
159 func (rl *respLogger) Log() {
160 latency := time.Since(rl.startTime)
161 if glog.V(2) {
162 extraInfo := ""
163 if latency >= time.Millisecond*200 && latency < time.Second {
extraInfo = fmt.Sprintf("%d00.Millisecond", latency/(time.Millisecond*100))
} else if latency >= time.Second && latency < time.Minute { // Warning
extraInfo = fmt.Sprintf("%d.Second", latency/(time.Second))
} else if latency >= time.Minute { // nce will timeout
extraInfo = fmt.Sprintf("%d.Minutes", latency/(time.Minute))
}
method := rl.req.Method
if len(rl.req.Header["Detailed-Method"]) > 0 {
method = rl.req.Header["Detailed-Method"][0]
}
remoteIP := rl.getXForwardIPAdress(rl.req)
if !rl.hijacked {
//glog.InfoDepth(1, fmt.Sprintf("%s %s: (%v) %v%v%v [%s %s]", rl.req.Method, rl.req.RequestURI, latency, rl.status, rl.statusStack, rl.addedInfo, rl.req.Header["User-Agent"], rl.req.RemoteAddr))
180 glog.InfoDepth(1, fmt.Sprintf("%v %s %s: (%sms) %v%v [%s %s]-%s %v", rl.req.Header["X-Requestid"], method, rl.req.RequestURI, GetMilliLatency(latency), rl.status, rl.addedInfo, rl.req.Header["User-Agent"], remoteIP, extraInfo, rl.statusStack))
} else {
//glog.InfoDepth(1, fmt.Sprintf("%s %s: (%v) hijacked [%s %s]", rl.req.Method, rl.req.RequestURI, latency, rl.req.Header["User-Agent"], rl.req.RemoteAddr))
glog.InfoDepth(1, fmt.Sprintf("%v %s %s: (%sms) hijacked [%s %s]-%s", rl.req.Header["X-Requestid"], method, rl.req.RequestURI, GetMilliLatency(latency), rl.req.Header["User-Agent"], remoteIP, extraInfo))
}
}
}
在handler.ServeHTTP中,我改变了request.Header。我找不到原因:
(1) "serveHTTP"和"defer log"以串行方式工作。因此不存在"并发读写"。
(2) 即使存在"并发读写",也存在"recover"函数来处理恐慌,所以kube-apiserver不应该退出。
这个问题花费了我很多时间,谁能帮助我?谢谢。
英文:
Go env:
> GOARCH="amd64"
>
>GOBIN="/root/"
>
>GOEXE=""
>
>GOHOSTARCH="amd64"
>
>GOHOSTOS="linux"
>
>GOOS="linux"
>
>GOPATH="/data/workspace/kubernetes"
>
>GORACE=""
>
>GOROOT="/usr/local/go"
>
>GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
>
>GO15VENDOREXPERIMENT="1"
>
>CC="gcc"
>
>GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0"
>
>CXX="g++"
>
>CGO_ENABLED="1"
Go version:
>go version go1.6.3 linux/amd64
This issues is happend on a “performance test env” kube-apiserver with high load. kube-apiserver panic and exit:
fatal error: concurrent map read and map write
goroutine 77930636 [running]:
runtime.throw(0x2f4c4c0, 0x21)
/root/.gvm/gos/go1.6.3/src/runtime/panic.go:547 +0x90 fp=0xca67b477f0 sp=0xca67b477d8
runtime.mapaccess1_faststr(0x2a8e520, 0xc9e29000f0, 0x2c11220, 0xa, 0x433e360)
/root/.gvm/gos/go1.6.3/src/runtime/hashmap_fast.go:202 +0x5b fp=0xca67b47850 sp=0xca67b477f0
k8s.io/kubernetes/pkg/httplog.(*respLogger).Log(0xcbddf2ae70)
/data/gerrit/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/httplog/log.go:180 +0x43d fp=0xca67b47af8 sp=0xca67b47850
k8s.io/kubernetes/pkg/apiserver.RecoverPanics.func1(0x7f099f157090, 0xcbddf2ae70, 0xcd7569e380)
/data/gerrit/src/k8s.io/kubernetes/_output/local/go/src/k8s.io/kubernetes/pkg/apiserver/handlers.go:174 +0x15d fp=0xca67b47b50 sp=0xca67b47af8
net/http.HandlerFunc.ServeHTTP(0xc821a4eac0, 0x7f099f157058, 0xca0f4eb450, 0xcd7569e380)
/root/.gvm/gos/go1.6.3/src/net/http/server.go:1618 +0x3a fp=0xca67b47b70 sp=0xca67b47b50
net/http.serverHandler.ServeHTTP(0xc8215a7b80, 0x7f099f157058, 0xca0f4eb450, 0xcd7569e380)
/root/.gvm/gos/go1.6.3/src/net/http/server.go:2081 +0x19e fp=0xca67b47bd0 sp=0xca67b47b70
net/http.(*conn).serve(0xc8b5d6b980)
/root/.gvm/gos/go1.6.3/src/net/http/server.go:1472 +0xf2e fp=0xca67b47f98 sp=0xca67b47bd0
runtime.goexit()
/root/.gvm/gos/go1.6.3/src/runtime/asm_amd64.s:1998 +0x1 fp=0xca67b47fa0 sp=0xca67b47f98
created by net/http.(*Server).Serve
/root/.gvm/gos/go1.6.3/src/net/http/server.go:2137 +0x44e
corresponding source code:
pkg/apiserver/handlers.go
145 func RecoverPanics(handler http.Handler) http.Handler {
146 return http.HandlerFunc(func(w http.ResponseWriter, req *http.Request) {
147 defer func() {
148 if x := recover(); x != nil {
149 http.Error(w, "apis panic. Look in log for details.", http.StatusInternalServerError)
150 glog.Errorf("APIServer panic'd on %v %v: %v\n%s\n", req.Method, req.RequestURI, x, debug.Stack())
151 }
152 }()
153 defer httplog.NewLogged(req, &w).StacktraceWhen(
httplog.StatusIsNot(
http.StatusOK,
http.StatusCreated,
http.StatusAccepted,
http.StatusBadRequest,
http.StatusMovedPermanently,
http.StatusTemporaryRedirect,
http.StatusConflict,
http.StatusNotFound,
http.StatusUnauthorized,
http.StatusForbidden,
errors.StatusUnprocessableEntity,
http.StatusSwitchingProtocols,
http.StatusRequestTimeout,
errors.StatusTooManyRequests,
),
170 ).Log()
// Dispatch to the internal handler
handler.ServeHTTP(w, req)
174 })
}
pkg/httplog/log.go:
159 func (rl *respLogger) Log() {
160 latency := time.Since(rl.startTime)
161 if glog.V(2) {
162 extraInfo := ""
163 if latency >= time.Millisecond*200 && latency < time.Second {
extraInfo = fmt.Sprintf("%d00.Millisecond", latency/(time.Millisecond*100))
} else if latency >= time.Second && latency < time.Minute { // Warning
extraInfo = fmt.Sprintf("%d.Second", latency/(time.Second))
} else if latency >= time.Minute { // nce will timeout
extraInfo = fmt.Sprintf("%d.Minutes", latency/(time.Minute))
}
method := rl.req.Method
if len(rl.req.Header["Detailed-Method"]) > 0 {
method = rl.req.Header["Detailed-Method"][0]
}
remoteIP := rl.getXForwardIPAdress(rl.req)
if !rl.hijacked {
//glog.InfoDepth(1, fmt.Sprintf("%s %s: (%v) %v%v%v [%s %s]", rl.req.Method, rl.req.RequestURI, latency, rl.status, rl.statusStack, rl.addedInfo, rl.req.Header["User-Agent"], rl.req.RemoteAddr))
180 glog.InfoDepth(1, fmt.Sprintf("%v %s %s: (%sms) %v%v [%s %s]-%s %v", rl.req.Header["X-Requestid"], method, rl.req.RequestURI, GetMilliLatency(latency), rl.status, rl.addedInfo, rl.req.Header["User-Agent"], remoteIP, extraInfo, rl.statusStack))
} else {
//glog.InfoDepth(1, fmt.Sprintf("%s %s: (%v) hijacked [%s %s]", rl.req.Method, rl.req.RequestURI, latency, rl.req.Header["User-Agent"], rl.req.RemoteAddr))
glog.InfoDepth(1, fmt.Sprintf("%v %s %s: (%sms) hijacked [%s %s]-%s", rl.req.Header["X-Requestid"], method, rl.req.RequestURI, GetMilliLatency(latency), rl.req.Header["User-Agent"], remoteIP, extraInfo))
}
}
}
In handler.ServeHTTP, I changed request.Header. I cann't find the reason:
(1) "serveHTTP" and "defer log" works in a serial manner. So not exist "concurrent read and write"
(2) Even if exist "concurrent read and write", there exist "recover" func to deal with panics, so the kube-apiserver shoud not exit.
The question has cost me lots of time, who can help me? Thanks
答案1
得分: 1
我的猜测是你已经意识到全局超时处理程序实际上无法停止正在处理请求的goroutine。因此,当连接超时(默认为1分钟,除非你已经更改),它会返回一个错误。如果goroutine实际上在同一时间完成,它也可能尝试写入连接。我以为我们已经添加了一个锁来防止这个问题,但也许头部并不总是受到锁的保护。如果你能在一个干净的源代码树中重现这个问题,请在Kubernetes的GitHub存储库中提交一个问题。
英文:
My guess is that you've stepped on the fact that the global timeout handler can't actually stop the goroutines working on the request. So when the connection times out (1m unless you've changed it), it returns an error. If the goroutine actually completes at the same time, it could also attempt to write to the connection. I thought we had added a lock to prevent this from causing a problem, but perhaps the headers are not always protected by the lock. If you can reproduce this in a clean source tree, then please file an issue at the Kubernetes github repo.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论