"Connection refused" error in a flaky unit test which involves starting an HTTP server and connecting to it
Question
I'm trying to debug a flaky unit test which is similar to the following:
package main

import (
	"net"
	"net/http"
	"testing"
	"time"

	"github.com/stretchr/testify/require"
)

func TestFlaky(t *testing.T) {
	// Mock an API
	http.HandleFunc("/foo/bar", func(w http.ResponseWriter, r *http.Request) {
		_, err := w.Write([]byte("foobar"))
		require.NoError(t, err)
	})
	go func() {
		require.NoError(t, http.ListenAndServe("localhost:7777", nil))
	}()

	// Wait (up to 1 second) for the mocked API to be available
	conn, err := net.DialTimeout("tcp", "localhost:7777", time.Second)
	require.NoError(t, err)
	require.NoError(t, conn.Close())
}
However, the require.NoError() call right after DialTimeout fails with the following error (in a CI environment only):
--- FAIL: TestFlaky (0.00s)
    main_test.go:24:
        Error Trace:    main_test.go:24
        Error:          Received unexpected error:
                        dial tcp [::1]:7777: connect: connection refused
        Test:           TestFlaky
Since the test fails immediately, I'm guessing this is not a matter of adjusting the timeout. How should I make this test non-flaky? I'm considering replacing the last three lines with a require.Eventually call similar to the following:
var conn net.Conn
require.Eventually(t, func() bool {
	var err error
	conn, err = net.DialTimeout("tcp", "localhost:7777", time.Second)
	if err != nil {
		t.Logf("DialTimeout error: %v. Retrying...", err)
		return false
	}
	return true
}, time.Second, 100*time.Millisecond)
require.NoError(t, conn.Close())
Would this suffice to remove test flakiness?
Answer 1
Score: 3
The code inside the goroutine is not guaranteed to run before the Dial. (Putting a sleep right after starting the goroutine should work, but it is an ugly solution.)
Also, note that the timeout in DialTimeout only bounds how long the dial waits for the TCP handshake to complete. A refused connection is signalled by an RST packet, which makes the dial fail right away, so a longer timeout does not help.
Hint: see how the httptest package works.
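As a rough sketch of what using httptest could look like for the mocked API in the question: newMockAPI and fetch are hypothetical helper names, the handler mirrors the one in the question, and plain error returns replace testify so the example runs on its own. httptest.NewServer returns only once the server is listening on a local address, so there is nothing to wait for before connecting.

```go
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
)

// newMockAPI (hypothetical helper) starts the mocked API on a local address
// picked by httptest. The server is already listening when this returns.
func newMockAPI() *httptest.Server {
	mux := http.NewServeMux()
	mux.HandleFunc("/foo/bar", func(w http.ResponseWriter, r *http.Request) {
		_, _ = w.Write([]byte("foobar"))
	})
	return httptest.NewServer(mux)
}

// fetch (hypothetical helper) performs a GET request and returns the
// response body as a string.
func fetch(client *http.Client, url string) (string, error) {
	resp, err := client.Get(url)
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	srv := newMockAPI()
	defer srv.Close()

	// srv.URL holds the base URL of the running test server.
	body, err := fetch(srv.Client(), srv.URL+"/foo/bar")
	if err != nil {
		panic(err)
	}
	fmt.Println(body)
}
```

Using a dedicated ServeMux instead of http.HandleFunc also keeps the handler off the global default mux, which helps when several tests register the same path.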
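Edit: here is how httptest works: https://cs.opensource.google/go/go/+/refs/tags/go1.16.6:src/net/http/httptest/server.go;l=304
Note that s.Listener is already open by the time goServe is called; only the Serve loop runs in the goroutine.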
func (s *Server) goServe() {
	s.wg.Add(1)
	go func() {
		defer s.wg.Done()
		s.Config.Serve(s.Listener)
	}()
}