ARGOCD ssh: handshake failed: read tcp 10.#.3.21:36808->20.#.#.#:22: read: connection reset by peer and failed to get git client for repo
Question
I created an ArgoCD Application that specifies two sources. It reaches a "sync OK" status, but every few seconds it starts reporting the following errors:
ssh: handshake failed: read tcp 10.254.3.21:36808->20.41.6.26:22: read: connection reset by peer
failed to get git client for repo
Any suggestions?
project: default
destination:
  server: 'https://kubernetes.default.svc'
  namespace: akv2k8s
syncPolicy:
  automated:
    prune: true
    selfHeal: true
sources:
  - repoURL: 'http://charts.spvapi.no'
    targetRevision: 2.3.2
    helm:
      valueFiles:
        - $values/charts/akv2k8s.yaml
    chart: akv2k8s
  - repoURL: 'git@ssh.##.azure.com:v3/####'
    targetRevision: helm_chart_test
    ref: values
I have already added a repo-cred secret containing the SSH key, and it works fine when I use just one repository as the source.
Answer 1
Score: 0
Turns out the root cause is concurrency in the function LsRemote:
func (m *nativeGitClient) LsRemote(revision string) (res string, err error) {
	for attempt := 0; attempt < maxAttemptsCount; attempt++ {
		res, err = m.lsRemote(revision)
		if err == nil {
			return
		} else if apierrors.IsInternalError(err) || apierrors.IsTimeout(err) || apierrors.IsServerTimeout(err) ||
			apierrors.IsTooManyRequests(err) || utilnet.IsProbableEOF(err) || utilnet.IsConnectionReset(err) {
			// Formula: timeToWait = duration * factor^retry_number
			// Note that timeToWait should equal to duration for the first retry attempt.
			// When timeToWait is more than maxDuration retry should be performed at maxDuration.
			timeToWait := float64(retryDuration) * (math.Pow(float64(factor), float64(attempt)))
			if maxRetryDuration > 0 {
				timeToWait = math.Min(float64(maxRetryDuration), timeToWait)
			}
			time.Sleep(time.Duration(timeToWait))
		}
	}
	return
}
It seems that when two requests hit the repository concurrently, one of them fails. This does not start immediately, though: a number of concurrent requests succeed first.
So this looks very much like deliberate throttling by Azure DevOps.
For now, the workaround is to increase maxAttemptsCount from its default of 1 to 50 by setting the ARGOCD_GIT_ATTEMPTS_COUNT environment variable.
I observed the retry count rise to 12 before the request finally succeeded. I still need to check whether this throttling can be controlled. If it cannot, maybe this ArgoCD code could be improved. For example, randomizing the pause between retries may yield better results.