ARGOCD ssh: handshake failed: read tcp 10.#.3.21:36808->20.#.#.#:22: read: connection reset by peer and failed to get git client for repo


Question


I created an Argo CD Application with two sources. It reaches a Synced status, but every few seconds it starts logging these errors:

ssh: handshake failed: read tcp 10.254.3.21:36808->20.41.6.26:22: read: connection reset by peer
and failed to get git client for repo

Any suggestions?

project: default
destination:
  server: 'https://kubernetes.default.svc'
  namespace: akv2k8s
syncPolicy:
  automated:
    prune: true
    selfHeal: true
sources:
  - repoURL: 'http://charts.spvapi.no'
    targetRevision: 2.3.2
    helm:
      valueFiles:
        - $values/charts/akv2k8s.yaml
    chart: akv2k8s
  - repoURL: 'git@ssh.##.azure.com:v3/####'
    targetRevision: helm_chart_test
    ref: values

I have already added a repo-cred secret with the SSH key, and it works fine if I use just one repo as the source.

Answer 1

Score: 0


It turns out the root cause is concurrent calls to the function LsRemote:

func (m *nativeGitClient) LsRemote(revision string) (res string, err error) {
	for attempt := 0; attempt < maxAttemptsCount; attempt++ {
		res, err = m.lsRemote(revision)
		if err == nil {
			return
		} else if apierrors.IsInternalError(err) || apierrors.IsTimeout(err) || apierrors.IsServerTimeout(err) ||
			apierrors.IsTooManyRequests(err) || utilnet.IsProbableEOF(err) || utilnet.IsConnectionReset(err) {
			// Formula: timeToWait = duration * factor^retry_number
			// Note that timeToWait should equal to duration for the first retry attempt.
			// When timeToWait is more than maxDuration retry should be performed at maxDuration.
			timeToWait := float64(retryDuration) * (math.Pow(float64(factor), float64(attempt)))
			if maxRetryDuration > 0 {
				timeToWait = math.Min(float64(maxRetryDuration), timeToWait)
			}
			time.Sleep(time.Duration(timeToWait))
		}
	}
	return
}

It seems that when two requests hit the repo concurrently, one of them fails. This behavior does not start immediately, though: a number of concurrent requests succeed first.

So this looks very much like deliberate throttling by Azure DevOps.

For now, the workaround is to increase maxAttemptsCount from its default of 1 to 50 by setting the ARGOCD_GIT_ATTEMPTS_COUNT environment variable.

I observed the retry count rise to 12 before a request finally succeeded. I still need to check whether this throttling can be controlled. If not, maybe this Argo CD code could be improved. For example, randomizing the pause between retries may yield better results.

  • Posted on April 20, 2023 at 07:03:00
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76059417.html