Redis instance on Docker randomly disconnects from ioredis once every few months, how to debug?


I am using the Node library ioredis and have Redis configured in docker-compose like this:

  redis:
    image: "redis:latest"
    volumes:
      - redis_data:/data

It is the simplest possible config, so I hope nothing is broken here.

My connection code is also as simple as possible:

import Redis from "ioredis";

export const redis = new Redis(process.env.REDIS_URL ?? '');

When I run docker-compose up I can see these logs:

redis_1  | 1:C 09 Jan 2023 06:00:49.251 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1  | 1:C 09 Jan 2023 06:00:49.252 # Redis version=7.0.10, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1  | 1:C 09 Jan 2023 06:00:49.252 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis_1  | 1:M 09 Jan 2023 06:00:49.254 * monotonic clock: POSIX clock_gettime
redis_1  | 1:M 09 Jan 2023 06:00:49.258 * Running mode=standalone, port=6379.
redis_1  | 1:M 09 Jan 2023 06:00:49.258 # Server initialized
redis_1  | 1:M 09 Jan 2023 06:00:49.259 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis_1  | 1:M 09 Jan 2023 06:00:49.260 * Loading RDB produced by version 7.0.10
redis_1  | 1:M 09 Jan 2023 06:00:49.261 * RDB age 120617 seconds
redis_1  | 1:M 09 Jan 2023 06:00:49.261 * RDB memory usage when created 274.70 Mb
redis_1  | 1:M 09 Jan 2023 06:00:51.257 * Done loading RDB, keys loaded: 1201, keys expired: 0.
redis_1  | 1:M 09 Jan 2023 06:00:51.258 * DB loaded from disk: 1.998 seconds
redis_1  | 1:M 09 Jan 2023 06:00:51.259 * Ready to accept connections

Then, for many days, the log keeps repeating:

redis_1  | 1:M 09 May 2023 15:49:24.506 * 1 changes in 3600 seconds. Saving...
redis_1  | 1:M 09 May 2023 15:49:24.517 * Background saving started by pid 207
redis_1  | 207:C 09 May 2023 15:49:29.023 * DB saved on disk
redis_1  | 207:C 09 May 2023 15:49:29.025 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
redis_1  | 1:M 09 May 2023 15:49:29.094 * Background saving terminated with success
redis_1  | 1:M 09 May 2023 16:49:30.043 * 1 changes in 3600 seconds. Saving...
redis_1  | 1:M 09 May 2023 16:49:30.061 * Background saving started by pid 208
redis_1  | 208:C 09 May 2023 16:49:31.606 * DB saved on disk
redis_1  | 208:C 09 May 2023 16:49:31.608 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB
redis_1  | 1:M 09 May 2023 16:49:31.666 * Background saving terminated with success

The app operates normally, and then suddenly I see these logs in the app:

app_1    | [ioredis] Unhandled error event: Error: connect ECONNREFUSED 172.18.0.11:6379
app_1    |     at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
app_1    | [ioredis] Unhandled error event: Error: connect ECONNREFUSED 172.18.0.11:6379
app_1    |     at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
app_1    | [ioredis] Unhandled error event: Error: connect ECONNREFUSED 172.18.0.11:6379
app_1    |     at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
app_1    | finished in 1875996ms
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1    |     at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1    | /opt/app/node_modules/ioredis/built/redis/event_handler.js:182
app_1    |                     self.flushQueue(new errors_1.MaxRetriesPerRequestError(maxRetriesPerRequest));
app_1    |                                     ^
app_1    | 
app_1    | MaxRetriesPerRequestError: Reached the max retries per request limit (which is 20). Refer to "maxRetriesPerRequest" option for details.
app_1    |     at Socket.<anonymous> (/opt/app/node_modules/ioredis/built/redis/event_handler.js:182:37)
app_1    |     at Object.onceWrapper (node:events:628:26)
app_1    |     at Socket.emit (node:events:513:28)
app_1    |     at TCP.<anonymous> (node:net:322:12)
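
For reference, the MaxRetriesPerRequestError above is ioredis hitting its default limit of 20 retries for a pending command. A minimal sketch of the same connection with an explicit error listener and an unbounded retry strategy (the option names are from ioredis; the backoff numbers are only illustrative, not a recommendation):

import Redis from "ioredis";

export const redis = new Redis(process.env.REDIS_URL ?? '', {
  // null disables the per-request retry cap, so queued commands wait for the
  // connection to come back instead of failing with MaxRetriesPerRequestError
  maxRetriesPerRequest: null,
  // reconnect forever, backing off up to 5 seconds between attempts
  retryStrategy: (times) => Math.min(times * 200, 5000),
});

// an explicit listener replaces the "[ioredis] Unhandled error event" noise
redis.on("error", (err) => {
  console.error("[redis] connection error:", err.message);
});

This only changes how the client copes with an outage; it does not explain why the container stops accepting connections in the first place.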

But two hours later Redis produces new logs showing that it is still working:

redis_1  | 1:M 09 May 2023 18:38:33.833 * 1 changes in 3600 seconds. Saving...
redis_1  | 1:M 09 May 2023 18:38:33.842 * Background saving started by pid 209
redis_1  | 209:C 09 May 2023 18:38:35.505 * DB saved on disk
redis_1  | 209:C 09 May 2023 18:38:35.506 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB
redis_1  | 1:M 09 May 2023 18:38:35.553 * Background saving terminated with success
redis_1  | 1:M 09 May 2023 19:38:36.096 * 1 changes in 3600 seconds. Saving...
redis_1  | 1:M 09 May 2023 19:38:36.108 * Background saving started by pid 210
redis_1  | 210:C 09 May 2023 19:38:37.452 * DB saved on disk
redis_1  | 210:C 09 May 2023 19:38:37.454 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB
redis_1  | 1:M 09 May 2023 19:38:37.512 * Background saving terminated with success
redis_1  | 1:M 10 May 2023 09:19:02.490 * 1 changes in 3600 seconds. Saving...
redis_1  | 1:M 10 May 2023 09:19:02.538 * Background saving started by pid 211
redis_1  | 211:C 10 May 2023 09:19:06.152 * DB saved on disk

My current strategy is:

  • Ping the server every few minutes to check whether I can connect to Redis; if not, log in to the server and run
docker-compose down
docker-compose up

It always works perfectly, but I would like to fix this problem in a more elegant way and understand the reason for this error.

I was able to reproduce this behaviour on a few independent services that I maintain, but it is very hard to predict when the error will occur.
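
For reference, the same watchdog idea can be expressed inside docker-compose itself as a healthcheck, sketched below (the interval and retry values are only illustrative; Docker only marks the container as unhealthy, so something external, e.g. a small cron job or a tool such as autoheal, still has to act on that status):

  redis:
    image: "redis:latest"
    restart: unless-stopped
    volumes:
      - redis_data:/data
    healthcheck:
      # redis-cli ships inside the official redis image; a failed PING marks the container unhealthy
      test: ["CMD", "redis-cli", "ping"]
      interval: 30s
      timeout: 5s
      retries: 3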

Answer 1

Score: 0


The logs show timeouts, but the real reason was limited memory on the server, which was not monitored. When I logged in after the incident, memory was back at a normal level, but a lack of memory was the direct cause of these problems.


Long-term solutions:

  • Limit the memory available to the Redis container and select the correct eviction policy (see the sketch after this list).
  • Set up memory monitoring to prevent this situation from happening again.
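
A compose-level sketch of the first point (the 256mb/512m numbers and the allkeys-lru policy are illustrative; choose values and a policy that match your data set and access pattern):

  redis:
    image: "redis:latest"
    volumes:
      - redis_data:/data
    # cap Redis's own memory and pick an eviction policy, so Redis evicts keys
    # instead of depending on the host having spare RAM
    command: redis-server --maxmemory 256mb --maxmemory-policy allkeys-lru
    # container-level hard cap as a second line of defence (mem_limit is the
    # classic Compose v2 key; with a deploy section use deploy.resources.limits.memory)
    mem_limit: 512m

For the monitoring side, redis-cli info memory (the INFO MEMORY command) reports used_memory and maxmemory, which can be fed into whatever monitoring stack is already in place.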
