Redis instance on Docker randomly disconnects from ioredis once every few months, how to debug?
Question
I am using the ioredis node library with Redis configured in a docker-compose file. The Redis server starts successfully, but after a few days I encounter an "ECONNREFUSED" error when attempting to connect. Restarting the Docker containers resolves the issue temporarily, but I'm looking for a more elegant solution and an understanding of the root cause of the problem.
I am using the node library ioredis, and Redis is configured in docker-compose like this:
redis:
  image: "redis:latest"
  volumes:
    - redis_data:/data
It is the simplest possible config, so I hope nothing is broken here.
My connection code is also the simplest possible:
import Redis from "ioredis";
export const redis = new Redis(process.env.REDIS_URL ?? '');
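I do not register any error listener or custom retry logic. For reference, a version with an explicit handler would look roughly like this (an illustrative sketch, not what I currently run):

import Redis from "ioredis";

export const redis = new Redis(process.env.REDIS_URL ?? '', {
  // Keep reconnecting with a capped backoff instead of giving up.
  retryStrategy: (times) => Math.min(times * 200, 5000),
  // The ioredis default is 20, which is the limit mentioned in the error further down.
  maxRetriesPerRequest: 20,
});

// Without a listener, ioredis only logs "[ioredis] Unhandled error event".
redis.on("error", (err) => {
  console.error("redis connection error", err);
});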
When I run docker-compose up, I can see these logs:
redis_1 | 1:C 09 Jan 2023 06:00:49.251 # oO0OoO0OoO0Oo Redis is starting oO0OoO0OoO0Oo
redis_1 | 1:C 09 Jan 2023 06:00:49.252 # Redis version=7.0.10, bits=64, commit=00000000, modified=0, pid=1, just started
redis_1 | 1:C 09 Jan 2023 06:00:49.252 # Warning: no config file specified, using the default config. In order to specify a config file use redis-server /path/to/redis.conf
redis_1 | 1:M 09 Jan 2023 06:00:49.254 * monotonic clock: POSIX clock_gettime
redis_1 | 1:M 09 Jan 2023 06:00:49.258 * Running mode=standalone, port=6379.
redis_1 | 1:M 09 Jan 2023 06:00:49.258 # Server initialized
redis_1 | 1:M 09 Jan 2023 06:00:49.259 # WARNING Memory overcommit must be enabled! Without it, a background save or replication may fail under low memory condition. Being disabled, it can can also cause failures without low memory condition, see https://github.com/jemalloc/jemalloc/issues/1328. To fix this issue add 'vm.overcommit_memory = 1' to /etc/sysctl.conf and then reboot or run the command 'sysctl vm.overcommit_memory=1' for this to take effect.
redis_1 | 1:M 09 Jan 2023 06:00:49.260 * Loading RDB produced by version 7.0.10
redis_1 | 1:M 09 Jan 2023 06:00:49.261 * RDB age 120617 seconds
redis_1 | 1:M 09 Jan 2023 06:00:49.261 * RDB memory usage when created 274.70 Mb
redis_1 | 1:M 09 Jan 2023 06:00:51.257 * Done loading RDB, keys loaded: 1201, keys expired: 0.
redis_1 | 1:M 09 Jan 2023 06:00:51.258 * DB loaded from disk: 1.998 seconds
redis_1 | 1:M 09 Jan 2023 06:00:51.259 * Ready to accept connections
Then, for many days, I see these logs repeating:
redis_1 | 1:M 09 May 2023 15:49:24.506 * 1 changes in 3600 seconds. Saving...
redis_1 | 1:M 09 May 2023 15:49:24.517 * Background saving started by pid 207
redis_1 | 207:C 09 May 2023 15:49:29.023 * DB saved on disk
redis_1 | 207:C 09 May 2023 15:49:29.025 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 1 MB
redis_1 | 1:M 09 May 2023 15:49:29.094 * Background saving terminated with success
redis_1 | 1:M 09 May 2023 16:49:30.043 * 1 changes in 3600 seconds. Saving...
redis_1 | 1:M 09 May 2023 16:49:30.061 * Background saving started by pid 208
redis_1 | 208:C 09 May 2023 16:49:31.606 * DB saved on disk
redis_1 | 208:C 09 May 2023 16:49:31.608 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB
redis_1 | 1:M 09 May 2023 16:49:31.666 * Background saving terminated with success
The app operates normally, and then suddenly I see these logs in the app:
app_1 | [ioredis] Unhandled error event: Error: connect ECONNREFUSED 172.18.0.11:6379
app_1 | at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
app_1 | [ioredis] Unhandled error event: Error: connect ECONNREFUSED 172.18.0.11:6379
app_1 | at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
app_1 | [ioredis] Unhandled error event: Error: connect ECONNREFUSED 172.18.0.11:6379
app_1 | at TCPConnectWrap.afterConnect [as oncomplete] (node:net:1494:16)
app_1 | finished in 1875996ms
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | [ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN redis
app_1 | at GetAddrInfoReqWrap.onlookup [as oncomplete] (node:dns:107:26)
app_1 | /opt/app/node_modules/ioredis/built/redis/event_handler.js:182
app_1 | self.flushQueue(new errors_1.MaxRetriesPerRequestError(maxRetriesPerRequest));
app_1 | ^
app_1 |
app_1 | MaxRetriesPerRequestError: Reached the max retries per request limit (which is 20). Refer to "maxRetriesPerRequest" option for details.
app_1 | at Socket.<anonymous> (/opt/app/node_modules/ioredis/built/redis/event_handler.js:182:37)
app_1 | at Object.onceWrapper (node:events:628:26)
app_1 | at Socket.emit (node:events:513:28)
app_1 | at TCP.<anonymous> (node:net:322:12)
But 2 hours later new logs are produced by Redis, showing that it is working again:
redis_1 | 1:M 09 May 2023 18:38:33.833 * 1 changes in 3600 seconds. Saving...
redis_1 | 1:M 09 May 2023 18:38:33.842 * Background saving started by pid 209
redis_1 | 209:C 09 May 2023 18:38:35.505 * DB saved on disk
redis_1 | 209:C 09 May 2023 18:38:35.506 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB
redis_1 | 1:M 09 May 2023 18:38:35.553 * Background saving terminated with success
redis_1 | 1:M 09 May 2023 19:38:36.096 * 1 changes in 3600 seconds. Saving...
redis_1 | 1:M 09 May 2023 19:38:36.108 * Background saving started by pid 210
redis_1 | 210:C 09 May 2023 19:38:37.452 * DB saved on disk
redis_1 | 210:C 09 May 2023 19:38:37.454 * Fork CoW for RDB: current 1 MB, peak 1 MB, average 0 MB
redis_1 | 1:M 09 May 2023 19:38:37.512 * Background saving terminated with success
redis_1 | 1:M 10 May 2023 09:19:02.490 * 1 changes in 3600 seconds. Saving...
redis_1 | 1:M 10 May 2023 09:19:02.538 * Background saving started by pid 211
redis_1 | 211:C 10 May 2023 09:19:06.152 * DB saved on disk
My current strategy is:
- ping the server every few minutes to check whether I can connect to Redis; if not, log in to the server and run
docker-compose down
docker-compose up
This always works, but I would like to fix the problem in a more elegant way and understand the reason for this error.
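A minimal version of such a check could look like this (a simplified sketch, not my exact script; the alert and restart steps are omitted):

import Redis from "ioredis";

// Connect once per check and fail fast instead of retrying forever.
const redis = new Redis(process.env.REDIS_URL ?? '', {
  connectTimeout: 2000,
  maxRetriesPerRequest: 1,
  retryStrategy: () => null, // do not reconnect, just report the failure
});

redis
  .ping()
  .then(() => {
    console.log("redis is reachable");
    process.exit(0);
  })
  .catch((err) => {
    // This is where the docker-compose restart / alert would be triggered.
    console.error("redis is NOT reachable", err);
    process.exit(1);
  });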
I have been able to reproduce this behaviour on a few independent services that I maintain, but it is very hard to predict when the error will occur.
Answer 1
Score: 0
The logs show timeouts, but the real reason was limited memory on the server, which was not being monitored. When I logged in after the incident, memory usage was back at a normal level, but the lack of memory was the direct cause of these problems.
Long-term solutions:
- limit the memory available to the Redis container and select the correct eviction policy (see the sketch below)
- set up memory monitoring to prevent this situation
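As an illustration of the first point, the compose service from the question could be extended roughly like this (the 256mb limit and allkeys-lru policy are example values, not a recommendation for every workload):

redis:
  image: "redis:latest"
  # Cap Redis memory and evict least-recently-used keys instead of failing allocations.
  command: ["redis-server", "--maxmemory", "256mb", "--maxmemory-policy", "allkeys-lru"]
  volumes:
    - redis_data:/data

Whether allkeys-lru is appropriate depends on whether the data can safely be evicted; volatile-lru or noeviction may fit other workloads better.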