Excessive RAM usage by Redis
I'm developing an API using Go and Redis. The problem is that RAM usage is unreasonably high, and I can't find the root of the problem.
TL;DR version
There are hundreds to thousands of hash objects. Each ~1 KB object (key + value) takes ~0.5 MB of RAM. However, there is no memory fragmentation (INFO shows none).
Also, dump.rdb is ~70x smaller than the in-memory data set (360KB dump.rdb vs 25MB RAM for 50 objects, and 35.5MB vs 2.47GB for 5000 objects).
Long version
The Redis instance is filled mostly with task:123 hashes of the following kind:
"task_id" : int
"client_id" : int
"worker_id" : int
"text" : string (0..255 chars)
"is_processed" : boolean
"timestamp" : int
"image" : byte array (1 kbyte)
Also, there are a couple of integer counters, one list, and one sorted set (both consisting of task_ids).
RAM usage has a linear dependency on the number of task objects.
INFO output for 50 tasks:
# Memory
used_memory:27405872
used_memory_human:26.14M
used_memory_rss:45215744
used_memory_peak:31541400
used_memory_peak_human:30.08M
used_memory_lua:35840
mem_fragmentation_ratio:1.65
mem_allocator:jemalloc-3.6.0
and 5000 tasks:
# Memory
used_memory:2647515776
used_memory_human:2.47G
used_memory_rss:3379187712
used_memory_peak:2651672840
used_memory_peak_human:2.47G
used_memory_lua:35840
mem_fragmentation_ratio:1.28
mem_allocator:jemalloc-3.6.0
The dump.rdb file is 360kB for 50 tasks and 35553kB for 5000 tasks.
Every task object has a serializedlength of ~7 KB:
127.0.0.1:6379> DEBUG OBJECT task:2000
Value at:0x7fcb403f5880 refcount:1 encoding:hashtable serializedlength:7096 lru:6497592 lru_seconds_idle:180
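The gap can be quantified with a quick back-of-the-envelope calculation from the two INFO outputs above. As a sketch (assuming the 50-task instance approximates the instance's fixed baseline):

```go
package main

import "fmt"

func main() {
	// used_memory values copied from the two INFO outputs in the question.
	used50 := 27405872     // 50 tasks
	used5000 := 2647515776 // 5000 tasks

	// Marginal RAM cost per task, netting out the shared baseline.
	perTask := (used5000 - used50) / (5000 - 50)
	fmt.Println(perTask)
}
```

This prints 529315, i.e. roughly 0.5 MB per task, even though each task's fields add up to only ~1.3 KB — a blow-up of several hundred times, which neither hash overhead nor fragmentation can plausibly explain on its own.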
I've written a Python script trying to reproduce the problem:
import redis
import time
import os
from random import randint

img_size = 1024 * 1  # 1 KB, matching the image field

r = redis.StrictRedis(host='localhost', port=6379, db=0)

for i in range(0, 5000):
    values = {
        "task_id": randint(0, 65536),
        "client_id": randint(0, 65536),
        "worker_id": randint(0, 65536),
        "text": "",
        "is_processed": False,
        "timestamp": int(time.time()),
        "image": bytearray(os.urandom(img_size)),
    }
    key = "task:" + str(i)
    r.hmset(key, values)
    if i % 500 == 0:
        print(i)
And it consumes just 80MB of RAM!
I would appreciate any ideas on how to figure out what's going on.
Answer 1
Score: 7
You have lots and lots of small HASH objects, and that's fine. But each of them carries a lot of overhead in Redis memory, since each one has a separate dictionary. There is a small optimization that usually improves things significantly: keeping hashes in a memory-optimized but slightly slower data structure, which at these object sizes should not matter much. From the config:
# Hashes are encoded using a memory efficient data structure when they have a
# small number of entries, and the biggest entry does not exceed a given
# threshold. These thresholds can be configured using the following directives.
hash-max-ziplist-entries 512
hash-max-ziplist-value 64
Right now your values are large, which prevents this optimization from kicking in. I'd set hash-max-ziplist-value to a few KB (depending on the size of your largest object), and that should improve things (you should not see any performance degradation at this hash size).
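In practice that would look something like the following (the 8192 threshold is just an illustration; pick a value above your largest field). One caveat: the ziplist-to-hashtable conversion is one-way, so keys that already exist keep their hashtable encoding until they are deleted and rewritten, and the setting should also go into redis.conf to survive restarts:

```
127.0.0.1:6379> CONFIG SET hash-max-ziplist-value 8192
OK
127.0.0.1:6379> DEL task:2000
(integer) 1
... (rewrite the hash) ...
127.0.0.1:6379> OBJECT ENCODING task:2000
"ziplist"
```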
Also, keep in mind that Redis compresses its RDB files relative to what it holds in memory, so a ~50% reduction compared to RAM is to be expected anyway.
[EDIT] After re-reading your question, seeing that it's a Go-only problem, and considering that the compressed RDB is small, something tells me you're writing a bigger payload than you expect for the image. Any chance you're writing it from a []byte slice? If so, perhaps you did not trim it and you're writing a much bigger buffer, or something similar. I've worked like this with redigo tons of times and never seen what you're describing.
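To make that hypothesis concrete, here is a minimal, self-contained Go sketch of the failure mode (no Redis involved; the buffer size and variable names are invented for illustration) — an image is read into a large reusable buffer, and the whole buffer rather than the filled prefix gets stored:

```go
package main

import (
	"bytes"
	"fmt"
)

func main() {
	// Pretend this reader yields a 1 KB image.
	image := bytes.Repeat([]byte{0xAB}, 1024)
	r := bytes.NewReader(image)

	// A large reusable read buffer; 512 KB is chosen only to match the
	// ~0.5 MB-per-object symptom in the question.
	buf := make([]byte, 512*1024)
	n, _ := r.Read(buf)

	// Bug: HSET-ing buf would store the entire backing array.
	fmt.Println(len(buf)) // 524288
	// Fix: slice to the bytes actually read before writing to Redis.
	fmt.Println(len(buf[:n])) // 1024
}
```

Storing buf instead of buf[:n] would inflate every "image" field to the full buffer size, and since the unfilled tail of a fresh buffer is all zeros, it would compress extremely well in the RDB — which would also explain the unusually small dump.rdb.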