Is a simple Python list faster than Redis for pushing data in?

Question

I have a time-sensitive data stream coming in. Each record is basically a single dict that arrives one at a time, but there can be thousands or tens of thousands of records per second.

I have been told to use Redis for speed when ingesting the data. The processing will happen in a separate thread that is not speed-sensitive. However, after comparing Redis's performance on this simple task, I'm at a loss as to why I should use Redis instead of a simple Python list of dicts.
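
For reference, here is a minimal sketch of the in-memory design described above, assuming the ingest loop and the processing run as two threads in the same process. It swaps the bare list for the standard library's thread-safe `queue.Queue`, so the consumer thread can block until data arrives. The function name `ingest_and_process` and the `None` stop sentinel are hypothetical choices for illustration. Like any in-memory structure, whatever is still queued is lost if the process exits.

```python
import queue
import threading


def ingest_and_process(records):
    """Push dicts from the calling thread while a worker thread consumes them.

    Hypothetical helper: returns the ids in the order the worker processed them.
    """
    q = queue.Queue()  # unbounded, thread-safe FIFO
    stop = None        # sentinel telling the worker there is no more data
    processed = []

    def worker():
        while True:
            item = q.get()  # blocks until the producer puts something
            if item is stop:
                break
            processed.append(item["id"])  # stand-in for the slow processing step

    t = threading.Thread(target=worker)
    t.start()
    for record in records:  # time-sensitive ingest path: one cheap put() per record
        q.put(record)
    q.put(stop)
    t.join()  # wait until the worker has drained the queue
    return processed


# Usage: records shaped like the ones in the benchmark below
records = [{'id': i, 'name': 'user {}'.format(i), 'age': i % 100} for i in range(1000)]
print(len(ingest_and_process(records)))  # 1000
```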

ChatGPT came up with this speed test:


```python
import time

import redis

redis_pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
redis_conn = redis.Redis(connection_pool=redis_pool)

# Pushing dictionaries into Redis: one command, and one network round trip, per record
start_time = time.time()
for i in range(100000):
    data = {'id': i, 'name': 'user {}'.format(i), 'age': i % 100}
    # hset(..., mapping=...) replaces hmset(), which is deprecated in redis-py
    redis_conn.hset('user:{}'.format(i), mapping=data)
end_time = time.time()
print('Time taken to push data into Redis:', end_time - start_time)

# Using Python list: in-process appends, no network involved
start_time = time.time()
data_list = []
for i in range(100000):
    data = {'id': i, 'name': 'user {}'.format(i), 'age': i % 100}
    data_list.append(data)
end_time = time.time()
print('Time taken to create list:', end_time - start_time)

start_time = time.time()
for data in data_list:
    # Process data here
    pass
end_time = time.time()
print('Time taken to process list:', end_time - start_time)
```

When I ran it, pushing into Redis took 7.5 seconds, while the Python list took only 0.09 seconds. Is this representative of real-world performance? Am I missing something here?

Answer 1

Score: 0

It depends on what you need here. Redis can persist the data, and it also lets you share the data easily with other processes, whether they are written in Python or another language. With an in-memory Python list, you lose your data if the process crashes, the machine reboots, etc.
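
To make the sharing point concrete, one common pattern is to use a Redis list as a queue: the ingest side appends records with `RPUSH`, and a separate consumer process or thread pops them with `BLPOP`. The sketch below assumes the records are JSON-serializable dicts; the key name `events` and the helpers `enqueue` and `dequeue` are hypothetical, not part of redis-py. Whether queued data survives a Redis restart depends on the server's persistence settings (RDB snapshots or AOF), and each `enqueue` call is still one round trip.

```python
import json

QUEUE_KEY = "events"  # hypothetical key name


def enqueue(conn, record, key=QUEUE_KEY):
    """Append one record to the tail of a Redis list; returns the new list length."""
    return conn.rpush(key, json.dumps(record))


def dequeue(conn, timeout=1, key=QUEUE_KEY):
    """Pop the oldest record, waiting up to `timeout` seconds; None if nothing arrived."""
    item = conn.blpop([key], timeout=timeout)  # (key, value) tuple, or None on timeout
    if item is None:
        return None
    _key, raw = item
    return json.loads(raw)  # json.loads also accepts the bytes redis-py returns by default


# Usage (assumes the redis package and a server on localhost:6379):
#   import redis
#   conn = redis.Redis(host="localhost", port=6379, db=0)
#   enqueue(conn, {"id": 1, "name": "user 1", "age": 1})   # producer side
#   record = dequeue(conn)                                  # consumer side
```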

Redis is a network server, so you're going to incur some round-trip time to the Redis server for every command, even if it is on your local machine. If you're running in the cloud, you should ideally run your code close (in network terms) to the server to minimize latency.
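
Using the question's own numbers, a rough back-of-the-envelope estimate shows why this matters at the stated load. It assumes the whole 7.5 s is per-command overhead (round trip plus client-side encoding) and that each command waits for its reply before the next one is sent; the helper names are hypothetical.

```python
def per_call_cost(total_seconds, calls):
    """Average seconds per call, treating the whole loop time as per-call overhead."""
    return total_seconds / calls


def max_serial_rate(seconds_per_call):
    """Upper bound on calls per second when each call waits for its own round trip."""
    return 1.0 / seconds_per_call


cost = per_call_cost(7.5, 100_000)                  # Redis loop timing from the question
print(f"{cost * 1e6:.0f} us per command")           # 75 us per command
print(f"{max_serial_rate(cost):,.0f} commands/s")   # 13,333 commands/s
```

Under these assumptions, a one-command-per-round-trip loop on that machine tops out at roughly 13,000 records per second, below the tens of thousands per second mentioned in the question.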

You could also modify your code to take advantage of Redis pipelining, a technique that lets you send multiple commands (the hmset calls in your case) to the Redis server in one network round trip.

Redis pipelining: https://redis.io/docs/manual/pipelining/

Implementation of pipelining in the redis-py client your code uses: https://github.com/redis/redis-py#pipelines
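
A minimal sketch of the benchmark's Redis loop rewritten with redis-py pipelining, assuming redis-py 3.5 or later (for `hset(..., mapping=...)`). Passing `transaction=False` skips the MULTI/EXEC wrapper, so the pipeline only batches commands. The helper name `push_pipelined` and the batch size of 1,000 are illustrative choices, not tuned values; taking the connection as a parameter keeps the function independent of how the client is created.

```python
def push_pipelined(conn, records, batch_size=1000):
    """Write each record to its own hash, sending up to batch_size HSETs per round trip.

    Hypothetical helper: returns the number of round trips (execute() calls) made.
    """
    pipe = conn.pipeline(transaction=False)  # plain batching, no MULTI/EXEC
    pending = 0
    round_trips = 0
    for record in records:
        # Commands are buffered client-side until execute() is called
        pipe.hset('user:{}'.format(record['id']), mapping=record)
        pending += 1
        if pending == batch_size:
            pipe.execute()  # send the whole batch at once; the pipeline resets afterwards
            round_trips += 1
            pending = 0
    if pending:
        pipe.execute()  # flush the final, partial batch
        round_trips += 1
    return round_trips


# Usage (assumes the redis package and a server on localhost:6379):
#   import redis
#   conn = redis.Redis(host='localhost', port=6379, db=0)
#   records = ({'id': i, 'name': 'user {}'.format(i), 'age': i % 100} for i in range(100000))
#   push_pipelined(conn, records)   # 100 round trips instead of 100,000
```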

huangapple
  • Posted on March 9, 2023 at 18:46:53
  • If reposting, please keep the link to this article: https://go.coder-hub.com/75683541.html