Question

I have a time-sensitive data stream coming in. Each record is basically a single line containing one dict, and records arrive one at a time, but there can be thousands or tens of thousands of lines per second.

I have been told to use Redis for speed when ingesting the data. The processing will happen in a different thread that is not speed-sensitive. However, after comparing the performance of Redis on this simple task, I'm at a loss as to why I should use Redis instead of a simple Python list of dicts.

ChatGPT came up with this speed test:


    import time
    import redis

    redis_pool = redis.ConnectionPool(host='localhost', port=6379, db=0)
    redis_conn = redis.Redis(connection_pool=redis_pool)

    # Pushing dictionaries into Redis: one HSET command (one network round trip) per record
    start_time = time.perf_counter()
    for i in range(100000):
        data = {'id': i, 'name': 'user {}'.format(i), 'age': i % 100}
        # hmset() is deprecated in redis-py; hset(..., mapping=...) is its replacement
        redis_conn.hset('user:{}'.format(i), mapping=data)
    end_time = time.perf_counter()
    print('Time taken to push data into Redis:', end_time - start_time)

    # Using a Python list: everything stays in this process's memory
    start_time = time.perf_counter()
    data_list = []
    for i in range(100000):
        data = {'id': i, 'name': 'user {}'.format(i), 'age': i % 100}
        data_list.append(data)
    end_time = time.perf_counter()
    print('Time taken to create list:', end_time - start_time)

    start_time = time.perf_counter()
    for data in data_list:
        # Process data here
        pass
    end_time = time.perf_counter()
    print('Time taken to process list:', end_time - start_time)

When I ran it, Redis took 7.5 seconds, while the Python list took only 0.09 seconds. Is this representative of likely real-world performance? Am I missing something here?


Answer 1

Score: 0

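It depends on what you want here. Redis can persist the data, and it also lets the data be shared easily with other Python processes or with processes written in other languages. With your in-memory Python list, the data is lost if the process crashes, the machine reboots, etc.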
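To make the "shared with other processes" point concrete, here is a minimal sketch of using a Redis list as a producer/consumer queue via RPUSH and BLPOP. It assumes `client` is a redis-py `redis.Redis` instance; the helper names `enqueue`/`dequeue`, the key `"incoming"`, and JSON as the serialization format are hypothetical choices for illustration, not something from the answer.

```python
import json
from typing import Any, Optional

# Hypothetical key name for the shared queue; any Redis key would do.
QUEUE_KEY = "incoming"


def enqueue(client: Any, record: dict, key: str = QUEUE_KEY) -> None:
    """Producer side: serialize one record and append it to a Redis list.

    `client` is assumed to be a redis-py `redis.Redis` instance; RPUSH adds
    to the tail, so together with BLPOP the list behaves as a FIFO queue.
    """
    client.rpush(key, json.dumps(record))


def dequeue(client: Any, key: str = QUEUE_KEY, timeout: int = 1) -> Optional[dict]:
    """Consumer side: wait up to `timeout` seconds for the oldest record.

    BLPOP pops from the head of the list and replies with a (key, value)
    pair, or None if the timeout expires while the list is still empty.
    """
    item = client.blpop([key], timeout=timeout)
    if item is None:
        return None
    # json.loads accepts the bytes value redis-py returns by default
    return json.loads(item[1])
```

The producer would call `enqueue(redis.Redis(), record)` for each incoming dict, and a consumer in a separate process (or a service written in another language) would loop on `dequeue(...)`. Whether queued records survive a Redis restart depends on the server's persistence settings (RDB snapshots and/or AOF). If the consumer is only another thread in the same process and losing in-flight records on a crash is acceptable, the standard library's `queue.Queue` covers the same producer/consumer handoff without a network hop.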

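Redis is a network server, so you're going to incur some round-trip time to the Redis server, even if it is on your local machine. In your benchmark, each of the 100,000 hmset calls waits for its own reply before the next one is sent, so that latency is paid 100,000 times. If you're running in the cloud, you should ideally run your code close (in network terms) to the server to minimize latency.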
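As a rough estimate from the question's own numbers, assuming the 100,000 blocking hmset calls account for essentially all of the 7.5 s, the average cost per command is:

```latex
\frac{7.5\ \text{s}}{100\,000\ \text{commands}} = 7.5 \times 10^{-5}\ \text{s} = 75\ \mu\text{s per command}
```

That compares with about 0.09 s / 100,000 ≈ 0.9 µs per iteration of the list loop. The per-command overhead (a socket write, waiting for the reply, protocol encoding and decoding) is what pipelining spreads across many commands.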

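You could also modify your code to take advantage of Redis pipelining, a technique that allows multiple commands (hmset in your case) to be sent to the Redis server in a single network round trip.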

Redis pipelining: https://redis.io/docs/manual/pipelining/

Implementation of pipelining in the redis-py client your code uses: https://github.com/redis/redis-py#pipelines
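Following the pipelining suggestion, here is a minimal sketch of the question's write loop rewritten to batch commands. It assumes redis-py 3.5 or newer (for `hset(..., mapping=...)`, the non-deprecated replacement for `hmset`) and a Redis server 4.0 or newer (multi-field HSET). The helper name `push_batched` and the default batch size of 1000 are hypothetical choices, not taken from the answer or the redis-py docs.

```python
from typing import Any, Iterable, Mapping


def push_batched(client: Any, records: Iterable[Mapping[str, Any]],
                 batch_size: int = 1000) -> int:
    """Write each record to the hash 'user:<id>' using pipelined HSET calls.

    `client` is assumed to be a redis-py `redis.Redis` instance. Commands are
    buffered client-side and sent together on each `execute()` call, so the
    round-trip cost is paid once per batch instead of once per record.
    Returns the number of records written.
    """
    # transaction=False: plain pipelining, without the MULTI/EXEC wrapper
    pipe = client.pipeline(transaction=False)
    pending = 0
    written = 0
    for record in records:
        pipe.hset(f"user:{record['id']}", mapping=dict(record))
        pending += 1
        if pending == batch_size:
            pipe.execute()  # send the whole batch and read all replies
            written += pending
            pending = 0
    if pending:
        pipe.execute()  # flush the final partial batch
        written += pending
    return written
```

With the question's loop this would be called as `push_batched(redis_conn, ({'id': i, 'name': 'user {}'.format(i), 'age': i % 100} for i in range(100000)))`. redis-py pipelines default to `transaction=True`, which wraps each batch in MULTI/EXEC; atomicity isn't needed here, so it is turned off. Larger batches mean fewer round trips but more commands and replies held in client memory per batch.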



