Why does multiprocessing.Queue.put() seem faster at pickling a numpy array than actual pickle?
Question
It appears that I can call q.put 1000 times in under 2.5 ms. How is that possible when just pickling that very same array 1000 times takes over 2 seconds?
>>> import pickle, timeit
>>> import numpy as np
>>> from multiprocessing import Queue
>>> a = np.random.rand(1024,1024)
>>> q = Queue()
>>> timeit.timeit(lambda: q.put(a), number=1000)
0.0025581769878044724
>>> timeit.timeit(lambda: pickle.dumps(a), number=1000)
2.690145633998327
Obviously, I am not understanding something about how Queue.put works. Can anyone enlighten me?
I also observed the following:
>>> def f():
...     q.put(a)
...     q.get()
...
>>> timeit.timeit(lambda: f(), number=1000)
42.33058542700019
This appears to be more realistic and suggests to me that simply calling q.put() will return before the object is actually serialized. Is that correct?
Answer 1
Score: 3
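The multiprocessing implementation has a number of moving parts under the covers. Here, the work behind a multiprocessing.Queue is mostly done in a worker thread hidden from the end user (the queue's "feeder" thread, started on the first .put()). .put() just appends an object reference to an internal buffer (fast and constant-time), and that worker thread does the actual pickling, and writes the resulting bytes to the underlying pipe, when it gets around to it. So yes: a bare .put() returns before the object has been serialized. That is why your first timeit loop measured almost nothing, while the .put()/.get() loop, which has to wait for pickling, the pipe transfer and unpickling, is far slower.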
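One way to check that .put() returns before the object is serialized is to enqueue an object whose pickling is deliberately slow and to record which thread does the pickling. The sketch below assumes CPython's standard multiprocessing.Queue and uses only the standard library; SlowToPickle is a hypothetical class (not part of multiprocessing) whose __reduce__ sleeps for 0.5 s to stand in for an expensive array.

```python
import threading
import time
from multiprocessing import Queue

# Filled in by SlowToPickle.__reduce__: one entry per pickling,
# True if that pickling ran on the main thread.
pickled_on_main_thread = []


class SlowToPickle:
    """Hypothetical stand-in for a large array whose pickling takes ~0.5 s."""

    def __reduce__(self):
        # The pickler calls this, so record which thread is doing the work.
        pickled_on_main_thread.append(
            threading.current_thread() is threading.main_thread()
        )
        time.sleep(0.5)  # simulate an expensive serialization
        return (SlowToPickle, ())  # rebuilt on get() by calling SlowToPickle()


def measure():
    q = Queue()
    t0 = time.perf_counter()
    q.put(SlowToPickle())  # put() itself does no pickling; it only buffers a reference
    t_put = time.perf_counter()
    item = q.get()  # blocks until the feeder thread has pickled and sent it
    t_get = time.perf_counter()
    return t_put - t0, t_get - t0, item


if __name__ == "__main__":
    put_s, total_s, _ = measure()
    print(f"put() returned after {put_s:.4f} s")
    print(f"put() + get() took   {total_s:.4f} s")
    print("pickled on the main thread?", pickled_on_main_thread[-1])
```

Under these assumptions, put() should return within milliseconds while put() + get() takes at least 0.5 s, and the flag should report that pickling ran on a thread other than the main one: that extra time is the serialization the original timeit loop never waited for.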
This can burn you, though: if, in your example, the main program goes on to mutate the np array after the .put(), an unpredictable number of those mutations may be captured by the eventually pickled state. The user-level .put() captures only the object reference, nothing about the object's state.
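A minimal sketch of the hazard and one common workaround, using only the standard library: a bytearray stands in for the numpy array, and a hypothetical Delay object placed ahead of it in the message keeps the feeder thread busy for 0.3 s, so the mutation practically always happens before the data itself is pickled. Enqueuing an immutable snapshot instead (bytes(data) here; for a numpy array, a.copy() or an up-front pickle.dumps(a) would play the same role) makes the received contents independent of timing.

```python
import time
from multiprocessing import Queue


class Delay:
    """Hypothetical helper: pickling it stalls the feeder thread for 0.3 s."""

    def __reduce__(self):
        time.sleep(0.3)
        return (Delay, ())  # rebuilt on the receiving side by calling Delay()


def send_and_mutate(snapshot):
    q = Queue()
    data = bytearray(b"original")  # stand-in for the numpy array
    payload = bytes(data) if snapshot else data
    # The list is pickled item by item, so Delay keeps the feeder thread
    # asleep before it reaches the payload.
    q.put([Delay(), payload])
    data[:] = b"mutated!"  # runs right after put() returns
    _, received = q.get()
    return bytes(received)


if __name__ == "__main__":
    print("put(data):       ", send_and_mutate(snapshot=False))
    print("put(bytes(data)):", send_and_mutate(snapshot=True))
```

The snapshot costs a copy on the calling thread, which is exactly the kind of work .put() was deferring; it trades a fast-looking .put() for predictable message contents.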