Why does multiprocessing.Queue.put() seem faster at pickling a numpy array than actual pickle?

Question

It appears that I can call q.put 1000 times in about 2.5 ms. How is that possible when just pickling that very same array 1000 times takes over 2 seconds?

>>> import pickle
>>> import timeit
>>> import numpy as np
>>> from multiprocessing import Queue
>>> a = np.random.rand(1024,1024)
>>> q = Queue()
>>> timeit.timeit(lambda: q.put(a), number=1000)
0.0025581769878044724
>>> timeit.timeit(lambda: pickle.dumps(a), number=1000)
2.690145633998327

Obviously, I am not understanding something about how Queue.put works. Can anyone enlighten me?

I also observed the following:

>>> def f():
...   q.put(a)
...   q.get()
>>> timeit.timeit(lambda: f(), number=1000)
42.33058542700019

This appears to be more realistic and suggests to me that simply calling q.put() will return before the object is actually serialized. Is that correct?

Answer 1

Score: 3

The multiprocessing implementation has a number of moving parts under the covers. Here, most of the work of a multiprocessing.Queue is done in a background thread that is hidden from the end user. .put() just appends a reference to the object to an in-process buffer (fast and constant-time) and returns; the background thread does the actual pickling, and the write to the underlying pipe, when it gets around to it. So yes: q.put() returns before the object is actually serialized.
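
One way to see this for yourself is to make an object report which thread pickles it. The following is only a minimal sketch: Probe and pickling_thread_names are hypothetical names made up for this illustration, and it assumes CPython's standard multiprocessing.Queue.

```python
import multiprocessing as mp
import threading


class Probe:
    """Hypothetical helper that logs the thread in which it gets pickled."""

    pickled_in = []  # class-level log, appended to when pickling happens

    def __reduce__(self):
        # pickle calls __reduce__ while serializing this object.
        Probe.pickled_in.append(threading.current_thread().name)
        # Rebuild as a plain string on the receiving side, so the example
        # does not depend on the Probe class being importable.
        return (str, ("probe",))


def pickling_thread_names():
    q = mp.Queue()
    q.put(Probe())     # returns right away; nothing has to be pickled yet
    q.get(timeout=10)  # blocks until the object was pickled and sent
    return list(Probe.pickled_in)


if __name__ == "__main__":
    print("main thread:", threading.main_thread().name)
    print("pickled in: ", pickling_thread_names())
```

If the answer's description holds, the logged thread name differs from the main thread's name, which is why timing q.put() alone does not include the cost of pickle.dumps().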

This can burn you, though: if, in your example, the main program goes on to mutate the np array after the .put(), an undefined number of those mutations may be captured in the eventually pickled state. The user-level .put() captures only the object reference, nothing about the object's state.
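
If you need the queued value to be unaffected by later mutations, one option is to enqueue a private copy. This is a sketch rather than the only fix: put_snapshot and demo_snapshot are hypothetical names, and a plain list stands in for the array to keep the example dependency-free (for a NumPy array, a.copy() would play the same role as copy.deepcopy here).

```python
import copy
import multiprocessing as mp


def put_snapshot(q, obj):
    """Hypothetical helper: enqueue a private copy of obj.

    The background thread then pickles the copy, so later mutations
    of the caller's object cannot leak into the queued data.
    """
    q.put(copy.deepcopy(obj))


def demo_snapshot():
    q = mp.Queue()
    data = [0.0] * 4
    put_snapshot(q, data)
    data[0] = 99.0            # mutate the original right after the put
    return q.get(timeout=10)  # the copy is what was pickled and sent


if __name__ == "__main__":
    print(demo_snapshot())
```

The trade-off is that the copy is paid for up front in the calling thread, so put_snapshot is slower than a bare q.put(), in exchange for a well-defined snapshot.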

Posted by huangapple on February 19, 2023, 12:08:56
When reposting, please keep the link to this article: https://go.coder-hub.com/75497932.html