Speeding Up np.sum With multiprocessing

Question

If I have a NumPy array with between 10^8 and 10^9 elements, is it possible to compute its sum faster than np.sum?

I've tried multiprocessing with the fork start method, but it seems to be slower than just calling np.sum, regardless of the number of workers (1-4). I'm using Python 3.8 on a Mac with a 2 GHz dual-core Intel Core i5. I'm not sure whether the results would differ on a machine with more CPU cores.

My code:

```python
import concurrent.futures
import multiprocessing as mp
import time
from concurrent.futures.process import ProcessPoolExecutor

import numpy as np

# based on: https://luis-sena.medium.com/sharing-big-numpy-arrays-across-python-processes-abf0dc2a0ab2


def np_sum_global(start, stop):
    # Workers inherit `data` through fork, so the array is not pickled.
    return np.sum(data[start:stop])


def benchmark():
    st = time.time()
    ARRAY_SIZE = int(3e8)
    print("array size =", ARRAY_SIZE)
    global data
    data = np.random.random(ARRAY_SIZE)
    print("generated", time.time() - st)
    print("CPU Count =", mp.cpu_count())

    for trial in range(5):
        print("TRIAL =", trial)

        st = time.time()
        s = np.sum(data)
        print("method 1", time.time() - st, s)

        for NUM_WORKERS in range(1, 5):
            st = time.time()
            futures = []
            with ProcessPoolExecutor(max_workers=NUM_WORKERS) as executor:
                for i in range(NUM_WORKERS):
                    futures.append(
                        executor.submit(
                            np_sum_global,
                            ARRAY_SIZE * i // NUM_WORKERS,
                            ARRAY_SIZE * (i + 1) // NUM_WORKERS,
                        )
                    )
            futures, _ = concurrent.futures.wait(futures)
            s = sum(future.result() for future in futures)
            print("workers =", NUM_WORKERS, time.time() - st, s)
        print()


if __name__ == "__main__":
    mp.set_start_method("fork")
    benchmark()
```

Output:

```text
array size = 300000000
generated 5.1455769538879395
CPU Count = 4
TRIAL = 0
method 1 0.29593801498413086 150004049.39847052
workers = 1 1.8904719352722168 150004049.39847052
workers = 2 1.2082111835479736 150004049.39847034
workers = 3 1.2650330066680908 150004049.39847082
workers = 4 1.233708143234253 150004049.39847046
TRIAL = 1
method 1 0.5861320495605469 150004049.39847052
workers = 1 1.801928997039795 150004049.39847052
workers = 2 1.165492057800293 150004049.39847034
workers = 3 1.2669389247894287 150004049.39847082
workers = 4 1.2941789627075195 150004049.39847043
TRIAL = 2
method 1 0.44912219047546387 150004049.39847052
workers = 1 1.8038971424102783 150004049.39847052
workers = 2 1.1491520404815674 150004049.39847034
workers = 3 1.3324410915374756 150004049.39847082
workers = 4 1.4198641777038574 150004049.39847046
TRIAL = 3
method 1 0.5163640975952148 150004049.39847052
workers = 1 3.248213052749634 150004049.39847052
workers = 2 2.5148861408233643 150004049.39847034
workers = 3 1.0224149227142334 150004049.39847082
workers = 4 1.20924711227417 150004049.39847046
TRIAL = 4
method 1 1.2363107204437256 150004049.39847052
workers = 1 1.8627309799194336 150004049.39847052
workers = 2 1.233341932296753 150004049.39847034
workers = 3 1.3235111236572266 150004049.39847082
workers = 4 1.344843864440918 150004049.39847046
```
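For scale: the benchmark array is 3×10^8 float64 values, about 2.4 GB, so the fastest ~0.3 s np.sum time already corresponds to roughly 8 GB/s of memory traffic. The sum is largely memory-bandwidth-bound, which limits how much extra processes can help. A back-of-the-envelope check:

```python
ARRAY_SIZE = int(3e8)
bytes_total = ARRAY_SIZE * 8  # float64 is 8 bytes per element

print(bytes_total / 1e9, "GB")          # 2.4 GB
print(bytes_total / 1e9 / 0.3, "GB/s")  # ~8 GB/s at a 0.3 s sum time
```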

Some links I've looked at:

Answer 1

Score: 1


Here's a benchmark with Numba. It first has to compile the code, which makes the first run a lot slower; subsequent runs are about two to three times faster than NumPy. So whether Numba is worth it depends on how often you run the code.

```python
import time

import numba
import numpy as np

# based on: https://luis-sena.medium.com/sharing-big-numpy-arrays-across-python-processes-abf0dc2a0ab2


@numba.jit(nopython=True, parallel=True, cache=True)
def numba_sum(data):
    return np.sum(data)


def benchmark():
    st = time.time()
    ARRAY_SIZE = int(3e8)
    print("array size =", ARRAY_SIZE)
    data = np.random.random(ARRAY_SIZE)
    print("generated", time.time() - st)

    for trial in range(5):
        print("TRIAL =", trial)

        st = time.time()
        s = np.sum(data)
        print("method 1", time.time() - st, s)

        st = time.time()
        s = numba_sum(data)
        print("method 2", time.time() - st, s)


if __name__ == "__main__":
    benchmark()
```
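As an aside, NumPy generally releases the GIL inside large reductions like np.sum, so plain threads can also run the partial sums concurrently while sharing the array with no fork or pickling overhead. This is a sketch, not part of the original answer; `threaded_sum` and its chunking scheme are illustrative:

```python
import numpy as np
from concurrent.futures import ThreadPoolExecutor


def threaded_sum(data, num_workers=4):
    # Same chunking as the multiprocessing version: each stop is the next start.
    n = len(data)
    bounds = [(n * i // num_workers, n * (i + 1) // num_workers)
              for i in range(num_workers)]
    # np.sum releases the GIL for large arrays, so the chunk sums can
    # actually overlap in threads; lambdas are fine here (no pickling).
    with ThreadPoolExecutor(max_workers=num_workers) as executor:
        partials = executor.map(lambda b: np.sum(data[b[0]:b[1]]), bounds)
    return sum(partials)


if __name__ == "__main__":
    data = np.random.random(10_000_000)
    print(threaded_sum(data), np.sum(data))  # the two sums agree to float precision
```

Whether this beats a single np.sum still depends on memory bandwidth, but it removes the per-call process startup cost that dominates the multiprocessing timings above.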

huangapple
  • Posted on 2023-04-04 09:35:40
  • Please keep this link when reposting: https://go.coder-hub.com/75924868.html