Optimising for loops with zip() for huge data computations

Question

I am writing Python code for my research that handles huge amounts of data. I use 3D arrays along with 1D arrays, and I use for loops frequently. I would like to optimise the for loops with better options such as zip() and, if possible, any other better coding ideas. I wrote a sample code here with all the features I use in my coding, for example: 1. calling a function within a loop, and 2. multiplying each element of a 2D array by each element of a 1D array, as shown in the code. Kindly guide me.

    import numpy as np
    import time

    Nx = 5  # resolution x-dir
    Ny = 5  # resolution y-dir
    NL = 7
    weights = np.array([0.2, 0.5, 0.8, 1.5, 2.8, 5.9, 1.5])  # these values sum to 13.2
    ux = np.array([[1, 2, 3, 4, 5], [3, 4, 6, 8, 9], [1, 4, 3, 6, 2],
                   [4, 5, 7, 4, 3], [1, 2, 6, 5, 11]])
    print(ux)

    def add(a, b):
        c = a + b
        return c

    # Normal for looping
    def main():
        start = time.time()
        for i in range(Nx):
            for j in range(Ny):
                total = 0.0
                for k in range(NL):
                    a = add(i, j)
                    total = total + weights[k] * ux[i, j] + a
                # ux has an integer dtype, so this assignment truncates total
                ux[i, j] = total
        end = time.time()
        print('Time:', end - start)
        print(ux)
        return 0

    if __name__ == "__main__":
        main()

Answer 1

Score: 1

One thing that stands out: your inner loop over k contains two invariants, ux[i,j] and a. Both can be hoisted out of the inner loop, reducing the number of lookups and function calls. Note that a is added outside the product weights[k] * ux[i,j], so it is still added once per iteration:

    def main():
        start = time.time()
        for i in range(Nx):
            for j in range(Ny):
                tot = 0.0
                val = ux[i, j]  # invariant in k: looked up once
                a = i + j       # invariant in k: add(i, j) inlined
                for k in range(NL):
                    tot += weights[k] * val + a
                ux[i, j] = tot
        end = time.time()
        print('Time:', end - start)

With the invariants hoisted, we can see that the inner loop is just the sum of weights * val plus a added NL times, so the inner loop can be eliminated entirely:

    def main():
        for i in range(Nx):
            for j in range(Ny):
                val = ux[i, j]
                a = i + j
                # a was added once per k iteration, hence NL * a
                ux[i, j] = (weights * val).sum() + NL * a
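As a quick sanity check of this step, the sketch below compares an explicit loop over the weights with the vectorized reduction for a single cell. The value of `val` is an arbitrary illustrative number, not taken from the question's data.

```python
import numpy as np

weights = np.array([0.2, 0.5, 0.8, 1.5, 2.8, 5.9, 1.5])
val = 4.0  # arbitrary illustrative value standing in for one cell

# Explicit Python loop over the weights
loop_total = 0.0
for w in weights:
    loop_total += w * val

# Vectorized equivalent: elementwise product followed by a reduction
vector_total = (weights * val).sum()

print(np.isclose(loop_total, vector_total))
```

The comparison uses `np.isclose` rather than `==` because the two forms may accumulate floating-point rounding in a different order.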

But since multiplication distributes over addition, (weights * val).sum() equals val * weights.sum(). weights.sum() is a constant, so it can be computed once outside the loops:

    weight = weights.sum()

    def main():
        for i in range(Nx):
            for j in range(Ny):
                val = ux[i, j]
                a = i + j
                ux[i, j] = val * weight + NL * a
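The identity used here is that a common factor can be pulled out of a sum. A minimal sketch, with a few made-up values in `vals` standing in for cells, shows that factoring out `weights.sum()` gives the same numbers as reducing `weights * v` for every value:

```python
import numpy as np

weights = np.array([0.2, 0.5, 0.8, 1.5, 2.8, 5.9, 1.5])
vals = np.array([1.0, 2.5, -3.0])  # illustrative values, not from the question

# One reduction per value: sum_k(weights[k] * v)
per_value = np.array([(weights * v).sum() for v in vals])

# Factored form: v * sum_k(weights[k]), with the sum computed only once
factored = vals * weights.sum()

print(np.allclose(per_value, factored))
```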

This should cut your computation time by a decent amount already.

The last step uses np.indices to get the row and column index of each position in ux, which removes the remaining two loops:

    grid = np.indices((Nx, Ny))
    # grid[0] holds the row indices and grid[1] the column indices,
    # so grid[0] + grid[1] is i + j at every position
    offset = NL * (grid[0] + grid[1])
    # and finally, the multiplication of each value against
    # the weight constant, plus the offset
    result = ux * weight + offset
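To see what `np.indices` returns, here is a small sketch on a 2 x 3 shape (the shape is chosen only for illustration). It also shows a broadcasting alternative with `np.arange` that builds the same i + j array without materialising the full index grids.

```python
import numpy as np

# np.indices returns one index array per dimension
rows, cols = np.indices((2, 3))
print(rows)  # row index of every position
print(cols)  # column index of every position

index_sum = rows + cols  # i + j at every position

# Broadcasting alternative: a column vector plus a row vector
alt = np.arange(2)[:, None] + np.arange(3)[None, :]
print(np.array_equal(index_sum, alt))
```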

So your whole code boils down to:

    def main():
        grid = np.indices((Nx, Ny))
        weight = weights.sum()
        offset = NL * (grid[0] + grid[1])
        result = ux * weight + offset
        return result
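Before relying on a vectorized rewrite, it is worth checking it against the original loop. The sketch below does that under two assumptions. First, it follows the question's loop as written, where `a` is added once per k iteration outside the product `weights[k] * ux[i, j]`, so the closed form is `ux * weights.sum() + NL * (i + j)`. Second, it uses a float copy of `ux`, because writing float results into an integer array truncates them. The function names `loop_version` and `vectorized_version` are made up for this illustration.

```python
import numpy as np

weights = np.array([0.2, 0.5, 0.8, 1.5, 2.8, 5.9, 1.5])
ux = np.array([[1, 2, 3, 4, 5], [3, 4, 6, 8, 9], [1, 4, 3, 6, 2],
               [4, 5, 7, 4, 3], [1, 2, 6, 5, 11]], dtype=float)


def loop_version(ux, weights):
    """Triple loop as in the question, writing into a fresh float array."""
    out = np.empty_like(ux)
    nx, ny = ux.shape
    for i in range(nx):
        for j in range(ny):
            total = 0.0
            for k in range(len(weights)):
                total = total + weights[k] * ux[i, j] + (i + j)
            out[i, j] = total
    return out


def vectorized_version(ux, weights):
    """Same computation without Python-level loops."""
    rows, cols = np.indices(ux.shape)
    return ux * weights.sum() + len(weights) * (rows + cols)


expected = loop_version(ux, weights)
actual = vectorized_version(ux, weights)
print(np.allclose(expected, actual))
```

Neither function modifies its input, so the same `ux` can be passed to both and the results compared directly.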

huangapple
  • Published on 9 August 2023, 03:46:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76862787.html