Optimising for loops with zip() for huge data computations

Question

I am writing Python code for my research that has to handle huge amounts of data. I use 3D arrays along with 1D arrays, and I use for loops frequently. I would like to optimise these loops with better options such as zip() and, if possible, any other better coding ideas. I wrote a sample below that contains all the features I use in my code, for example: 1. calling a function within a loop, 2. multiplying each element of a 2D array by each element of a 1D array. Kindly guide me.

#*******************************************************************
import numpy as np
import time
import random
#*******************************************************************
Nx = 5    # resolution x-dir
Ny = 5    # resolution y-dir
NL = 7
weights = np.array([0.2,0.5,0.8,1.5,2.8,5.9,1.5]) # note: these sum to 13.2, not 1
ux = np.array([[1, 2,3,4,5], [3, 4,6,8,9], [1, 4,3,6,2], [4, 5,7,4,3], [1, 2,6,5,11]], dtype=float)  # float, so ux[i,j] = sum is not truncated to an int
print(ux)
#+++++++++++++++++++++++++++++++++++++++++++
def add(a,b):
    c = a+b
    return(c)
#++++++++++++++++++++++++++++++++++++++++++++
# Normal For Looping
def main():
    start = time.time()
    for i in range(Nx):
        for j in range(Ny):
            sum = 0.0
            for k in range(NL):
                a= add(i,j)
                sum= sum + weights[k]*ux[i,j]+a
            ux[i,j]= sum
    end = time.time()
    print('Time:',end-start)
#+++++++++++++++++++++++++++++++++++++++++++
    print(ux)
#
    return 0
#++++++++++++++++++++++
if __name__== "__main__":
  main()
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
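On the zip() part of the question: zip() only changes how a pure-Python loop pairs up values, so it tidies the code but still executes one interpreter step per element. The real speed-up for NumPy data comes from handing the whole operation to compiled code. A minimal sketch comparing a zip-based weighted sum with the NumPy equivalent; the helper names and the `values` array are hypothetical:

```python
import numpy as np

weights = np.array([0.2, 0.5, 0.8, 1.5, 2.8, 5.9, 1.5])
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])  # hypothetical input


def weighted_sum_zip(ws, vs):
    # zip() pairs ws[k] with vs[k], but this is still a Python-level loop
    total = 0.0
    for w, v in zip(ws, vs):
        total += w * v
    return total


def weighted_sum_numpy(ws, vs):
    # np.dot does the same multiply-and-add in compiled code
    return float(np.dot(ws, vs))


print(weighted_sum_zip(weights, values))
print(weighted_sum_numpy(weights, values))
```

Both give the same number; only the NumPy version scales well to large arrays.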

Answer 1

Score: 1

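The big thing that stands out is that your inner loop over k contains two invariants: ux[i,j] and a = add(i, j). Neither depends on k, so both can be pushed out of the inner loop, reducing the number of function calls, additions and lookups. Note that a is added once per iteration of k, so after hoisting it contributes NL * a; it is not multiplied by the weights: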

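def main():
    start = time.time()
    for i in range(Nx):
        for j in range(Ny):
            tot = 0.0
            val = ux[i, j]    # invariant: does not depend on k
            a = add(i, j)     # invariant: does not depend on k

            for k in range(NL):
                tot += weights[k] * val

            ux[i, j] = tot + NL * a   # a was added once per k, i.e. NL times

    end = time.time()
    print('Time:', end - start)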
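One way to confirm that hoisting the invariants leaves the result unchanged is to run the original loop and the hoisted loop on the same input and compare. A sketch, assuming ux is converted to float so neither version truncates on assignment, and with add(i, j) inlined as i + j; the function names are hypothetical:

```python
import numpy as np

NL = 7
weights = np.array([0.2, 0.5, 0.8, 1.5, 2.8, 5.9, 1.5])
ux0 = np.array([[1, 2, 3, 4, 5], [3, 4, 6, 8, 9], [1, 4, 3, 6, 2],
                [4, 5, 7, 4, 3], [1, 2, 6, 5, 11]], dtype=float)


def original_loop(ux):
    ux = ux.copy()
    nx, ny = ux.shape
    for i in range(nx):
        for j in range(ny):
            total = 0.0
            for k in range(NL):
                a = i + j  # recomputed on every k, as in the question
                total = total + weights[k] * ux[i, j] + a
            ux[i, j] = total
    return ux


def hoisted_loop(ux):
    ux = ux.copy()
    nx, ny = ux.shape
    for i in range(nx):
        for j in range(ny):
            val = ux[i, j]  # does not depend on k
            a = i + j       # does not depend on k
            total = 0.0
            for k in range(NL):
                total += weights[k] * val
            ux[i, j] = total + NL * a  # a was added once per k
    return ux


print(np.allclose(original_loop(ux0), hoisted_loop(ux0)))  # True
```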

With the invariants hoisted, the remaining inner loop is just (weights * val).sum(), so it can be eliminated entirely:

def main():
    for i in range(Nx):
        for j in range(Ny):
            val = ux[i, j]

            ux[i, j] = (weights * val).sum() + NL * (i + j)
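The same weights * val product can be formed for every cell at once by broadcasting, which also covers the question's "multiply each element of a 2D array by each element of a 1D array" case. A sketch, assuming the (Nx, Ny, NL) intermediate array fits in memory:

```python
import numpy as np

weights = np.array([0.2, 0.5, 0.8, 1.5, 2.8, 5.9, 1.5])
ux = np.array([[1, 2, 3, 4, 5], [3, 4, 6, 8, 9], [1, 4, 3, 6, 2],
               [4, 5, 7, 4, 3], [1, 2, 6, 5, 11]], dtype=float)

# ux[:, :, np.newaxis] has shape (5, 5, 1); broadcasting it against weights (7,)
# gives products[i, j, k] = ux[i, j] * weights[k], shape (5, 5, 7)
products = ux[:, :, np.newaxis] * weights

# Summing over the last axis reproduces (weights * ux[i, j]).sum() for every cell
weighted = products.sum(axis=-1)

print(products.shape)  # (5, 5, 7)
print(np.allclose(weighted, ux * weights.sum()))  # True
```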

Since multiplication distributes over addition, (weights * val).sum() equals weights.sum() * val. weights.sum() is a constant, so it can be computed once outside the loops:

weight = weights.sum()

def main():
    for i in range(Nx):
        for j in range(Ny):
            val = ux[i, j]
            ux[i, j] = val * weight + NL * (i + j)

This should already cut your computation time noticeably, since each cell now needs a constant amount of work instead of NL loop iterations and NL function calls.
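Written out, the inner loop in the question computes the following for each cell, which shows why the weights collapse into a single constant W while a = i + j picks up a factor of N_L rather than W:

```latex
\sum_{k=0}^{N_L-1}\bigl(w_k\,u_{ij} + (i+j)\bigr)
  = u_{ij}\sum_{k=0}^{N_L-1} w_k + N_L\,(i+j)
  = W\,u_{ij} + N_L\,(i+j),
\qquad W = \sum_{k=0}^{N_L-1} w_k
```

With the question's numbers, W = 13.2 and N_L = 7, so the two terms cannot be folded into one common factor.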

The last step removes the two outer loops as well, using np.indices to get the row and column index of every position in ux:

grid = np.indices((Nx, Ny))

# grid[0] is row indices (i) and grid[1] is column indices (j)
index_sum = grid[0] + grid[1]

# and finally, scale each value by the weight constant
# and add i + j once per k, i.e. NL times
result = ux * weight + NL * index_sum
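To see what np.indices returns, here is a small 2 x 3 case. The broadcasting form with np.arange at the end is an equivalent alternative that builds the same i + j table without materialising both full index arrays:

```python
import numpy as np

grid = np.indices((2, 3))
print(grid.shape)  # (2, 2, 3)
print(grid[0])     # row index i of every cell
# [[0 0 0]
#  [1 1 1]]
print(grid[1])     # column index j of every cell
# [[0 1 2]
#  [0 1 2]]

# Same i + j table via broadcasting: a (2, 1) column plus a (3,) row -> (2, 3)
i_plus_j = np.arange(2)[:, np.newaxis] + np.arange(3)
print(np.array_equal(grid[0] + grid[1], i_plus_j))  # True
```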

So your whole code boils down to:

def main():
    grid = np.indices((Nx, Ny))   # grid[0] = i, grid[1] = j for every cell
    weight = weights.sum()        # computed once

    result = ux * weight + NL * (grid[0] + grid[1])

    print(result)
    return result
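To check the fully vectorized version against the original triple loop, and to get a rough feel for the speed difference on a bigger grid, a sketch like the following can be used. The 100 x 100 random float input is an arbitrary assumption, and timings vary by machine, so no numbers are claimed here:

```python
import time

import numpy as np

rng = np.random.default_rng(0)
weights = np.array([0.2, 0.5, 0.8, 1.5, 2.8, 5.9, 1.5])
NL = weights.size
ux0 = rng.random((100, 100))  # hypothetical test input


def loop_version(ux):
    # The question's triple loop, with add(i, j) inlined as i + j
    ux = ux.copy()
    nx, ny = ux.shape
    for i in range(nx):
        for j in range(ny):
            total = 0.0
            for k in range(NL):
                total = total + weights[k] * ux[i, j] + (i + j)
            ux[i, j] = total
    return ux


def vectorized_version(ux):
    grid = np.indices(ux.shape)
    return ux * weights.sum() + NL * (grid[0] + grid[1])


t0 = time.perf_counter()
expected = loop_version(ux0)
t1 = time.perf_counter()
result = vectorized_version(ux0)
t2 = time.perf_counter()

print(np.allclose(expected, result))  # True
print(f"loop: {t1 - t0:.3f} s, vectorized: {t2 - t1:.5f} s")
```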




huangapple
  • This article was posted on August 9, 2023 at 03:46:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76862787.html