2023-08-09 03:46:55

Optimisation of For looping with zip() for handling huge data computations

Question

I am writing Python code for my research, which handles very large amounts of data. I use 3D arrays along with 1D arrays, and I use for loops frequently. I would like to optimise these loops with better options such as zip() or, if possible, any other better coding idea. I wrote a sample code here that contains all the features I use in my coding: 1. calling a function within a loop, and 2. multiplying each element of a 2D array by each element of a 1D array, as shown in the code. Kindly guide me.

#*******************************************************************
import numpy as np
import time
import random
#*******************************************************************
Nx = 5    # resolution x-dir
Ny = 5    # resolution y-dir
NL = 7
weights = np.array([0.2,0.5,0.8,1.5,2.8,5.9,1.5]) # sums to 1
ux = np.array([[1, 2,3,4,5], [3, 4,6,8,9], [1, 4,3,6,2], [4, 5,7,4,3], [1, 2,6,5,11]])
print(ux)
#+++++++++++++++++++++++++++++++++++++++++++
def add(a,b):
    c = a+b
    return(c)
#++++++++++++++++++++++++++++++++++++++++++++
# Normal For Looping
def main():
    start = time.time()
    for i in range(Nx):
        for j in range(Ny):
            sum = 0.0
            for k in range(NL):
                a= add(i,j)
                sum= sum + weights[k]*ux[i,j]+a
            ux[i,j]= sum
    end = time.time()
    print('Time:',end-start)
#+++++++++++++++++++++++++++++++++++++++++++
    print(ux)
#
    return 0
#++++++++++++++++++++++
if __name__ == "__main__":
    main()
#%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Answer 1

Score: 1

One thing that stands out: your inner loop over k contains two invariants, ux[i,j] and a. Both can be hoisted out of the inner loop, which reduces the number of function calls and array lookups. Note that a is added once per iteration without being multiplied by a weight, so it stays a separate term:

def main():
    start = time.time()
    for i in range(Nx):
        for j in range(Ny):
            tot = 0.0
            val = ux[i, j]   # invariant array lookup
            a = i + j        # invariant result of add(i, j)
            for k in range(NL):
                tot += weights[k] * val + a
            ux[i, j] = tot
    end = time.time()
    print('Time:', end - start)

With the invariants outside the inner loop, we can see that the weighted part is just weights * val summed over k, while a is simply added NL times. That eliminates the inner loop entirely:

def main():
    for i in range(Nx):
        for j in range(Ny):
            val = ux[i, j]
            a = i + j
            ux[i, j] = (weights * val).sum() + NL * a
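Since the question asks about zip() specifically, here is a minimal sketch of how it compares with a NumPy expression. The `values` array is a hypothetical 1D array of the same length as `weights`, made up for illustration. zip() tidies the indexing but still iterates in Python, whereas np.dot performs the same multiply-and-sum in compiled code:

```python
import numpy as np

weights = np.array([0.2, 0.5, 0.8, 1.5, 2.8, 5.9, 1.5])
# Hypothetical 1D data of the same length as weights (not from the question).
values = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])

# Index-based loop, in the style of the question.
tot_index = 0.0
for k in range(len(weights)):
    tot_index += weights[k] * values[k]

# zip() pairs the elements directly, but still loops in Python.
tot_zip = sum(w * v for w, v in zip(weights, values))

# np.dot does the same multiply-and-sum without a Python-level loop.
tot_dot = np.dot(weights, values)

assert np.isclose(tot_index, tot_zip)
assert np.isclose(tot_zip, tot_dot)
```

For large arrays, the gain usually comes from removing the Python-level loop rather than from swapping range() for zip().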

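But since multiplication distributes over addition, (weights * val).sum() equals val * weights.sum(), and weights.sum() is a constant that can be computed once. The index term i + j is added once per inner iteration, so it contributes NL * (i + j):

weight = weights.sum()
def main():
    for i in range(Nx):
        for j in range(Ny):
            ux[i, j] = ux[i, j] * weight + NL * (i + j)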

This should cut your computation time by a decent amount already.

The last step uses np.indices to get the row and column index of each position in ux, so the two remaining loops become whole-array operations:

grid = np.indices((Nx, Ny))
# grid[0] holds the row indices and grid[1] the column indices
index_sum = grid[0] + grid[1]
# multiply each value by the weight constant, then add the
# index term once for each of the NL inner iterations
result = ux * weight + NL * index_sum
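To make the index grids concrete, here is a small sketch on a hypothetical 2 x 3 shape (chosen only to keep the output readable). It also shows a broadcasting alternative that builds the same i + j table from two 1D ranges, without materialising both full grids:

```python
import numpy as np

# np.indices returns one array per axis, each with the requested shape.
rows, cols = np.indices((2, 3))
# rows is [[0, 0, 0], [1, 1, 1]] and cols is [[0, 1, 2], [0, 1, 2]]

# Broadcasting a column vector against a row vector gives the same sum.
index_sum = np.arange(2)[:, None] + np.arange(3)[None, :]

assert np.array_equal(index_sum, rows + cols)
```

Either form works; the broadcasting version may use less temporary memory on very large grids.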

So your whole code boils down to:

def main():
    grid = np.indices((Nx, Ny))
    weight = weights.sum()
    result = ux * weight + NL * (grid[0] + grid[1])
    return result
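A quick way to gain confidence in a rewrite like this is to compare it with the question's triple loop on the same input. The sketch below does that under a few assumptions: the 200 x 200 random grid and the function names are made up for illustration, and the loop works on a float copy of ux (the question's integer array would truncate each result on assignment). In the question's loop, a is added once per k iteration without being weighted, so the vectorized form carries it as NL * (i + j):

```python
import time
import numpy as np

NL = 7
weights = np.array([0.2, 0.5, 0.8, 1.5, 2.8, 5.9, 1.5])

def loop_version(ux):
    """Triple loop from the question, applied to a float copy of ux."""
    out = ux.astype(float)
    nx, ny = out.shape
    for i in range(nx):
        for j in range(ny):
            total = 0.0
            for k in range(NL):
                total = total + weights[k] * out[i, j] + (i + j)
            out[i, j] = total
    return out

def vectorized_version(ux):
    """Whole-array equivalent: weighted values plus NL copies of i + j."""
    rows, cols = np.indices(ux.shape)
    return ux * weights.sum() + NL * (rows + cols)

rng = np.random.default_rng(0)
ux = rng.integers(1, 12, size=(200, 200))  # hypothetical test grid

t0 = time.perf_counter()
expected = loop_version(ux)
t1 = time.perf_counter()
result = vectorized_version(ux)
t2 = time.perf_counter()

assert np.allclose(expected, result)
print(f"loop: {t1 - t0:.4f} s, vectorized: {t2 - t1:.6f} s")
```

The timings depend on the machine, so measure on your own data before relying on any particular speed-up.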
