Poor performance when looping over numpy array

Question


With the function get_height I calculate the height difference between two point clouds for every scan (in my example, a scan is the set of y,z coordinates sharing one x value).

My algorithm works but takes 1.93 s on average. How can I improve the performance?

EDIT: I attached a fully working example

    import numpy as np
    import matplotlib.pyplot as plt

    def generate_random_dataset(N, x_max):
        # Create the 'x' column
        unique_x = np.linspace(0, x_max, x_max*10+1)
        x = np.random.choice(unique_x, N)  # Generate the array with repeated values
        # Create the 'y' column
        y = np.random.uniform(-5, 5, N)
        # Create the 'z' column
        z = - y**2 + 5 + np.random.normal(0, 1, N)
        # Create the 'A' array
        A = np.column_stack((x, y, z))
        return A

    def get_height(A0, A1):
        # get unique x values that are in both scans
        ux0 = np.unique(A0[:,0])
        ux1 = np.unique(A1[:,0])
        ux = np.intersect1d(ux0, ux1)
        # get height at each unique x value
        h = []
        for x in ux:
            # get slice of lower scan
            mask0 = (A0[:,0] == x)
            z0 = A0[mask0,2]
            # get slice of upper scan
            mask1 = (A1[:,0] == x)
            z1 = A1[mask1,2]
            # get height difference
            height = np.max(z1) - np.max(z0)
            # append results to list
            h.append(height)
        # convert list to array
        h = np.array(h)
        return ux, h

    # run script
    A0 = generate_random_dataset(N=300000, x_max=100)
    A1 = generate_random_dataset(N=310000, x_max=120)
    A1[:,2] = A1[:,2] - 0.001*(A1[:,0]-50)**2 + 5  # make A1 higher and different than A0

    # apply function (%timeit is an IPython/Jupyter magic, not plain Python)
    %timeit ux,h = get_height(A0,A1)
    # %timeit does not keep the variables it assigns, so call once more for plotting
    ux, h = get_height(A0, A1)

    # plot
    fig = plt.figure(figsize=(4.24*1.5, 3*1.5))
    ax = plt.subplot(111)
    ax.scatter(ux, h)
    ax.set_xlabel('x [mm]')
    ax.set_ylabel('h [mm]')
    plt.show()

I've tried using the np.lexsort approach from a previous question of mine, but that approach doesn't work for two arrays.

I want to approach this problem differently (without looping over unique x values) but I can't figure out a solution.

Answer 1

Score: 0


There is probably a pure NumPy solution, but in the meantime pandas is much faster than a Python loop that does a lookup over the whole array in each iteration, even including the overhead of converting the arrays into DataFrames.

    import pandas as pd

    def get_height_pd(A0, A1):
        df0 = pd.DataFrame(A0)
        df1 = pd.DataFrame(A1)
        m0 = df0.groupby(0)[2].max()
        m1 = df1.groupby(0)[2].max()
        return (m1 - m0).dropna()  # dropna gets rid of the non-intersecting ones

Alternatively, and possibly a little faster, build a Series from the z column and group it by the x column directly.

    def get_height_s(A0, A1):
        s0 = pd.Series(A0[:, 2])
        s1 = pd.Series(A1[:, 2])
        m0 = s0.groupby(A0[:, 0]).max()
        m1 = s1.groupby(A1[:, 0]).max()
        return (m1 - m0).dropna()
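
Both functions above return a pandas Series indexed by x rather than the (ux, h) pair of arrays that the question's get_height returns. The following is a minimal sketch of how the result might be unpacked for plotting. The name max_height_diff and the tiny hand-made arrays are assumptions made for illustration; they are not part of the original answer.

```python
import numpy as np
import pandas as pd

def max_height_diff(A0, A1):
    """Per-x difference of the maximum z between two (x, y, z) arrays."""
    m0 = pd.Series(A0[:, 2]).groupby(A0[:, 0]).max()
    m1 = pd.Series(A1[:, 2]).groupby(A1[:, 0]).max()
    # Subtraction aligns on the x index; x values found in only one scan become NaN
    diff = (m1 - m0).dropna()
    # Unpack the Series into the two arrays the question works with
    return diff.index.to_numpy(), diff.to_numpy()

# Hypothetical toy scans; columns are x, y, z
A0 = np.array([[0.0, 0.0, 1.0],
               [0.0, 1.0, 2.0],
               [1.0, 0.0, 3.0]])
A1 = np.array([[0.0, 0.0, 5.0],
               [1.0, 0.0, 4.0],
               [1.0, 1.0, 7.0],
               [2.0, 0.0, 9.0]])

ux, h = max_height_diff(A0, A1)
print(ux, h)  # x = 2.0 appears only in A1, so it is dropped
```

The arrays ux and h can then be passed to ax.scatter exactly as in the question.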

Answer 2

Score: 0


Here's an ugly NumPy solution that uses the agg_minmax function from another Stack Overflow answer (linked in the code comment) to get the min and max per group. Transform the z values of one of the arrays so that they all lie below the other array's values and run in the opposite direction (z' = offset - z, where offset is a suitably low number), concatenate the two arrays, then find the min and max for each x. The min in each row will be offset minus the maximum from A1, and the max in each row will be the maximum from A0. Rows for x values that occur in only one array are filtered out. Then reverse the transformation to get the difference in heights.

    def agg_minmax(a):  # from https://stackoverflow.com/a/58908648/567595
        sidx = np.lexsort(a[:,::-1].T)
        b = a[sidx]
        m = np.r_[True,b[:-1,0]!=b[1:,0],True]
        return np.c_[b[m[:-1],:2], b[m[1:],1]]

    def get_height(A0, A1):
        min0 = A0[:, 2].min()
        offset = min0 + A1[:, 2].min() - 1
        b0 = A0[:, [0, 2]]
        b1 = np.array([A1[:, 0], offset - A1[:, 2]]).T
        c = np.concatenate((b0, b1))
        agg = agg_minmax(c)
        f = agg[(agg[:, 1] < min0) & (agg[:, 2] >= min0)]  # filter out the not-applicable rows
        return f[:, 0], offset - f[:, 1] - f[:, 2]

It's slower than the pandas solutions but perhaps can be tweaked.
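
One possible tweak, offered as a sketch rather than as part of the answer above: instead of the offset trick, sort each array by x once and take the per-group maximum with np.maximum.reduceat, then match the two sets of groups with np.intersect1d. The function names group_max and get_height_reduceat are made up for this illustration, the code assumes both arrays are non-empty, and no timing is claimed for it.

```python
import numpy as np

def group_max(x, z):
    """Return (sorted unique x, max of z within each x group). Assumes x is non-empty."""
    order = np.argsort(x, kind="stable")
    xs, zs = x[order], z[order]
    # Index where each run of equal x values starts in the sorted array
    starts = np.flatnonzero(np.r_[True, xs[1:] != xs[:-1]])
    # reduceat takes the max over each segment zs[starts[i]:starts[i+1]]
    return xs[starts], np.maximum.reduceat(zs, starts)

def get_height_reduceat(A0, A1):
    ux0, m0 = group_max(A0[:, 0], A0[:, 2])
    ux1, m1 = group_max(A1[:, 0], A1[:, 2])
    # Keep only the x values present in both scans, with their positions in each
    ux, i0, i1 = np.intersect1d(ux0, ux1, assume_unique=True, return_indices=True)
    return ux, m1[i1] - m0[i0]

# Hypothetical toy scans; columns are x, y, z
A0 = np.array([[1.0, 0.0, 3.0],
               [0.0, 0.0, 1.0],
               [0.0, 1.0, 2.0]])
A1 = np.array([[2.0, 0.0, 9.0],
               [0.0, 0.0, 5.0],
               [1.0, 0.0, 4.0],
               [1.0, 1.0, 7.0]])

ux, h = get_height_reduceat(A0, A1)
print(ux, h)
```

Like the loop in the question, this compares x values for exact floating-point equality, so it relies on both scans drawing x from the same grid.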

huangapple
  • Published on June 26, 2023, 19:39:21
  • When reposting, please keep this link: https://go.coder-hub.com/76556331.html