我如何高效地找到两个非常大的三维坐标数组中点的最小距离值?

huangapple go评论69阅读模式
英文:

How can I efficiently find the minimum distance values for points in two very large 3D coordinate arrays?

问题

让我们假设我们有两个3D坐标点的数组。

A = (1000000, 3) 类型为 float,
B = (100000, 3) 类型为 float

对于A中的每个坐标,我想找到到B中任何坐标的最小欧氏距离。这意味着它应该计算A[0]与B中所有坐标之间的欧氏距离,然后取最小值。

我编写了使用循环执行此操作的代码。它有效,但由于我的数组大小,需要超过一个小时才能完成。伪代码大致如下:

minDistances = np.zeros(A.shape)
for i in range(len(A)):
  queriedPoint = A[i]
  distances = B - queriedPoint
  euclideanDistances = np.linalg.norm(distances, axis=1)
  minDistance = np.min(euclideanDistances)
  minDistances[i] = minDistance

理想情况下,我希望将这个向量化,但这样做似乎会导致我的程序因RAM使用而崩溃。有没有关于如何更高效地完成这个任务的想法或提示?我在思考是否可以以某种方式将问题简化,或者重新思考如何解决它。谢谢!

英文:

Let's pretend we have two arrays of 3D coordinate points.

A = (1000000, 3) of type float,
B = (100000, 3) of type float

For each coordinate in A, I would like to find the minimum euclidean distance to any coordinate in B. This means that it should calculate the euclidean distance between A[0], for instance, with all of B, and then take the smallest value.

I wrote up code to do this using loops. It works, but due to the size of my arrays it takes more than an hour to complete. The pseudocode looks something like this:

minDistances = np.zeros(A.shape)
for i in range(len(A)):
  queriedPoint = A[i]
  distances = B - queriedPoint
  euclideanDistances = np.linalg.norm(distances, axis=1)
  minDistance = np.min(euclideanDistances)
  minDistances[i] = minDistance

Ideally, I'd like to vectorize this, but in doing so it seems to crash my program from RAM usage. Any ideas or tips for how to do this more efficiently? I was wondering if maybe I could reduce the problem to something easier somehow, or else rethink how to approach it alltogether. Thanks!

答案1

得分: 3

最快的方法可能是使用scipy.spatial.KDTree建议的重复问题推荐使用scipy.spatial.distance.cdist,但是你的数组太大了,会消耗太多内存。

import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(42)
A = rng.uniform(low=-100, high=100, size=(1_000_000, 3))
B = rng.uniform(low=-100, high=100, size=(100_000, 3))

tree = KDTree(B)
distances = tree.query(A)[0]

我不知道AB中实际值的范围,所以我只是使用了(-100, 100)。这需要大约2.2秒来运行。

英文:

The fastest way to do this is probably using scipy.spatial.KDTree. The proposed duplicate recommends using scipy.spatial.distance.cdist, but your arrays are too large that it would consume too much memory.

import numpy as np
from scipy.spatial import KDTree

rng = np.random.default_rng(42)
A = rng.uniform(low=-100, high=100, size=(1_000_000, 3))
B = rng.uniform(low=-100, high=100, size=(100_000, 3))

tree = KDTree(B)
distances = tree.query(A)[0]

I don't know the actual range of values in A and B, so I just used (-100, 100). This takes about 2.2 seconds to run.

答案2

得分: 1

蛮力方法将需要NxN个距离计算。

一个更简单的方法是使用“分桶”,即使用一些特殊的盒子。

构建盒子大约需要4N个计算。例如,首先确定每个数组的最大和最小X、Y、Z坐标。然后,将“空间”分割为64个盒子,例如对于array_1有64个盒子,对于array_2有64个盒子。

通过简单的顶点比较,您可以得到两个盒子(每个数组一个)中距离较近的盒子。是的,这是蛮力方法,但适用于小数据。
注意:如果盒子相交或有多个近距离的对,那么您需要一个候选列表,更多的盒子,但仍然不是最初的大数据。

然后在数组上运行新的遍历。只获取在盒子列表中的那些点。

最后,您可以对选择的点运行蛮力方法。

对于最糟糕的情况,即array_1的大多数盒子与array_2的某些盒子相交(或距离相同),然后只需将每个盒子分成八个更小的盒子并再次检查。最坏的情况可能比与数组一起使用蛮力方法更糟糕,但很少发生。

英文:

Brute force will require NxN distance calculations.

A simpler method is to use "bucketing", i.e. use some special boxes.

Building boxes requires about 4N calculations. For example, first pass determine the max & min X,Y,Z, coords for each array. Then you split the "space" into say 4x4x4=64 boxes for array_1 and other 64 for array_2.

By simple vertices comparisons you can get the two boxes (one for each array) that are closer. Yes, this is brute force, but with small data.
<br>Note: If boxes intersect or there are more than one close pair, then you need a list of candidates, more boxes, but still not the initial big data.

Then run new pass on the arrays. Get only those points that lay in the list of boxes.

Finally you can run a brute-force with that chosen points.

For a pesimal case, where most of boxes of array_1 intersect (or are at same distance) with some boxes of array_2, then just split each box into eight smaller boxes and check again. The worst case may be (but rare) worst that brute force with the arrays.

答案3

得分: 0

是的,所以最好的方法是将问题分解成子问题,使用分治算法;因此,根据上面的伪代码,我们可以尝试使用字典和列表推导来实现它。

import numpy as np
A = np.random.rand(1000000, 3)
B = np.random.rand(1000000, 3)

minDistance = {i: np.linalg.norm(B - A[i], axis=1).min() for i in range(len(A))}
minDistance = [minDistance[i] for i in range(len(A))]

print(minDistance)

通过使用字典和列表推导,它们有助于优化时间和复杂性。希望这对你有所帮助。

或者,你也可以尝试基于循环的方法。

import numpy as np
from scipy.spatial.distance import cdist

A = np.random.rand(1000000, 3)
B = np.random.rand(1000000, 3)

minDist = np.zeros(len(A))

for i, coord in enumerate(A):
    distances = cdist(np.expand_dims(coord, axis=0), B)
    minDist[i] = np.min(distances)
print(minDist)
英文:

Yes, so the best way is to divide the problem into subproblems, divide and conquer algorithm 我如何高效地找到两个非常大的三维坐标数组中点的最小距离值? so given the pseudocode above, we can try to attempt it by utilizing dict & list comprehension

import numpy as np
A = np.random.rand(1000000,3)
B = np.random.rand(1000000,3)

minDistance = {i: np.linalg.norm(B-A[i],axis=1).min() for i in range(len(A))}
minDistance = [minDistance[i] for i in range(len(A))]

print(minDistance)

by using dict & list comprehension, they help to optimize both time and complexity. hope that helps

Alternatively, you can try the loop-based approach

import numpy as np
from scipy.spatial.distance import cdist

A = np.random.rand(1000000,3)
B = np.random.rand(1000000,3)

minDist = np.zeros(len(A))

for i, coord in enumerate(A):
    distances = cdist(np.expand_dims(coord,axis=0),B)
    minDist[i] = np.min(distances)
print(minDist)

答案4

得分: 0

如果这两个数组被填充了完全随机的数字,那么也许没有什么可以做的。如果每个数组对应于例如车辆轨迹,那么你应该查看豪斯多夫距离和弗雷歇度量。

英文:

If the two arrays are filled with completely random numbers, then perhaps there’s little to be done. If each array corresponds to for example a vehicle track, then you should look at the Hausdorff distance and the Fréchet metric.

huangapple
  • 本文由 发表于 2023年7月11日 06:27:47
  • 转载请务必保留本文链接:https://go.coder-hub.com/76657702.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定