Julia: 计算两组点之间的最小距离的快速方法

huangapple go评论51阅读模式
英文:

Julia: FAST way of calculating the smallest distances between two sets of points

问题

以下是您请求的翻译部分:

我在矩阵`A`中有5000个3D点,还有另一个矩阵`B`中的5000个3D点。

对于`A`中的每个点,我想找到与`B`中的点之间的最小距离。这些距离应存储在一个包含5000个条目的数组中。

到目前为止,我有以下解决方案,大约在`0.145342秒(23次分配:191.079 MiB)`内运行。如何进一步改进这个解决方案?

使用 Distances

A = rand(5000, 3)
B = rand(5000, 3)

mis = @time minimum(Distances.pairwise(SqEuclidean(), A, B, dims=1), dims=2)

英文:

I have 5000 3D points in a Matrix A and another 5000 3D point in a matrix B.

For each point in A i want to find the smallest distance to a point in B. These distances should be stored in an array with 5000 entries.

So far I have this solution, running in about 0.145342 seconds (23 allocations: 191.079 MiB). How can I improve this further?

using Distances

A = rand(5000, 3)
B = rand(5000, 3)

mis = @time minimum(Distances.pairwise(SqEuclidean(), A, B, dims=1), dims=2)

答案1

得分: 8

这是一个标准的方法,因为它具有更好的时间复杂度(尤其对于更大的数据而言):

using NearestNeighbors
nn(KDTree(B'; leafsize = 10), A')[2] .^ 2

两点说明:

  • 默认情况下,计算的是欧几里德距离(所以我对其进行了平方处理)
  • 默认情况下,NearestNeigbors.jl 假定观测数据以列存储(所以在解决方案中我需要 B'A';如果您的原始数据已经进行了转置,则不需要;设计这种方式的原因是因为 Julia 使用列主要矩阵存储方式)
英文:

This is a standard way to do it as it will have a better time complexity (especially for larger data):

using NearestNeighbors
nn(KDTree(B'; leafsize = 10), A')[2] .^ 2

Two comments:

  • by default Euclidean distance is computed (so I square it)
  • by default NearestNeigbors.jl assumes observations are stored in columns (so I need B' and A' in the solution; if your original data were transposed it would not be needed; the reason why it is designed this way is that Julia uses column major matrix storage)

答案2

得分: 6

使用Distances.pairwise(SqEuclidean(), A, B, dims=1)生成一个大的距离矩阵并不高效,因为主内存相对于CPU缓存和现代CPU的计算能力来说速度相当慢,而且短时间内不太可能变得更好(参见"内存墙"(memory wall))。使用两个基本的嵌套for循环来实时计算最小值更快。此外,可以使用多线程来更快地计算这个值。

function computeMinDist(A, B)
    n, m = size(A, 1), size(B, 1)
    result = zeros(n)
    Threads.@threads for i = 1:n
        minSqDist = Inf
        @inbounds for j = 1:m
            dx = A[i,1] - B[j,1]
            dy = A[i,2] - B[j,2]
            dz = A[i,3] - B[j,3]
            sqDist = dx*dx + dy*dy + dz*dz
            if sqDist < minSqDist
                minSqDist = sqDist
            end
        end
        result[i] = minSqDist
    end
    return result
end

mis = @time computeMinDist(A, B)

请注意,Julia解释器默认使用1个线程,但可以通过环境变量JULIA_NUM_THREADS=auto进行调整,或者通过使用标志--threads=auto来运行。有关更多信息,请参阅多线程文档


性能结果

以下是在我的i5-9600KF机器上进行的性能测试结果,该机器有6个核心(使用两个5000x3矩阵):

初始实现:  93.4 毫秒
此实现:      4.4 毫秒

因此,这个实现快了21倍。结果与少量ULP相同。

请注意,可以进一步优化代码,使用循环平铺,并可能通过转置 AB,以便JIT可以生成使用SIMD指令的更高效实现。

英文:

Generating a big distance matrix using Distances.pairwise(SqEuclidean(), A, B, dims=1) is not efficient because the main memory is pretty slow nowadays compared to CPU caches and the computing power of modern CPUs and this is not gonna be better any time soon (see "memory wall"). It is faster to compute the minimum on-the-fly using two basic nested for loops. Additionally, one can use multiple cores to compute this faster using multiple threads.

function computeMinDist(A, B)
    n, m = size(A, 1), size(B, 1)
    result = zeros(n)
    Threads.@threads for i = 1:n
        minSqDist = Inf
        @inbounds for j = 1:m
            dx = A[i,1] - B[j,1]
            dy = A[i,2] - B[j,2]
            dz = A[i,3] - B[j,3]
            sqDist = dx*dx + dy*dy + dz*dz
            if sqDist < minSqDist
                minSqDist = sqDist
            end
        end
        result[i] = minSqDist
    end
    return result
end

mis = @time computeMinDist(A, B)

Note the Julia interpreter uses 1 thread by default but this can be tuned using the environment variable JULIA_NUM_THREADS=auto or just by running it using the flag --threads=auto. See the multi-threading documentation for more information.


Performance results

Here are performance results on my i5-9600KF machine with 6 cores (with two 5000x3 matrices):

Initial implementation:  93.4 ms
This implementation:      4.4 ms

This implementation is thus 21 times faster.
Results are the same to few ULP.

Note the code can certainly be optimized further using loop tiling, and possibly by transposing A and B so the JIT can generate a more efficient implementation using SIMD instructions.

huangapple
  • 本文由 发表于 2023年2月18日 03:47:36
  • 转载请务必保留本文链接:https://go.coder-hub.com/75488652.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定