Finding Distant Pairs in Python taking advantage of pandas

Question

I have this file:

[image of the file contents; not reproduced here]

It can be read with NumPy:

import numpy as np
data = np.loadtxt('test_2_stack_overflow.csv', delimiter=',')

or with pandas:

import pandas as pd
dfr = pd.read_csv('test_2_stack_overflow.csv', header=None)

The column index represents the x coordinate and the row index the y coordinate of a pixel, and the value stored in the file is that pixel's value.

I would like to select all the pairs of points that are at the same distance from each other (within a tolerance). The easiest way would be two nested loops, but I would like to take advantage of what numpy or pandas offer.

I was thinking of applying something like:

diff = dfr.apply(lambda col: ((col.name - col.index)**2 + (col.name - col.index)**2)**0.5) 

However, this does not seem feasible. Another option, which could also be more general, is to build a two-dimensional array of coordinates.

I could create a vector for the coordinates:

xx = np.arange(4)
yy = np.arange(4)

coord = np.array(np.meshgrid(xx,yy)).T.reshape(-1,2)
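For reference, a minimal sketch of how the rows of `coord` end up ordered. The observation that point (x, y) lands at row `4 * x + y` is only a reading of this particular 4x4 grid and is not stated in the post.

```python
import numpy as np

xx = np.arange(4)
yy = np.arange(4)

# Stack the two meshgrid arrays, transpose, and flatten to one (x, y) row per point
coord = np.array(np.meshgrid(xx, yy)).T.reshape(-1, 2)

print(coord.shape)  # one row per grid point, two columns (x, y)
print(coord[:5])    # x stays fixed while y advances, then x increments

# Assumption for this 4x4 grid only: point (x, y) is stored at row 4 * x + y
print(coord[4 * 3 + 1])  # the row holding point (3, 1)
```

Knowing this ordering makes it possible to map a row or column of a pairwise distance matrix back to a pixel position.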

After that I could compute the distances between each pair:


ref = pd.DataFrame.from_records(coord, columns=('x', 'y'))

diff2 = dfr.apply(
    lambda col: np.sqrt(
        abs(ref.iloc[col.name]['x'] - ref.iloc[col.index]['x']) ** 2 +
        abs(ref.iloc[col.name]['y'] - ref.iloc[col.index]['y']) ** 2
    )
)

However, I get the following:

[image of the resulting output; not reproduced here]

As you can see, the values are all integers, which cannot be right. For example, the points with coordinates (0,2) and (3,1) should be at a distance of sqrt(10).
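The expected value can be checked with a minimal sketch that computes every pairwise distance through NumPy broadcasting instead of `apply`. This is one possible approach, not the method used in the post. The names `delta`, `i` and `j` are illustrative, and the row lookup `4 * x + y` assumes the 4x4 grid built above.

```python
import numpy as np

xx = np.arange(4)
yy = np.arange(4)
coord = np.array(np.meshgrid(xx, yy)).T.reshape(-1, 2)

# delta[i, j] holds the (dx, dy) difference between point i and point j
delta = coord[:, None, :] - coord[None, :, :]

# Euclidean distance for every pair of points, shape (16, 16)
dist = np.sqrt((delta ** 2).sum(axis=-1))

# Assumption for this 4x4 grid only: point (x, y) is stored at row 4 * x + y
i = 4 * 0 + 2  # point (0, 2)
j = 4 * 3 + 1  # point (3, 1)
print(dist[i, j], np.sqrt(10))
```

The intermediate `delta` array has one entry per pair and per coordinate, so memory grows with the square of the number of points. That is fine for a small grid but worth keeping in mind for large images.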

What do you think? Should I choose a better strategy, and/or is there an error in my code?

Thanks

Answer 1

Score: 1

<details>
<summary>English:</summary>

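Maybe you are looking for [`cdist`][1], which computes the distance (Euclidean by default) between every pair of rows taken from two arrays:

import numpy as np
from scipy.spatial.distance import cdist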

xx = np.arange(4)
yy = np.arange(4)

grid = np.array(np.meshgrid(xx, yy)).T.reshape(-1, 2)
dist = cdist(grid, grid)


Output:

>>> dist
array([[0.        , 1.        , 2.        , 3.        , 1.        ,
        1.41421356, 2.23606798, 3.16227766, 2.        , 2.23606798,
        2.82842712, 3.60555128, 3.        , 3.16227766, 3.60555128,
        4.24264069],
       [1.        , 0.        , 1.        , 2.        , 1.41421356,
        1.        , 1.41421356, 2.23606798, 2.23606798, 2.        ,
        2.23606798, 2.82842712, 3.16227766, 3.        , 3.16227766,
        3.60555128],
       [2.        , 1.        , 0.        , 1.        , 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.82842712, 2.23606798,
        2.        , 2.23606798, 3.60555128, 3.16227766, 3.        ,
        3.16227766],
       [3.        , 2.        , 1.        , 0.        , 3.16227766,
        2.23606798, 1.41421356, 1.        , 3.60555128, 2.82842712,
        2.23606798, 2.        , 4.24264069, 3.60555128, 3.16227766,
        3.        ],
       [1.        , 1.41421356, 2.23606798, 3.16227766, 0.        ,
        1.        , 2.        , 3.        , 1.        , 1.41421356,
        2.23606798, 3.16227766, 2.        , 2.23606798, 2.82842712,
        3.60555128],
       [1.41421356, 1.        , 1.41421356, 2.23606798, 1.        ,
        0.        , 1.        , 2.        , 1.41421356, 1.        ,
        1.41421356, 2.23606798, 2.23606798, 2.        , 2.23606798,
        2.82842712],
       [2.23606798, 1.41421356, 1.        , 1.41421356, 2.        ,
        1.        , 0.        , 1.        , 2.23606798, 1.41421356,
        1.        , 1.41421356, 2.82842712, 2.23606798, 2.        ,
        2.23606798],
       [3.16227766, 2.23606798, 1.41421356, 1.        , 3.        ,
        2.        , 1.        , 0.        , 3.16227766, 2.23606798,
        1.41421356, 1.        , 3.60555128, 2.82842712, 2.23606798,
        2.        ],
       [2.        , 2.23606798, 2.82842712, 3.60555128, 1.        ,
        1.41421356, 2.23606798, 3.16227766, 0.        , 1.        ,
        2.        , 3.        , 1.        , 1.41421356, 2.23606798,
        3.16227766],
       [2.23606798, 2.        , 2.23606798, 2.82842712, 1.41421356,
        1.        , 1.41421356, 2.23606798, 1.        , 0.        ,
        1.        , 2.        , 1.41421356, 1.        , 1.41421356,
        2.23606798],
       [2.82842712, 2.23606798, 2.        , 2.23606798, 2.23606798,
        1.41421356, 1.        , 1.41421356, 2.        , 1.        ,
        0.        , 1.        , 2.23606798, 1.41421356, 1.        ,
        1.41421356],
       [3.60555128, 2.82842712, 2.23606798, 2.        , 3.16227766,
        2.23606798, 1.41421356, 1.        , 3.        , 2.        ,
        1.        , 0.        , 3.16227766, 2.23606798, 1.41421356,
        1.        ],
       [3.        , 3.16227766, 3.60555128, 4.24264069, 2.        ,
        2.23606798, 2.82842712, 3.60555128, 1.        , 1.41421356,
        2.23606798, 3.16227766, 0.        , 1.        , 2.        ,
        3.        ],
       [3.16227766, 3.        , 3.16227766, 3.60555128, 2.23606798,
        2.        , 2.23606798, 2.82842712, 1.41421356, 1.        ,
        1.41421356, 2.23606798, 1.        , 0.        , 1.        ,
        2.        ],
       [3.60555128, 3.16227766, 3.        , 3.16227766, 2.82842712,
        2.23606798, 2.        , 2.23606798, 2.23606798, 1.41421356,
        1.        , 1.41421356, 2.        , 1.        , 0.        ,
        1.        ],
       [4.24264069, 3.60555128, 3.16227766, 3.        , 3.60555128,
        2.82842712, 2.23606798, 2.        , 3.16227766, 2.23606798,
        1.41421356, 1.        , 3.        , 2.        , 1.        ,
        0.        ]])
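To go from the distance matrix to the pairs asked about in the question, one option is to compare the upper triangle of `dist` against a target distance. The sketch below is only an illustration under assumptions: `target` and `tol` are hypothetical values chosen for the example, and the question does not say which distance or tolerance is wanted.

```python
import numpy as np
from scipy.spatial.distance import cdist

xx = np.arange(4)
yy = np.arange(4)
grid = np.array(np.meshgrid(xx, yy)).T.reshape(-1, 2)
dist = cdist(grid, grid)

target = np.sqrt(10)  # hypothetical distance of interest
tol = 0.05            # hypothetical tolerance

# Upper triangle only (k=1): each unordered pair appears once and
# no point is paired with itself
i, j = np.triu_indices(len(grid), k=1)
mask = np.abs(dist[i, j] - target) <= tol

# pairs[n, 0] and pairs[n, 1] are the (x, y) coordinates of the two points
pairs = np.stack([grid[i[mask]], grid[j[mask]]], axis=1)
print(pairs.shape)
print(pairs[0])
```

Using `np.triu_indices` avoids reporting both (a, b) and (b, a). If ordered pairs are needed instead, `np.argwhere(np.abs(dist - target) <= tol)` on the full matrix would return both directions.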


[1]: https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html
</details>

huangapple
  • Posted on February 8, 2023, 17:23:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75383595.html