Finding Distant Pairs in Python taking advantage of pandas
Question
I have this file:
It can be read either with NumPy:
import numpy as np
data = np.loadtxt('test_2_stack_overflow.csv', delimiter=',')
or with pandas:
import pandas as pd
dfr = pd.read_csv('test_2_stack_overflow.csv', header=None)
The column index represents the x coordinate and the row index the y coordinate of a pixel, whose value is stored in the file.
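To make that coordinate convention concrete, here is a minimal sketch that turns the grid into one row per pixel with its x, y and value. It assumes the file is a plain comma-separated grid without a header; the 2x3 values below are made up purely for illustration, since the real file is not shown.

```python
import io

import numpy as np
import pandas as pd

# Hypothetical 2-row x 3-column file standing in for test_2_stack_overflow.csv
csv_text = "10,11,12\n20,21,22\n"
dfr = pd.read_csv(io.StringIO(csv_text), header=None)

# Row position -> y, column position -> x, as described above
yy, xx = np.indices(dfr.shape)
pixels = pd.DataFrame({
    'x': xx.ravel(),
    'y': yy.ravel(),
    'value': dfr.to_numpy().ravel(),
})
print(pixels)  # e.g. the row with x == 2, y == 1 holds the value 22
```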
I would like to select all the pairs of pixels that are the same distance apart (within a tolerance). The easiest way would be two nested loops, but I would like to take advantage of NumPy or pandas instead.
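For reference, the two-loop approach might look like the sketch below. The function name `pairs_at_distance_loops` and its `target`/`tol` parameters are hypothetical, not taken from the question. It visits every unordered pair once, so it takes O(N^2) time for N pixels, which is what a vectorized version should improve on.

```python
import math
from itertools import product


def pairs_at_distance_loops(nx, ny, target, tol):
    """Hypothetical brute-force helper: return every unordered pair of
    (x, y) pixel coordinates whose Euclidean distance is within tol of target."""
    points = list(product(range(nx), range(ny)))  # (x, y) for every pixel
    pairs = []
    for i in range(len(points)):
        for j in range(i + 1, len(points)):  # j > i: each unordered pair once
            (x1, y1), (x2, y2) = points[i], points[j]
            d = math.hypot(x2 - x1, y2 - y1)
            if abs(d - target) <= tol:
                pairs.append((points[i], points[j]))
    return pairs


# Pairs on a 4x4 grid that are sqrt(10) apart, such as (0, 2) and (3, 1)
result = pairs_at_distance_loops(4, 4, math.sqrt(10), 1e-9)
print(len(result), result[:3])
```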
I was thinking of applying something like:
diff = dfr.apply(lambda col: ((col.name - col.index)**2 + (col.name - col.index)**2)**0.5)
However, this does not seem feasible. I am also considering another path: creating a two-dimensional array of coordinates, which could also be more general.
I could create a vector for the coordinates:
xx = np.arange(4)
yy = np.arange(4)
coord = np.array(np.meshgrid(xx, yy)).T.reshape(-1, 2)
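The row order of `coord` matters for anything that indexes into it by position. A minimal sketch, assuming the grid size is taken from the loaded array rather than hard-coded (a 4x4 array of zeros stands in for `data` here), shows that x varies slowest and y fastest, and that `np.meshgrid(..., indexing='ij')` builds the same array more explicitly:

```python
import numpy as np

# Stand-in for the array returned by np.loadtxt (rows = y, columns = x)
data = np.zeros((4, 4))
ny, nx = data.shape

xx = np.arange(nx)
yy = np.arange(ny)
# Same construction as above: x varies slowest, y fastest
coord = np.array(np.meshgrid(xx, yy)).T.reshape(-1, 2)

# Equivalent and arguably more explicit: 'ij' indexing puts xx on the first axis
xg, yg = np.meshgrid(xx, yy, indexing='ij')
coord_ij = np.column_stack([xg.ravel(), yg.ravel()])

# Pixel value for each coordinate row, in the same order as coord
values = data[coord[:, 1], coord[:, 0]]

print(coord[:5])  # rows (0, 0), (0, 1), (0, 2), (0, 3), (1, 0): the first four all have x == 0
```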
After that, I could compute the distance between each pair:
ref = pd.DataFrame.from_records(coord, columns=('x', 'y'))
diff2 = dfr.apply(
lambda col: np.sqrt(
abs(ref.iloc[col.name]['x'] - ref.iloc[col.index]['x']) ** 2 +
abs(ref.iloc[col.name]['y'] - ref.iloc[col.index]['y']) ** 2
)
)
However, I get the following:
As you can see, all the values are integers, which is not possible. For example, the points with coordinates (0, 2) and (3, 1) should be sqrt(10) apart.
What do you think? Should I choose a better strategy, and/or is there an error?
Thanks
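The integer results most likely come from the indexing rather than from the distance formula. `DataFrame.apply` passes each of the four columns of `dfr`, so `col.name` and `col.index` only take the values 0 to 3 and are used as row positions into `ref`. With the `meshgrid(...).T` ordering, those first four rows are the points (0, 0) to (0, 3), which all have x = 0, so every entry reduces to |i - j|. The sketch below reproduces this with a 4x4 frame of zeros standing in for the file (the pixel values do not affect the result):

```python
import numpy as np
import pandas as pd

# Stand-in for the 4x4 CSV; only its shape matters here
dfr = pd.DataFrame(np.zeros((4, 4)))

xx = np.arange(4)
yy = np.arange(4)
coord = np.array(np.meshgrid(xx, yy)).T.reshape(-1, 2)
# The ref table from the question, built here with the plain DataFrame constructor
ref = pd.DataFrame(coord, columns=['x', 'y'])

# The original expression: col.name and col.index only range over 0..3,
# so only ref rows 0..3, i.e. points (0, 0)..(0, 3), are ever compared
diff2 = dfr.apply(
    lambda col: np.sqrt(
        abs(ref.iloc[col.name]['x'] - ref.iloc[col.index]['x']) ** 2 +
        abs(ref.iloc[col.name]['y'] - ref.iloc[col.index]['y']) ** 2
    )
)
print(ref.iloc[:4])  # every x is 0
print(diff2)         # entry (i, j) equals |i - j|
```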
Answer 1
Score: 1
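Maybe you are looking for [`cdist`](https://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.cdist.html):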
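import numpy as np
from scipy.spatial.distance import cdist

xx = np.arange(4)
yy = np.arange(4)
# one (x, y) row per pixel: (0, 0), (0, 1), ..., (3, 3)
grid = np.array(np.meshgrid(xx, yy)).T.reshape(-1, 2)
# 16x16 matrix of Euclidean distances between every pair of pixels
dist = cdist(grid, grid)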
Output:
>>> dist
array([[0. , 1. , 2. , 3. , 1. ,
1.41421356, 2.23606798, 3.16227766, 2. , 2.23606798,
2.82842712, 3.60555128, 3. , 3.16227766, 3.60555128,
4.24264069],
[1. , 0. , 1. , 2. , 1.41421356,
1. , 1.41421356, 2.23606798, 2.23606798, 2. ,
2.23606798, 2.82842712, 3.16227766, 3. , 3.16227766,
3.60555128],
[2. , 1. , 0. , 1. , 2.23606798,
1.41421356, 1. , 1.41421356, 2.82842712, 2.23606798,
2. , 2.23606798, 3.60555128, 3.16227766, 3. ,
3.16227766],
[3. , 2. , 1. , 0. , 3.16227766,
2.23606798, 1.41421356, 1. , 3.60555128, 2.82842712,
2.23606798, 2. , 4.24264069, 3.60555128, 3.16227766,
3. ],
[1. , 1.41421356, 2.23606798, 3.16227766, 0. ,
1. , 2. , 3. , 1. , 1.41421356,
2.23606798, 3.16227766, 2. , 2.23606798, 2.82842712,
3.60555128],
[1.41421356, 1. , 1.41421356, 2.23606798, 1. ,
0. , 1. , 2. , 1.41421356, 1. ,
1.41421356, 2.23606798, 2.23606798, 2. , 2.23606798,
2.82842712],
[2.23606798, 1.41421356, 1. , 1.41421356, 2. ,
1. , 0. , 1. , 2.23606798, 1.41421356,
1. , 1.41421356, 2.82842712, 2.23606798, 2. ,
2.23606798],
[3.16227766, 2.23606798, 1.41421356, 1. , 3. ,
2. , 1. , 0. , 3.16227766, 2.23606798,
1.41421356, 1. , 3.60555128, 2.82842712, 2.23606798,
2. ],
[2. , 2.23606798, 2.82842712, 3.60555128, 1. ,
1.41421356, 2.23606798, 3.16227766, 0. , 1. ,
2. , 3. , 1. , 1.41421356, 2.23606798,
3.16227766],
[2.23606798, 2. , 2.23606798, 2.82842712, 1.41421356,
1. , 1.41421356, 2.23606798, 1. , 0. ,
1. , 2. , 1.41421356, 1. , 1.41421356,
2.23606798],
[2.82842712, 2.23606798, 2. , 2.23606798, 2.23606798,
1.41421356, 1. , 1.41421356, 2. , 1. ,
0. , 1. , 2.23606798, 1.41421356, 1. ,
1.41421356],
[3.60555128, 2.82842712, 2.23606798, 2. , 3.16227766,
2.23606798, 1.41421356, 1. , 3. , 2. ,
1. , 0. , 3.16227766, 2.23606798, 1.41421356,
1. ],
[3. , 3.16227766, 3.60555128, 4.24264069, 2. ,
2.23606798, 2.82842712, 3.60555128, 1. , 1.41421356,
2.23606798, 3.16227766, 0. , 1. , 2. ,
3. ],
[3.16227766, 3. , 3.16227766, 3.60555128, 2.23606798,
2. , 2.23606798, 2.82842712, 1.41421356, 1. ,
1.41421356, 2.23606798, 1. , 0. , 1. ,
2. ],
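[3.60555128, 3.16227766, 3. , 3.16227766, 2.82842712,
2.23606798, 2. , 2.23606798, 2.23606798, 1.41421356,
1. , 1.41421356, 2. , 1. , 0. ,
1. ],
[4.24264069, 3.60555128, 3.16227766, 3. , 3.60555128,
2.82842712, 2.23606798, 2. , 3.16227766, 2.23606798,
1.41421356, 1. , 3. , 2. , 1. ,
0. ]])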
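`cdist` gives the distances, but the question also asks for the pairs themselves. One possible way to get there with NumPy and pandas only is sketched below, assuming a 4x4 grid and a hypothetical target distance and tolerance: compute squared distances with broadcasting (their square root matches `cdist(grid, grid)`), keep each unordered pair once with `np.triu_indices`, then either filter by tolerance or group by distance. Because pixel coordinates are integers, the squared distance is an exact integer, so grouping on it needs no tolerance at all.

```python
import numpy as np
import pandas as pd

xx = np.arange(4)
yy = np.arange(4)
grid = np.array(np.meshgrid(xx, yy)).T.reshape(-1, 2)  # one (x, y) row per pixel

# Squared distances via broadcasting, shape (16, 16); np.sqrt of this
# matches cdist(grid, grid)
delta = grid[:, None, :] - grid[None, :, :]
d2 = (delta ** 2).sum(axis=-1)

# Keep each unordered pair once (i < j) and drop the zero diagonal
i, j = np.triu_indices(len(grid), k=1)
pairs = pd.DataFrame({
    'x1': grid[i, 0], 'y1': grid[i, 1],
    'x2': grid[j, 0], 'y2': grid[j, 1],
    'd2': d2[i, j],  # exact integers, because the coordinates are integers
})
pairs['dist'] = np.sqrt(pairs['d2'])

# Pairs within a tolerance of one distance (target and tol are hypothetical)
target, tol = np.sqrt(10), 1e-6
near = pairs[(pairs['dist'] - target).abs() <= tol]

# All groups of equidistant pairs, keyed by the exact squared distance
groups = {key: grp for key, grp in pairs.groupby('d2')}
print(near)
```

Both this broadcasting step and `cdist` build an N x N array for N pixels, so memory grows quadratically with the image size.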