Finding shortest distance of every point to a non-straight line in Python
Question
I have created figures similar to this one here:
My goal is to take each blue point and calculate the shortest distance from it to any point on the red line. Ideally, this could be used to select the closest x% of points or those falling within a certain distance, but the primary issue is calculating each distance in the first place.
The points were taken from a data file and plotted as follows:
data = np.loadtxt('gr.dat')
...
ax.scatter(data[:,0],data[:,1])
The red line is a calculated Baraffe track. All points used to create the line are stored in a .dat file and plotted via:
df = pd.read_csv('baraffe.dat', sep=r"\s+", names=['mass', 'age', 'g', 'r', 'i'])
df2 = pd.DataFrame(df, columns=["mass", "age", "g", "r", "i"])
df2['b_color'] = df2['g'] - df2['r']
df2.plot(ax=ax, x='b_color', y='g', color="r")
...
This is my first attempt at using pandas, so I know my code could be optimized and is likely redundant, but it does output the figure attached.
Essentially, I want to calculate the smallest distance each dot would have to move (in both x and y) to reach any point on the red line.
I tried to mimic the answer linked here, but I am unsure how to apply that definition to a DataFrame or a larger array without getting a TypeError. Any insight would be greatly appreciated. Thank you!
Answer 1
Score: 1
Use scipy.spatial.KDTree.
Once you have built the KDTree on the points of the Baraffe track, you can use the methods of the KDTree instance to compute all the quantities that interest you.
Here, for simplicity, I have just shown how to use the query method to build a one-to-one correspondence between each data point and its nearest point on the track.
import numpy as np
import matplotlib.pyplot as plt
from scipy.spatial import KDTree

np.random.seed(20230307)

# Synthetic curved track: a sine wave, rotated so it is not a function of x alone
x = np.linspace(0, 10, 51)
y = np.sin(x)*0.7
x, y = +x*0.6+y*0.8, -0.8*x+0.6*y

# Synthetic scattered points, rotated the same way
xp = np.linspace(1, 9, 21)
yp = -1+np.random.rand(21)*0.4
xp, yp = +xp*0.6+yp*0.8, -0.8*xp+0.6*yp

kdt = KDTree(np.vstack((x, y)).T) # the array that is indexed must be N×2
# For each scattered point: distance to, and index of, the nearest track point
distances, indices = kdt.query(np.vstack((xp, yp)).T, k=1)

fig, ax = plt.subplots()
ax.set_aspect(1)
ax.plot(x, y, color='k', lw=0.8)
ax.scatter(xp, yp, color='r')
# Draw a thin green segment from each point to its nearest track point
for x0, y0, i in zip(xp, yp, indices):
    ax.plot((x0, x[i]), (y0, y[i]), color='g', lw=0.5)
plt.show()
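To get from these distances to the selections mentioned in the question (points within a given distance, or the closest x%), something along the following lines might work. It is only a sketch under a few assumptions: the helper names densify and distances_to_track, the cutoff max_dist, and the fraction are all made up for illustration, and the small arrays stand in for the real data. Because the tree only knows the sampled track points, the sketch first resamples the track more finely along its length, so that the nearest-sample distance approximates the distance to the line itself. With the asker's variables, points would presumably be data[:, :2] and track would be df2[['b_color', 'g']].to_numpy().

```python
import numpy as np
from scipy.spatial import KDTree


def densify(track, n_samples):
    """Resample an (N, 2) polyline at n_samples points evenly spaced in arc length."""
    seg_len = np.hypot(*np.diff(track, axis=0).T)      # length of each segment
    s = np.concatenate(([0.0], np.cumsum(seg_len)))    # cumulative arc length
    s_new = np.linspace(0.0, s[-1], n_samples)
    return np.column_stack((np.interp(s_new, s, track[:, 0]),
                            np.interp(s_new, s, track[:, 1])))


def distances_to_track(points, track, n_samples=2000):
    """Distance from each (M, 2) point to the nearest sample of the densified track."""
    tree = KDTree(densify(track, n_samples))
    dist, _ = tree.query(points, k=1)
    return dist


# Synthetic stand-ins for the real arrays (hypothetical values).
track = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0]])   # an L-shaped "red line"
points = np.array([[0.5, 0.2], [1.3, 0.5], [2.0, 2.0], [0.0, -0.1]])

dist = distances_to_track(points, track)

# Selection 1: points within a chosen distance of the track.
max_dist = 0.25                                  # hypothetical cutoff
within = points[dist <= max_dist]

# Selection 2: the closest fraction of points.
fraction = 0.5                                   # hypothetical: keep closest 50%
n_keep = int(np.ceil(fraction * len(points)))
closest = points[np.argsort(dist)[:n_keep]]
```

The result is accurate to roughly the spacing between resampled track points, so n_samples can be raised if more precision is needed. Note also that a plain Euclidean distance treats the x and y axes equally; if the two axes have very different scales, rescaling both arrays before building the tree may be worth considering.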