How to cross-match 2 dataframes in Python by (Cartesian) coordinates?

Question

I have 2 astronomical catalogs containing galaxies with their respective sky coordinates (ra, dec). I handle the catalogs as data frames. The catalogs come from different observational surveys, and some galaxies appear in both. I want to cross-match these galaxies and put them in a new catalog. How can I do this with Python? I thought there should be an easy way with numpy, pandas, astropy or another package, but I couldn't find a solution. Thanks.

Answer 1

Score: 0

After a lot of research, the easiest way I have found is to use a package called astroML, for which there is a tutorial. I have used it in the notebooks cross_math_data_and_colour_cuts_.ipynb and PS_data_cleaning_and_processing.ipynb.

import numpy as np
import pandas as pd
from astroML.crossmatch import crossmatch_angular
# if you are using Google Colab, first run the line "!pip install astroml"

df_1 = pd.read_csv('catalog_1.csv')
df_2 = pd.read_csv('catalog_2.csv')

# crossmatch catalogs
max_radius = 1. / 3600  # 1 arcsec, expressed in degrees
# note that for the below to work, the first 2 columns of each catalog should be ra, dec (in degrees)
# also, df_1 should be the longer of the 2 catalogs, else there will be index errors
dist, ind = crossmatch_angular(df_1.iloc[:, :2].values, df_2.iloc[:, :2].values, max_radius)
match = ~np.isinf(dist)  # True for every row of df_1 that has a counterpart in df_2
# THE DESIRED SOLUTION IS THEN:
df_crossed = df_1[match]


# ALTERNATIVELY:
# ind has one entry per row of df_1: the index of the cross-matched galaxy in the second catalog;
# when there is no match, the index value is the length of the second catalog
# so if you need to pull values from the second catalog into the first, do:
df_1['new_var'] = [df_2.old_var.iloc[i] if i < len(df_2) else -999 for i in ind]
# that way, whenever there is a match, 'new_var' will contain the correct value from 'old_var',
# and whenever there is no match it will contain -999 as a flag
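
To make the "Cartesian" part of the question concrete, here is a minimal, self-contained sketch of the same idea using only numpy and pandas: convert (ra, dec) to unit vectors, take the nearest neighbour by straight-line distance, and convert that distance back to an angle. It is not astroML's implementation, just a brute-force illustration. The column names `ra`, `dec`, `mag_a`, `mag_b`, the helper names and the tiny catalogs are all made up for the example, and the pairwise distance matrix grows as N1 x N2, so this is only reasonable for small catalogs; for large ones a tree-based nearest-neighbour search is the usual choice.

```python
import numpy as np
import pandas as pd


def radec_to_unit_vectors(ra_deg, dec_deg):
    """Convert sky coordinates in degrees to Cartesian unit vectors, shape (N, 3)."""
    ra = np.radians(np.asarray(ra_deg, dtype=float))
    dec = np.radians(np.asarray(dec_deg, dtype=float))
    return np.column_stack([np.cos(dec) * np.cos(ra),
                            np.cos(dec) * np.sin(ra),
                            np.sin(dec)])


def brute_force_crossmatch(cat_1, cat_2, max_radius_deg):
    """For each row of cat_1, find the nearest row of cat_2 within max_radius_deg.

    Returns (sep_deg, idx). Rows without a match get sep_deg = inf and
    idx = len(cat_2). Both catalogs need 'ra' and 'dec' columns in degrees
    (hypothetical column names chosen for this sketch).
    """
    xyz_1 = radec_to_unit_vectors(cat_1["ra"], cat_1["dec"])
    xyz_2 = radec_to_unit_vectors(cat_2["ra"], cat_2["dec"])
    # Pairwise straight-line (chord) distances between unit vectors, shape (N1, N2).
    chord = np.linalg.norm(xyz_1[:, None, :] - xyz_2[None, :, :], axis=2)
    idx = chord.argmin(axis=1)
    nearest = chord[np.arange(len(cat_1)), idx]
    # A chord of length d on the unit sphere subtends an angle of 2 * arcsin(d / 2).
    sep_deg = np.degrees(2.0 * np.arcsin(np.clip(nearest / 2.0, 0.0, 1.0)))
    unmatched = sep_deg > max_radius_deg
    sep_deg[unmatched] = np.inf
    idx[unmatched] = len(cat_2)
    return sep_deg, idx


# Tiny made-up catalogs: two galaxies are shared, one in each catalog is not.
cat_a = pd.DataFrame({"ra": [10.0, 150.0, 200.0],
                      "dec": [-5.0, 2.0, 45.0],
                      "mag_a": [18.1, 19.4, 20.2]})
cat_b = pd.DataFrame({"ra": [200.0001, 10.0001, 75.0],
                      "dec": [45.0001, -5.0001, 30.0],
                      "mag_b": [20.0, 18.3, 21.5]})

sep_deg, idx = brute_force_crossmatch(cat_a, cat_b, max_radius_deg=1.0 / 3600)
matched = np.isfinite(sep_deg)

# Put each matched pair side by side in one new catalog.
left = cat_a[matched].reset_index(drop=True)
right = (cat_b.iloc[idx[matched]]
         .reset_index(drop=True)
         .rename(columns={"ra": "ra_2", "dec": "dec_2"}))
combined = pd.concat([left, right], axis=1)
combined["sep_arcsec"] = sep_deg[matched] * 3600
print(combined)
```

Resetting the index on both halves before `pd.concat` is what lines the matched rows up by position; without it pandas would align on the original row labels of the two catalogs, which are unrelated.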

huangapple
  • This article was posted on May 21, 2023, 22:09:07
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76300306.html