How to cross-match 2 dataframes by (Cartesian) coordinates with Python?
Question
I have 2 astronomical catalogs containing galaxies with their respective sky coordinates (ra, dec). I handle the catalogs as data frames. The catalogs come from different observational surveys, and some galaxies appear in both of them. I want to cross-match these galaxies and put them in a new catalog. How can I do this with Python? I thought there should be an easy way with numpy, pandas, astropy or another package, but I couldn't find a solution. Thanks!
Answer 1
Score: 0
After a lot of research, the easiest way I have found is to use a package called astroML; here is a tutorial. The notebooks I have used it in are called cross_math_data_and_colour_cuts_.ipynb and PS_data_cleaning_and_processing.ipynb.
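For intuition about what an angular cross-match does, here is a minimal NumPy-only sketch. It is not astroML's code, and `radec_to_unit_vectors` and `crossmatch_bruteforce` are hypothetical names. It converts each (ra, dec), assumed to be in degrees, to a unit vector on the sphere, compares every galaxy of the first catalog with every galaxy of the second (brute force, so only suitable for small catalogs), and returns, for each row of the first catalog, the angular distance to its nearest neighbour and that neighbour's index. As with `dist` in the astroML code, an infinite distance means there is no match within the radius; the index is then set to the length of the second catalog.

```python
import numpy as np


def radec_to_unit_vectors(ra_deg, dec_deg):
    """Convert right ascension / declination in degrees to 3D unit vectors."""
    ra = np.radians(ra_deg)
    dec = np.radians(dec_deg)
    return np.column_stack([np.cos(dec) * np.cos(ra),
                            np.cos(dec) * np.sin(ra),
                            np.sin(dec)])


def crossmatch_bruteforce(X1, X2, max_distance_deg):
    """Hypothetical brute-force angular cross-match (a sketch, not astroML's code).

    X1, X2: arrays of shape (N1, 2) and (N2, 2) holding (ra, dec) in degrees;
    X2 is assumed to be non-empty.
    Returns (dist, ind), both of length N1: the angular distance in degrees from
    each row of X1 to its nearest row of X2, and that row's index. Rows with no
    neighbour within max_distance_deg get dist = inf and ind = N2.
    """
    X1 = np.asarray(X1, dtype=float)
    X2 = np.asarray(X2, dtype=float)
    u1 = radec_to_unit_vectors(X1[:, 0], X1[:, 1])
    u2 = radec_to_unit_vectors(X2[:, 0], X2[:, 1])
    # chord length between every pair of unit vectors, shape (N1, N2)
    chord = np.linalg.norm(u1[:, None, :] - u2[None, :, :], axis=2)
    ind = np.argmin(chord, axis=1)  # nearest neighbour in X2 for each row of X1
    nearest = chord[np.arange(len(X1)), ind]
    # a chord c between unit vectors subtends the angle 2 * arcsin(c / 2)
    dist = np.degrees(2.0 * np.arcsin(np.clip(nearest / 2.0, 0.0, 1.0)))
    # flag rows whose nearest neighbour lies outside the search radius
    no_match = dist > max_distance_deg
    dist[no_match] = np.inf
    ind[no_match] = len(X2)
    return dist, ind


# usage: the first galaxy has a counterpart 0.5 arcsec away, the second has none
cat1 = np.array([[10.0, 20.0], [150.0, -30.0]])
cat2 = np.array([[10.0, 20.0 + 0.5 / 3600], [200.0, 5.0]])
dist, ind = crossmatch_bruteforce(cat1, cat2, max_distance_deg=1.0 / 3600)
print(ind)          # [0 2]
print(dist * 3600)  # separations in arcsec; inf where there is no match
```

Working with unit vectors rather than raw ra/dec differences keeps the distance correct across ra = 0/360 and at high declination. The chord-to-angle conversion is used instead of the `arccos` of a dot product because it stays accurate at arcsecond separations.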
import numpy as np
import pandas as pd
from astroML.crossmatch import crossmatch_angular
# if you are using Google Colab, first run the line "!pip install astroml"
df_1 = pd.read_csv('catalog_1.csv')
df_2 = pd.read_csv('catalog_2.csv')
# crossmatch catalogs
max_radius = 1. / 3600  # 1 arcsec, expressed in degrees
# note: the first 2 columns of each catalog are assumed to be ra, dec (in degrees)
X1 = df_1.iloc[:, :2].to_numpy(dtype=float)
X2 = df_2.iloc[:, :2].to_numpy(dtype=float)
# dist and ind both have one entry per galaxy of df_1: the angular distance (degrees)
# to its nearest galaxy in df_2, and that galaxy's row position in df_2
dist, ind = crossmatch_angular(X1, X2, max_radius)
match = ~np.isinf(dist)  # dist is inf where nothing lies within max_radius
# THE DESIRED SOLUTION IS THEN:
df_crossed = df_1[match]
# ALTERNATIVELY:
# when there is no match, ind equals the length of the second catalog, len(df_2),
# so to attach a column of the second catalog to the rows of the first, do:
df_1['new_var'] = [df_2['old_var'].iloc[i] if i < len(df_2) else -999 for i in ind]
# that way, whenever you have a match, 'new_var' will contain the correct value from 'old_var'
# and whenever there is no match it will contain -999 as a flag
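If the goal is a single new catalog holding the columns of both surveys for every galaxy found in both (as the question asks), `ind` can pull the matching rows of the second catalog alongside the first. Below is a minimal pandas sketch, assuming `dist` and `ind` follow the convention documented for astroML's `crossmatch_angular`: one entry per row of the first catalog, with `dist` infinite and `ind` equal to `len(df_2)` where there is no match. The function name `build_crossmatched_catalog`, the toy catalogs, their columns and the hand-written `dist`/`ind` values are all hypothetical and only illustrate the shapes involved.

```python
import numpy as np
import pandas as pd


def build_crossmatched_catalog(df_1, df_2, dist, ind):
    """Put the matched rows of two catalogs side by side (a sketch).

    dist, ind: one entry per row of df_1, as returned by an angular cross-match;
    dist is inf where a galaxy of df_1 has no counterpart in df_2.
    """
    match = ~np.isinf(dist)
    left = df_1[match].reset_index(drop=True).add_suffix('_1')
    # iloc looks rows up by position, which is what ind refers to
    right = df_2.iloc[ind[match]].reset_index(drop=True).add_suffix('_2')
    combined = pd.concat([left, right], axis=1)
    combined['sep_arcsec'] = dist[match] * 3600  # dist is in degrees
    return combined


# toy catalogs with hypothetical columns, and illustrative cross-match output:
# df_1 rows 0 and 2 match df_2 rows 1 and 0; df_1 row 1 has no match (ind = len(df_2))
df_1 = pd.DataFrame({'ra': [10.0, 150.0, 30.0], 'dec': [20.0, -30.0, 1.0],
                     'mag': [18.2, 19.5, 17.1]})
df_2 = pd.DataFrame({'ra': [30.0, 10.0], 'dec': [1.0001, 20.0], 'z': [0.12, 0.05]})
dist = np.array([0.0, np.inf, 0.0001])
ind = np.array([1, 2, 0])
new_catalog = build_crossmatched_catalog(df_1, df_2, dist, ind)
print(new_catalog)
```

Because each galaxy of the first catalog simply takes its nearest neighbour, two galaxies of `df_1` can end up paired with the same galaxy of `df_2`; checking `pd.Series(ind[match]).duplicated()` is one way to spot such cases.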