Python (pandas) - check if value in one df is between ANY pair in another (unequal) df
Question
As a minimal example, consider the following two DataFrames (note that their sizes differ):
df
   min_val  max_val
0        0        4
1        5        9
2       10       14
3       15       19
4       20       24
5       25       29

df1
   val
0    1
1    6
2    2
3  NaN
4   34
I am trying to check whether each value in df1 falls within any (min_val, max_val) pair in df. The output should be a new DataFrame containing the val column of df1, the pair within which the value was found, and an extra column holding a name tag, say 'within' or 'not within'. So the output should look like:
   val  min_val  max_val     nameTag
0    1        0        4      within
1    6        5        9      within
2    2        0        4      within
3  NaN      NaN      NaN  not within
4   34      NaN      NaN  not within
So far, every solution I have found compares the two frames line by line, and so misses val 2 in df1, which falls within the pair 0-4 in df (the posts I tried did not work for me).
Any pointers, advice, or solutions would be much appreciated.
Thanks
Answer 1
Score: 3
I would use merge_asof:

import numpy as np
import pandas as pd

# merge_asof needs both sides sorted on their keys and no NaN in the left key
tmp = pd.merge_asof(df1.reset_index().sort_values(by='val').dropna(),
                    df.sort_values(by='min_val').astype(float),
                    left_on='val', right_on='min_val'
                    ).set_index('index').reindex(df1.index)
df1['nameTag'] = np.where(tmp['val'].le(tmp['max_val']), 'within', 'not within')
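
The snippet above only fills in nameTag, whereas the question also asks for the matching pair. As a minimal, self-contained sketch of how the same merge_asof idea might be extended to return min_val and max_val as well (the frames are rebuilt from the question; the names left, right, matched, inside and result are illustrative and not part of the original answer):

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"min_val": [0, 5, 10, 15, 20, 25],
                   "max_val": [4, 9, 14, 19, 24, 29]})
df1 = pd.DataFrame({"val": [1, 6, 2, np.nan, 34]})

# merge_asof needs sorted, non-null keys of matching dtype on both sides
left = df1.reset_index().dropna(subset=["val"]).sort_values("val")
right = df.astype(float).sort_values("min_val")

# For each val, take the last row whose min_val <= val (default direction="backward")
matched = pd.merge_asof(left, right, left_on="val", right_on="min_val")

# The candidate pair only counts if val is also <= max_val; otherwise blank it out
inside = matched["val"].le(matched["max_val"])
matched.loc[~inside, ["min_val", "max_val"]] = np.nan

# Restore the original row order; the dropped NaN row comes back as all-NaN
result = matched.set_index("index").reindex(df1.index)
result["val"] = df1["val"]
result["nameTag"] = np.where(result["min_val"].notna(), "within", "not within")
print(result)
```

Here 34 is first paired with the 25-29 row because 25 is the largest min_val not exceeding it, and is then rejected by the max_val check, which is why the pair columns are cleared for that row.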
Or an IntervalIndex:

s = pd.Series('within', pd.IntervalIndex.from_arrays(df['min_val'], df['max_val']))
df1['nameTag'] = s.reindex(df1['val']).fillna('not within').to_numpy()
Output:

    val     nameTag
0   1.0      within
1   6.0      within
2   2.0      within
3   NaN  not within
4  34.0  not within
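
The IntervalIndex approach can be extended in the same way to return the matching pair. The sketch below makes two assumptions that are not in the original answer: it passes closed="both" so that a value equal to min_val or max_val counts as within (from_arrays defaults to closed="right", which excludes the left endpoint), and it looks up only the non-null values. The names intervals, pos, has_val, found and out are illustrative. It also assumes the pairs in df do not overlap, since get_indexer requires that.

```python
import numpy as np
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({"min_val": [0, 5, 10, 15, 20, 25],
                   "max_val": [4, 9, 14, 19, 24, 29]})
df1 = pd.DataFrame({"val": [1, 6, 2, np.nan, 34]})

# closed="both" makes min_val and max_val themselves count as within
intervals = pd.IntervalIndex.from_arrays(
    df["min_val"].astype(float), df["max_val"].astype(float), closed="both"
)

# Position of the containing interval for each non-null val, -1 when there is none
pos = np.full(len(df1), -1)
has_val = df1["val"].notna().to_numpy()
pos[has_val] = intervals.get_indexer(df1.loc[has_val, "val"])

# Copy the pair across where a match exists, NaN elsewhere
found = pos >= 0
out = df1.copy()
out["min_val"] = np.where(found, df["min_val"].to_numpy()[pos], np.nan)
out["max_val"] = np.where(found, df["max_val"].to_numpy()[pos], np.nan)
out["nameTag"] = np.where(found, "within", "not within")
print(out)
```

Working with integer positions rather than a reindexed Series keeps the lookup of min_val and max_val a plain NumPy indexing step, with the found mask deciding which rows keep their pair.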