Python (pandas) – check if value in one df is between ANY pair in another (unequal) df


Question


As a minimal example, consider the following two DataFrames (note that their sizes are not equal):

df
   min_val  max_val
0        0        4
1        5        9
2       10       14
3       15       19 
4       20       24
5       25       29

df1
   val
0    1
1    6
2    2
3  NaN
4   34

I am trying to check whether each value in df1 falls within any pair in df. The output should be a new DataFrame containing the val column of df1, the pair within which the value was found, and an extra column with a name tag, say 'within' or 'not within'. So the output should look like:

   val  min_val  max_val     nameTag
0    1        0        4      within
1    6        5        9      within
2    2        0        4      within
3  NaN      NaN      NaN  not within
4   34      NaN      NaN  not within

So far, every solution I have found compares the two frames row by row, which misses val 2 in df1 even though it falls within the pair 0-4 in df (I tried the approaches from a couple of other posts, but they did not work for me).

Any pointers, advice, or solutions would be much appreciated.
Thanks

Answer 1

Score: 3


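I would use merge_asof:

import numpy as np
import pandas as pd

# For each val, pick the last range whose min_val <= val (NaN rows are dropped, then restored by reindex).
tmp = pd.merge_asof(df1.reset_index().sort_values(by='val').dropna(),
                    df.sort_values(by='min_val').astype(float),
                    left_on='val', right_on='min_val'
                   ).set_index('index').reindex(df1.index)

# The match only counts if val is also <= max_val of that range.
df1['nameTag'] = np.where(tmp['val'].le(tmp['max_val']), 'within', 'not within')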
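
The question also asks for the matching min_val/max_val pair next to each value. The sketch below is one way to get there with the same merge_asof idea; it rebuilds the sample frames from the question so it runs on its own, and the names left, right, hit, and out are illustrative choices, not part of the original answer. It assumes the ranges in df do not overlap.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'min_val': [0, 5, 10, 15, 20, 25],
                   'max_val': [4, 9, 14, 19, 24, 29]})
df1 = pd.DataFrame({'val': [1, 6, 2, np.nan, 34]})

# merge_asof needs sorted, non-null keys of the same dtype on both sides.
left = df1.reset_index().dropna(subset=['val']).sort_values('val')
right = df.sort_values('min_val').astype(float)

# For each val, take the last range whose min_val <= val, then restore df1's row order.
tmp = (pd.merge_asof(left, right, left_on='val', right_on='min_val')
         .set_index('index')
         .reindex(df1.index))

# The candidate range is a real match only if val is also <= its max_val.
hit = tmp['val'].le(tmp['max_val'])

out = df1.copy()
out['min_val'] = tmp['min_val'].where(hit)  # NaN where there is no match
out['max_val'] = tmp['max_val'].where(hit)
out['nameTag'] = np.where(hit, 'within', 'not within')
print(out)
```

Masking with where(hit) matters for values such as 34: merge_asof still pairs it with the range starting at 25, so without the mask the output would show a pair that does not actually contain the value.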

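Or an IntervalIndex:

# closed='both' makes min_val and max_val inclusive (the default, 'right', excludes min_val).
s = pd.Series('within', pd.IntervalIndex.from_arrays(df['min_val'], df['max_val'], closed='both'))

df1['nameTag'] = s.reindex(df1['val']).fillna('not within').to_numpy()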

Output:

    val     nameTag
0   1.0      within
1   6.0      within
2   2.0      within
3   NaN  not within
4  34.0  not within
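
For small frames, the "between ANY pair" check can also be written literally with NumPy broadcasting. This is a different technique from the two shown in the answer, offered only as a rough sketch: it compares every value against every range, so memory grows with len(df1) * len(df), and if ranges overlap it reports only the first matching pair. The names vals, lo, hi, inside, first, and result are illustrative.

```python
import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({'min_val': [0, 5, 10, 15, 20, 25],
                   'max_val': [4, 9, 14, 19, 24, 29]})
df1 = pd.DataFrame({'val': [1, 6, 2, np.nan, 34]})

vals = df1['val'].to_numpy(dtype=float)[:, None]  # column vector, shape (len(df1), 1)
lo = df['min_val'].to_numpy(dtype=float)          # shape (len(df),)
hi = df['max_val'].to_numpy(dtype=float)

# inside[i, j] is True when df1 value i lies in range j (inclusive on both ends).
# Comparisons with NaN are False, so NaN rows never match.
inside = (vals >= lo) & (vals <= hi)

hit = inside.any(axis=1)       # does the value fall in any range?
first = inside.argmax(axis=1)  # position of the first matching range (0 if none)

result = df1.copy()
result['min_val'] = np.where(hit, lo[first], np.nan)
result['max_val'] = np.where(hit, hi[first], np.nan)
result['nameTag'] = np.where(hit, 'within', 'not within')
print(result)
```

Unlike merge_asof, this version does not need either frame to be sorted, which can make it a convenient cross-check on small inputs.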

Posted by huangapple on May 22, 2023 at 19:59:00
Link: https://go.coder-hub.com/76305949.html