2023-05-22 19:59:00

Python (pandas) - check if value in one df is between ANY pair in another (unequal) df

Question

As a minimal example, consider the following two DataFrames (note that they have different lengths):

df
   min_val  max_val
0        0        4
1        5        9
2       10       14
3       15       19 
4       20       24
5       25       29
df1
   val
0    1
1    6
2    2
3  NaN
4   34

I am trying to check whether each value in df1 falls within any (min_val, max_val) pair in df. The output should be a new DataFrame containing the val column of df1, the pair within which the value was found, and an extra column holding a tag such as 'within' or 'not within'. So the output should look like this:

   val  min_val  max_val     nameTag
0    1        0        4      within
1    6        5        9      within
2    2        0        4      within
3  NaN      NaN      NaN  not within
4   34      NaN      NaN  not within

So far, every solution I have found compares the two DataFrames row by row and therefore misses val 2 in df1 (row 2), which lies within the pair 0-4 in df (row 0). Some posts that did not work for me: HERE, and HERE.

Any pointers/advice/solutions will be much appreciated.
Thanks

Answer 1

Score: 3

I would use merge_asof:

import numpy as np
import pandas as pd
# merge_asof needs both frames sorted on their keys, and the keys must share a dtype
tmp = pd.merge_asof(df1.reset_index().sort_values(by='val').dropna(),
                    df.sort_values(by='min_val').astype(float),
                    left_on='val', right_on='min_val'
                   ).set_index('index').reindex(df1.index)
# each val is paired with the last min_val <= val; it is within only if val <= max_val as well
df1['nameTag'] = np.where(tmp['val'].le(tmp['max_val']), 'within', 'not within')

Or an IntervalIndex:

# closed='both' is an assumption (inclusive bounds, matching the merge_asof version); the default is closed='right'
s = pd.Series('within', pd.IntervalIndex.from_arrays(df['min_val'], df['max_val'], closed='both'))
df1['nameTag'] = s.reindex(df1['val']).fillna('not within').to_numpy()

Output:

    val     nameTag
0   1.0      within
1   6.0      within
2   2.0      within
3   NaN  not within
4  34.0  not within
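
The output above carries only val and nameTag, whereas the question also asks for the matching min_val and max_val. Below is a minimal sketch of one way to get all four columns. It swaps in plain NumPy broadcasting rather than either approach from the answer, assumes inclusive bounds and non-overlapping pairs, and rebuilds the sample frames so it runs on its own. The names out, inside, found and pos are chosen here for illustration only.

```python
import numpy as np
import pandas as pd

# Sample data from the question, rebuilt so the sketch is self-contained.
df = pd.DataFrame({"min_val": [0, 5, 10, 15, 20, 25],
                   "max_val": [4, 9, 14, 19, 24, 29]})
df1 = pd.DataFrame({"val": [1, 6, 2, np.nan, 34]})

# Compare every val against every pair: the result has shape (len(df1), len(df)).
# Comparisons with NaN are False, so a missing val never matches.
vals = df1["val"].to_numpy(dtype=float)[:, None]
inside = (vals >= df["min_val"].to_numpy()) & (vals <= df["max_val"].to_numpy())

found = inside.any(axis=1)   # True where at least one pair contains val
pos = inside.argmax(axis=1)  # position of the first matching pair (0 if none)

out = df1.copy()
for col in ["min_val", "max_val"]:
    # Take the bound from the matching pair, or NaN when nothing matched.
    out[col] = np.where(found, df[col].to_numpy()[pos], np.nan)
out["nameTag"] = np.where(found, "within", "not within")
print(out)
```

The comparison matrix holds len(df1) * len(df) booleans, so this is only reasonable for small to moderate frames; for larger ones the merge_asof approach scales better.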
