February 8, 2023, 19:47:22

Ignore NaN elements in a list using loc in pandas

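Question

I have 2 different dataframes: df1 and df2.

df1:
index a
0 10
1 2
2 3
3 1
4 7
5 6

df2:
index a
0 1
1 2
2 4
3 3
4 20
5 5

I want to find the index of the maximum value within a specific lookback window in df1 (let's consider lookback=3 in this example). To do this, I use the following code: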

tdf['a'] = df1.rolling(lookback).apply(lambda x: x.idxmax())

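The result would be:

id a
0 nan
1 nan
2 0
3 2
4 4
5 4

Now, for each index found by idxmax(), I need to look up the value at that index in df2 and save it in tdf['b'].

So if tdf['a'].iloc[3] == 2, I want tdf['b'].iloc[3] == df2.iloc[2]. I expect the final result to be like this:

id b
0 nan
1 nan
2 1
3 4
4 20
5 20

I'm guessing that I can do this using the .loc indexer, like this: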

tdf['b'] = df2.loc[tdf['a']]

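But it throws an exception because there are NaN values in tdf['a']. If I call dropna() on tdf['a'] before passing it to .loc, the indices get messed up (for example, index 0 of tdf['b'] has to be NaN, but it will have a value after dropna()).

Is there any way to get what I want?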
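
To make the alignment problem concrete, here is a small reproduction. It is only a sketch: it assumes df1 and df2 are built with a default integer index matching the tables in the question, and the names s, labels, and looked_up are illustrative rather than taken from the original code.

```python
import pandas as pd

# Data from the question, assuming a default 0..5 integer index.
df1 = pd.DataFrame({"a": [10, 2, 3, 1, 7, 6]})
df2 = pd.DataFrame({"a": [1, 2, 4, 3, 20, 5]})
lookback = 3

# Index label of the rolling maximum; the first lookback - 1 rows are NaN.
s = df1["a"].rolling(lookback).apply(lambda x: x.idxmax())

# Dropping NaN first makes the label lookup possible, but the result is
# labelled by the looked-up labels, not by the rows the lookups came from.
labels = s.dropna().astype(int)
looked_up = df2["a"].loc[labels.tolist()]

print(labels.index.tolist())     # rows of tdf that the lookups belong to
print(looked_up.index.tolist())  # labels of df2 that were actually fetched
print(looked_up.tolist())        # values fetched from df2
```

The fetched values are the expected ones, but they carry the labels of df2 instead of the row positions of tdf, which is why assigning them back directly does not line up.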

Answer 1

Score: 1

Simply use map:

lookback = 3
s = df1['a'].rolling(lookback).apply(lambda x: x.idxmax())
s.map(df2['a'])

Output:

0     NaN
1     NaN
2     1.0
3     4.0
4    20.0
5    20.0
Name: a, dtype: float64
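
For completeness, the same idea as a self-contained script. This is a sketch under stated assumptions: the two dataframes are constructed here with a default integer index to match the tables in the question, and tdf is created as an empty frame sharing the index of df1, which the question does not show.

```python
import pandas as pd

# Assumed construction of the question's data (default 0..5 integer index).
df1 = pd.DataFrame({"a": [10, 2, 3, 1, 7, 6]})
df2 = pd.DataFrame({"a": [1, 2, 4, 3, 20, 5]})
lookback = 3

# Assumed: tdf starts empty and shares the index of df1.
tdf = pd.DataFrame(index=df1.index)

# Index label of the maximum in each rolling window (NaN until the window is full).
tdf["a"] = df1["a"].rolling(lookback).apply(lambda x: x.idxmax())

# map looks each value of tdf["a"] up in the index of df2["a"].
# NaN matches no label, so those rows stay NaN and every result
# remains on the row it was computed for.
tdf["b"] = tdf["a"].map(df2["a"])

print(tdf)
```

Because map works value by value and keeps the index of the series it is called on, no dropna() step is needed and the row alignment the question asks for is preserved.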
