Ignore NaN elements in a list using loc in pandas

Question


I have 2 different dataframes: df1, df2

df1:

    index  a
    0      10
    1      2
    2      3
    3      1
    4      7
    5      6

df2:

    index  a
    0      1
    1      2
    2      4
    3      3
    4      20
    5      5

I want to find the index of the maximum value within a rolling lookback window in df1 (let's use lookback=3 in this example). To do this, I use the following code:

    tdf['a'] = df1.rolling(lookback).apply(lambda x: x.idxmax())

And the result would be:

    id  a
    0   nan
    1   nan
    2   0
    3   2
    4   4
    5   4
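
For reference, the setup above can be reproduced with a small self-contained sketch. The way df1, df2 and tdf are constructed here is an assumption based on the tables shown, and the rolling call selects column 'a' explicitly so that the result is a Series rather than a one-column DataFrame.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (construction is assumed).
df1 = pd.DataFrame({"a": [10, 2, 3, 1, 7, 6]})
df2 = pd.DataFrame({"a": [1, 2, 4, 3, 20, 5]})

lookback = 3

# tdf is assumed to be an empty frame sharing df1's index.
tdf = pd.DataFrame(index=df1.index)

# With the default raw=False, each window is passed as a Series that keeps
# its original index labels, so idxmax() returns a label of df1.
# The first lookback - 1 rows have incomplete windows and come out as NaN.
tdf["a"] = df1["a"].rolling(lookback).apply(lambda x: x.idxmax())

print(tdf)
```

Because the first rows are NaN, the column is stored as floats, which is why the index labels appear as floats rather than integers.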

Now I need to look up the value in df2 at each index found by idxmax() and store it in tdf['b'].

So if tdf['a'].iloc[3] == 2, I want tdf['b'].iloc[3] == df2.iloc[2]. I expect the final result to be like this:

    id  b
    0   nan
    1   nan
    2   1
    3   4
    4   20
    5   20

I'm guessing that I can do this using the .loc indexer like this:

    tdf['b'] = df2.loc[tdf['a']]

But it throws an exception because there are nan values in tdf['a']. If I use dropna() before passing tdf['a'] to .loc, then the indices get messed up (for example, in tdf['b'], index 0 has to be nan, but it'll have a value after dropna()).

Is there any way to get what I want?

Answer 1

Score: 1


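Simply use map. Passing df2['a'] to Series.map looks up each value of s as an index label in df2['a'], and NaN entries stay NaN: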

    lookback = 3
    s = df1['a'].rolling(lookback).apply(lambda x: x.idxmax())
    s.map(df2['a'])

Output:

    0     NaN
    1     NaN
    2     1.0
    3     4.0
    4    20.0
    5    20.0
    Name: a, dtype: float64
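
A self-contained version of this approach might look like the sketch below, together with an alternative that sticks with .loc. The frames are rebuilt from the question's tables, and the names idx, valid, b_map and b_loc are illustrative only. The .loc variant is not part of the original answer; it is one possible way to keep the NaN rows aligned, shown here under those assumptions.

```python
import pandas as pd

# Frames rebuilt from the tables in the question (construction is assumed).
df1 = pd.DataFrame({"a": [10, 2, 3, 1, 7, 6]})
df2 = pd.DataFrame({"a": [1, 2, 4, 3, 20, 5]})

lookback = 3

# Index label of the rolling maximum; NaN where the window is incomplete.
idx = df1["a"].rolling(lookback).apply(lambda x: x.idxmax())

# Approach from the answer: map each label to the matching value in df2['a'].
# Labels with no match (here, NaN) simply map to NaN.
b_map = idx.map(df2["a"])

# Alternative sketch (not from the answer): use .loc only on the valid labels,
# reattach the original row positions, then reindex to restore the NaN rows.
valid = idx.dropna().astype(int)
b_loc = pd.Series(
    df2["a"].loc[valid].to_numpy(),  # values looked up by label
    index=valid.index,               # rows of idx that had a valid label
).reindex(idx.index)                 # missing rows come back as NaN

print(b_map)
print(b_loc)
```

Both variants keep index 0 and 1 as NaN, which is what dropna() on its own would lose. The map version is shorter and avoids the intermediate integer cast.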

huangapple
  • Posted on February 8, 2023, 19:47:22
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75385365.html