Adding rows to a pandas DataFrame for all values below the max value of a column

Question

Consider the DataFrame below:

    import numpy as np
    import pandas as pd

    df = pd.DataFrame({"names": ["foo", "boo", "coo", "coo"], "time": [1, 4, 2, 3], "values": [20, 10, 15, 12]})

For each name, I want to insert a row for every time value between 1 and that name's maximum in the time column.
So the desired DataFrame would be:

    df = pd.DataFrame({"names": ["foo", "boo", "boo", "boo", "boo", "coo", "coo", "coo"], "time": [1, 1, 2, 3, 4, 1, 2, 3], "values": [20, np.nan, np.nan, np.nan, 10, np.nan, 15, 12]})

How can I do this?

Answer 1

Score: 0

Use a custom function in GroupBy.apply that calls Series.reindex with a range running from 1 to each group's maximum time:

    out = (df.set_index('time')
             .groupby('names', sort=False)['values']
             .apply(lambda x: x.reindex(range(1, x.index.max() + 1)))
             .reset_index())
    print(out)

Output:

      names  time  values
    0   foo     1    20.0
    1   boo     1     NaN
    2   boo     2     NaN
    3   boo     3     NaN
    4   boo     4    10.0
    5   coo     1     NaN
    6   coo     2    15.0
    7   coo     3    12.0
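
If GroupBy.apply turns out to be slow on a large frame, one possible alternative (not part of the original answer, only a sketch) is to build the full grid of (name, time) pairs first and left-merge the original data onto it. The names `max_time` and `grid` below are illustrative, and the sketch assumes the time column holds positive integers and that each (name, time) pair occurs at most once.

```python
import pandas as pd

df = pd.DataFrame({
    "names": ["foo", "boo", "coo", "coo"],
    "time": [1, 4, 2, 3],
    "values": [20, 10, 15, 12],
})

# Maximum time per name, keeping the order in which names first appear.
max_time = df.groupby("names", sort=False)["time"].max()

# One (name, time) row for every time from 1 up to that name's maximum.
grid = pd.DataFrame(
    [(name, t) for name, last in max_time.items() for t in range(1, last + 1)],
    columns=["names", "time"],
)

# A left merge keeps every grid row; pairs absent from df get NaN in "values".
out = grid.merge(df, on=["names", "time"], how="left")
print(out)
```

For the sample data this should give the same rows as the reindex approach. The merge makes the "missing rows become NaN" step explicit, at the cost of materialising the whole grid in memory.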

huangapple
  • Published on July 13, 2023, 11:17:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76675663.html