Adding rows to a pandas DataFrame for all values up to the max value of a column
Question
Consider the DataFrame below:
import numpy as np
import pandas as pd
df = pd.DataFrame({"names": ["foo", "boo", "coo", "coo"], "time": [1, 4, 2, 3], "values": [20, 10, 15, 12]})
For each name, I want to insert rows for every time value between 1 and that name's maximum in the time column.
So the desired DataFrame would be:
df = pd.DataFrame({"names": ["foo", "boo", "boo", "boo", "boo", "coo", "coo", "coo"], "time": [1, 1, 2, 3, 4, 1, 2, 3], "values": [20, np.nan, np.nan, np.nan, 10, np.nan, 15, 12]})
How can I do this?
Answer 1
Score: 0
Use a custom function in GroupBy.apply with Series.reindex over a range:
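out = (df.set_index('time')
         # process each name separately, keeping first-appearance order
         .groupby('names', sort=False)['values']
         # extend each group's time index to 1..max; missing times become NaN
         .apply(lambda x: x.reindex(range(1, x.index.max() + 1)))
         .reset_index())
print(out)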
  names  time  values
0   foo     1    20.0
1   boo     1     NaN
2   boo     2     NaN
3   boo     3     NaN
4   boo     4    10.0
5   coo     1     NaN
6   coo     2    15.0
7   coo     3    12.0
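As a possible alternative that avoids GroupBy.apply, the same result can be sketched by first building every (name, time) pair and then left-merging the original data onto it. This is only an illustration, not part of the answer above: the names max_time and full are introduced here for the example, and it assumes time holds positive integers with at most one row per (name, time) pair.

```python
import pandas as pd

df = pd.DataFrame({
    "names": ["foo", "boo", "coo", "coo"],
    "time": [1, 4, 2, 3],
    "values": [20, 10, 15, 12],
})

# Largest time per name, keeping the order in which names first appear.
max_time = df.groupby("names", sort=False)["time"].max()

# One (name, time) row for every time from 1 up to that name's maximum.
# `full` is an illustrative helper frame, not something from the answer.
full = pd.DataFrame(
    [(name, t) for name, top in max_time.items() for t in range(1, top + 1)],
    columns=["names", "time"],
)

# A left merge keeps every generated row; pairs absent from df get NaN.
out = full.merge(df, on=["names", "time"], how="left")
print(out)
```

The merge keeps the row order of full, so names appear in their original order with times ascending within each name. If df could contain duplicate (name, time) pairs, the merge would repeat those rows, so that case would need separate handling.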