Pad middle values based on previous and next values
Question
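# Fill in the missing intermediate values of 'col' and forward-fill 'col2'
import pandas as pd
L = [0, 3, 5, 7, 9]
L2 = ['Repeat1', 'Repeat2', 'Repeat3', 'Repeat4', 'Repeat5']
df = pd.DataFrame({'col': L})
df['col2'] = L2
# Build the full consecutive range from the smallest to the largest 'col' value
all_values = range(df['col'].min(), df['col'].max() + 1)
# Reindex on 'col' (not on the default row index) so each missing value becomes a row,
# then forward-fill 'col2' and turn 'col' back into a regular column
df = df.set_index('col').reindex(all_values).ffill().reset_index()
print(df)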
Let's say I have a df like this:
col col2
0 0 Repeat1
1 3 Repeat2
2 5 Repeat3
3 7 Repeat4
4 9 Repeat5
Reproducible example:
import pandas as pd
L = [0, 3, 5, 7, 9]
L2 = ['Repeat1', 'Repeat2', 'Repeat3', 'Repeat4', 'Repeat5']
df = pd.DataFrame({'col': L})
df['col2'] = L2
print(df)
How can I fill in the missing intermediate values so that my df looks like this?
col col2
0 0 Repeat1
1 1 Repeat1
2 2 Repeat1
3 3 Repeat2
4 4 Repeat2
5 5 Repeat3
6 6 Repeat3
7 7 Repeat4
8 8 Repeat4
9 9 Repeat5
Similar threads I've tried:
https://stackoverflow.com/questions/37821653/filling-missing-middle-values-in-pandas-dataframe (fills the intermediate values with NaN, but I don't want NaN)
https://stackoverflow.com/questions/28798076/fill-pandas-dataframe-with-values-in-between (a very long approach; I'm looking for a more functional approach)
Both threads helped to some extent, but I was wondering whether there is a simpler way to do it.
Answer 1
Score: 3
You can reindex and ffill with "col" as a temporary index:
out = (df.set_index('col')
.reindex(range(df['col'].max()+1))
.ffill()
.reset_index()
)
Output:
col col2
0 0 Repeat1
1 1 Repeat1
2 2 Repeat1
3 3 Repeat2
4 4 Repeat2
5 5 Repeat3
6 6 Repeat3
7 7 Repeat4
8 8 Repeat4
9 9 Repeat5
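The snippet above builds the range from 0, which fits the sample data. As a minimal sketch, assuming the key column holds unique integers (reindex raises on a duplicated index), the same idea can be wrapped in a small helper that starts at the column's minimum instead. The name fill_gaps and the sample frame are made up for illustration and are not part of pandas or of the original answer.

```python
import pandas as pd


def fill_gaps(df: pd.DataFrame, key: str) -> pd.DataFrame:
    """Add a row for every missing integer in `key`, then forward-fill the other columns.

    `fill_gaps` is a hypothetical helper name, not a pandas function.
    """
    # Full consecutive range between the smallest and largest key value.
    full_range = range(int(df[key].min()), int(df[key].max()) + 1)
    return (
        df.set_index(key)       # use the key column as a temporary index
          .reindex(full_range)  # missing keys become all-NaN rows
          .ffill()              # copy the previous row's values downward
          .rename_axis(key)     # make sure the index keeps its name
          .reset_index()        # turn the index back into a column
    )


# Made-up sample data that starts at 2 instead of 0.
sample = pd.DataFrame({"col": [2, 5, 7], "col2": ["a", "b", "c"]})
result = fill_gaps(sample, "col")
print(result)
```

With this sample, the rows for 3, 4 and 6 are created by reindex and take their col2 value from the closest earlier row.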
Answer 2
Score: 1
You can also merge and ffill:
(df.merge(pd.DataFrame({'col': range(df['col'].max()+1)}), how='right')
.ffill()
)
Output:
col col2
0 0 Repeat1
1 1 Repeat1
2 2 Repeat1
3 3 Repeat2
4 4 Repeat2
5 5 Repeat3
6 6 Repeat3
7 7 Repeat4
8 8 Repeat4
9 9 Repeat5
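To see why the ffill step is needed, it may help to look at the frame right after the merge. The sketch below splits the one-liner into two steps on the question's data; the variable names full, merged and filled are only illustrative, and on="col" is spelled out here even though the original relies on the shared column name.

```python
import pandas as pd

df = pd.DataFrame({
    "col": [0, 3, 5, 7, 9],
    "col2": ["Repeat1", "Repeat2", "Repeat3", "Repeat4", "Repeat5"],
})

# Every value of 'col' we want in the result, from 0 up to the maximum.
full = pd.DataFrame({"col": range(df["col"].max() + 1)})

# A right join keeps every row of `full`; values of 'col' that are
# absent from `df` get NaN in 'col2'.
merged = df.merge(full, how="right", on="col")
missing = int(merged["col2"].isna().sum())
print(missing)

# Forward-fill replaces each NaN with the last value seen above it.
filled = merged.ffill()
print(filled)
```

The right join is what creates the missing rows; ffill only fills them in afterwards.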
Answer 3
Score: 1
Another possible solution, which is based on pandas.concat:
pd.concat([pd.DataFrame({'col': range(df['col'].max()+1)}),
df.set_index('col')], axis=1).ffill()
Or, alternatively:
(pd.concat([df, pd.DataFrame(
{'col': list(set(range(1, df.col.max()+1)).difference(df.col))})])
.sort_values('col').ffill().reset_index(drop=True))
Output:
col col2
0 0 Repeat1
1 1 Repeat1
2 2 Repeat1
3 3 Repeat2
4 4 Repeat2
5 5 Repeat3
6 6 Repeat3
7 7 Repeat4
8 8 Repeat4
9 9 Repeat5
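As a quick sanity check, the reindex, merge and concat approaches from this thread can be run side by side on the question's data and compared column by column. This is only a sketch: the helper as_lists and the variable names are made up for illustration, and the comparison looks at values only, not at index or dtype details.

```python
import pandas as pd

df = pd.DataFrame({
    "col": [0, 3, 5, 7, 9],
    "col2": ["Repeat1", "Repeat2", "Repeat3", "Repeat4", "Repeat5"],
})
full = pd.DataFrame({"col": range(df["col"].max() + 1)})

# reindex on a temporary 'col' index, then forward-fill
via_reindex = (df.set_index("col")
                 .reindex(range(df["col"].max() + 1))
                 .ffill()
                 .reset_index())

# right merge against the full range, then forward-fill
via_merge = df.merge(full, how="right", on="col").ffill()

# column-wise concat aligned on the index, then forward-fill
via_concat = pd.concat([full, df.set_index("col")], axis=1).ffill()


def as_lists(frame):
    """Return the two columns as plain Python lists (hypothetical helper)."""
    return frame["col"].tolist(), frame["col2"].tolist()


same = as_lists(via_reindex) == as_lists(via_merge) == as_lists(via_concat)
print(same)
```

If the three results agree, the choice between them mostly comes down to readability.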