February 14, 2023, 22:37:47

Pad middle values based on previous and next values

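Question

# Fill in the missing intermediate values so the DataFrame looks like the target
import pandas as pd

L = [0, 3, 5, 7, 9]
L2 = ['Repeat1', 'Repeat2', 'Repeat3', 'Repeat4', 'Repeat5']

df = pd.DataFrame({'col': L})
df['col2'] = L2

# Build the full run of consecutive values
all_values = list(range(df['col'].min(), df['col'].max() + 1))

# Reindex on 'col' (not on the default row index) so every value is present,
# then forward-fill col2 and turn 'col' back into a column
df = df.set_index('col').reindex(all_values).ffill().reset_index()

print(df)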

Let's say I have a df like this:

   col     col2
0    0  Repeat1
1    3  Repeat2
2    5  Repeat3
3    7  Repeat4
4    9  Repeat5

Reproducible example:

import pandas as pd

L = [0, 3, 5, 7, 9]
L2 = ['Repeat1', 'Repeat2', 'Repeat3', 'Repeat4', 'Repeat5']

df = pd.DataFrame({'col': L})
df['col2'] = L2
print(df)

How can I fill in the missing intermediate values so that my df looks like this?

   col     col2
0    0  Repeat1
1    1  Repeat1
2    2  Repeat1
3    3  Repeat2
4    4  Repeat2
5    5  Repeat3
6    6  Repeat3
7    7  Repeat4
8    8  Repeat4
9    9  Repeat5

Similar threads I've tried:

https://stackoverflow.com/questions/37821653/filling-missing-middle-values-in-pandas-dataframe (fills the intermediate values with NaN, but I don't want NaN)

https://stackoverflow.com/questions/28798076/fill-pandas-dataframe-with-values-in-between (a very long approach; I'm looking for a more functional approach)

Both threads helped me to some extent, but I was wondering whether there is a better way to do it.

Answer 1

Score: 3

You can reindex and ffill with "col" as temporary index:

out = (df.set_index('col')
         .reindex(range(df['col'].max()+1))
         .ffill()
         .reset_index()
      )

Output:

   col     col2
0    0  Repeat1
1    1  Repeat1
2    2  Repeat1
3    3  Repeat2
4    4  Repeat2
5    5  Repeat3
6    6  Repeat3
7    7  Repeat4
8    8  Repeat4
9    9  Repeat5
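
The reindex approach above starts its range at 0, which matches the sample data. As a minimal sketch of how it might be generalized, the hypothetical helper below (the name `fill_gaps` and its `key` parameter are not from the answer) starts the range at the column's minimum instead. It assumes the key column holds unique integers, since `reindex` cannot work on an index with duplicate labels.

```python
import pandas as pd


def fill_gaps(df, key="col"):
    """Add a row for every missing integer in `key`, forward-filling the other columns.

    Hypothetical helper for illustration; assumes `key` holds unique integers.
    """
    full_range = range(df[key].min(), df[key].max() + 1)
    return (
        df.set_index(key)        # use the key column as a temporary index
          .reindex(full_range)   # missing keys become rows of NaN
          .ffill()               # copy the previous row's values downwards
          .reset_index()         # turn the key back into a regular column
    )


# Sample data that does not start at 0
df = pd.DataFrame({"col": [2, 5, 7], "col2": ["A", "B", "C"]})
out = fill_gaps(df)
print(out)
```

With this input, `col` becomes 2 through 7 and `col2` repeats each label until the next known value.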

Answer 2

Score: 1

You can also merge and ffill:

(df.merge(pd.DataFrame({'col': range(df['col'].max()+1)}), how='right')
   .ffill()
)

Output:

   col     col2
0    0  Repeat1
1    1  Repeat1
2    2  Repeat1
3    3  Repeat2
4    4  Repeat2
5    5  Repeat3
6    6  Repeat3
7    7  Repeat4
8    8  Repeat4
9    9  Repeat5
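
To see what each step of the merge approach contributes, the sketch below splits it in two. The intermediate names `full`, `merged`, and `filled` are introduced here for illustration only. The right merge alone leaves NaN in `col2` for the newly added rows, and `ffill` then replaces each NaN with the last known value.

```python
import pandas as pd

df = pd.DataFrame({
    "col": [0, 3, 5, 7, 9],
    "col2": ["Repeat1", "Repeat2", "Repeat3", "Repeat4", "Repeat5"],
})

# Every integer from 0 up to the largest value in 'col'
full = pd.DataFrame({"col": range(df["col"].max() + 1)})

# A right merge keeps every row of `full`; rows with no match in df get NaN in col2
merged = df.merge(full, how="right")
print(merged["col2"].isna().sum())  # number of rows still missing a label

# Forward-fill propagates the previous label into the gaps
filled = merged.ffill()
print(filled)
```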

Answer 3

Score: 1

Another possible solution, which is based on pandas.concat:

pd.concat([pd.DataFrame({'col': range(df['col'].max()+1)}),
           df.set_index('col')], axis=1).ffill()

Or, alternatively:

(pd.concat([df, pd.DataFrame(
    {'col': list(set(range(1, df.col.max()+1)).difference(df.col))})])
 .sort_values('col').ffill().reset_index(drop=True))

Output:

   col     col2
0    0  Repeat1
1    1  Repeat1
2    2  Repeat1
3    3  Repeat2
4    4  Repeat2
5    5  Repeat3
6    6  Repeat3
7    7  Repeat4
8    8  Repeat4
9    9  Repeat5
