Standardizing steps for two dataframes
Question
I have two separate DataFrames, Df1 and Df2. They have the same structure, except that one covers a date range of 1 year while the other covers 5 years.
Now I want to apply the same steps to both DataFrames. Some of the steps are below:
DF = DF.fillna(method='ffill')
DF = DF.fillna('0')
DF = DF.T
DF.index.name = 'Sector'
DF = DF.round(0).astype(int)
DF = DF.sort_index(axis=1)
DF.columns = pd.to_datetime(DF.columns)
list_month = np.array(pd.DatetimeIndex(DF.columns).month)
DF.columns = pd.Series(DF.columns[:]).apply(lambda x: x.strftime('%b %Y')).to_list()
Is there a way to repeat the above steps for both DataFrames without writing the same code block twice, once for each? I think a for loop might be useful here, but I am unsure how to use it.
Answer 1
Score: 0
The most general way to create reusable code is to write a function. However, if you know in advance that you have exactly two dataframes, you can also use the looping capability of a list comprehension:
import pandas as pd

# Each frame has date strings in the index and sectors in the columns.
DF1, DF2 = [
    (
        DF
        .rename(index=pd.to_datetime(DF.index.to_series()).to_dict())
        .sort_index()
        .pipe(lambda df: df.rename(index=df.index.to_series().dt.strftime('%b %Y').to_dict()))
        .ffill().fillna('0').T.rename_axis(index='Sector')  # .ffill() replaces the deprecated fillna(method='ffill')
        .round(0).astype(int)
    )
    for DF in [DF1, DF2]
]
Sample input:
DF1:
           auto   tech   util  agri
2/20/2023   7.7  10.10    NaN   NaN
3/21/2022   7.7    NaN   1.01   NaN
6/25/2021   7.7  11.11    NaN   NaN
5/24/2020   7.7    NaN   2.02   NaN
4/23/2019   7.7  12.12    NaN   1.0
DF2:
           auto   tech   util  agri
2/20/2023   7.7  10.10    NaN   NaN
3/21/2022   7.7    NaN   1.01   NaN
6/25/2021   7.7  11.11    NaN   NaN
5/24/2020   7.7    NaN   2.02   NaN
4/23/2019   7.7  12.12    NaN   1.0
4/23/2018   8.8  20.10    NaN   NaN
4/23/2017   8.8    NaN  11.01   NaN
4/23/2015   8.8  21.11    NaN   NaN
4/23/2016   8.8    NaN  22.02   NaN
4/23/2014   8.8  22.12    NaN   NaN
Output:
DF1:
        Apr 2019  May 2020  Jun 2021  Mar 2022  Feb 2023
Sector
auto           7         7         7         7         7
tech          12        12        11        11        10
util           0         2         2         1         1
agri           1         1         1         1         1
DF2:
        Apr 2014  Apr 2015  Apr 2016  Apr 2017  Apr 2018  Apr 2019  May 2020  Jun 2021  Mar 2022  Feb 2023
Sector
auto           8         8         8         8         8         7         7         7         7         7
tech          22        21        21        21        20        12        12        11        11        10
util           0         0        22        11        11        11         2         2         1         1
agri           0         0         0         0         0         1         1         1         1         1
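The function-based route mentioned at the start of this answer could look roughly like the sketch below. The name `standardize` and the `frames` dict are made up for illustration; DF1 is rebuilt from the sample input, and DF2 is an abridged stand-in with only two of the extra rows. One deliberate difference from the chain above: this sketch fills with the number 0 instead of the string '0'. That keeps every column numeric, so `round(0)` actually rounds (7.7 becomes 8), whereas the output shown above has 7, which appears to come from `astype(int)` truncating after the string fill.

```python
import pandas as pd


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the shared steps to one frame (dates in the index, sectors in the columns)."""
    out = df.copy()
    out.index = pd.to_datetime(out.index)    # parse the date labels
    out = out.sort_index()                   # oldest date first
    out = out.ffill().fillna(0)              # carry values forward, zero-fill leading gaps
    out = out.round(0).astype(int)           # columns are still float here, so round() applies
    out.index = out.index.strftime("%b %Y")  # e.g. "Apr 2019" (month names follow the locale)
    out = out.T                              # sectors become rows
    out.index.name = "Sector"
    return out


nan = float("nan")

# DF1 as in the sample input.
DF1 = pd.DataFrame(
    {
        "auto": [7.7, 7.7, 7.7, 7.7, 7.7],
        "tech": [10.10, nan, 11.11, nan, 12.12],
        "util": [nan, 1.01, nan, 2.02, nan],
        "agri": [nan, nan, nan, nan, 1.0],
    },
    index=["2/20/2023", "3/21/2022", "6/25/2021", "5/24/2020", "4/23/2019"],
)

# Abridged stand-in for DF2: DF1 plus two of the older rows from the sample.
extra = pd.DataFrame(
    {"auto": [8.8, 8.8], "tech": [20.10, nan], "util": [nan, 11.01], "agri": [nan, nan]},
    index=["4/23/2018", "4/23/2017"],
)
DF2 = pd.concat([DF1, extra])

# A plain for loop over a dict keeps each result attached to its name.
frames = {"DF1": DF1, "DF2": DF2}
for name, df in frames.items():
    frames[name] = standardize(df)

print(frames["DF1"])
print(frames["DF2"])
```

Keeping the frames in a dict (or list) avoids rebinding DF1 and DF2 by hand and scales to more than two frames. If you need exactly the truncated numbers shown in the output above, drop the `round(0)` call and keep `astype(int)` only.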