Standardizing steps for two dataframes

Question

I have two separate DataFrames, Df1 and Df2. They have the same structure, except that one covers a date range of 1 year while the other covers 5 years.

I want to apply the same steps to both DataFrames. Some of the steps are shown below:

    DF = DF.fillna(method='ffill')
    DF = DF.fillna('0')
    DF = DF.T
    DF.index.name = 'Sector'
    DF = DF.round(0).astype(int)
    DF = DF.sort_index(axis=1)
    
    DF.columns = pd.to_datetime(DF.columns)
    list_month = np.array(pd.DatetimeIndex(DF.columns).month)
    DF.columns = pd.Series(DF.columns[:]).apply(lambda x: x.strftime('%b %Y')).to_list()

Is there a way to apply the steps above to both DataFrames without writing the same code block twice, once for each? I think a for loop might be useful here, but I am unsure how to use it.

Answer 1

Score: 0

The most general way to create reusable code is to write a function. However, if you know in advance that you have exactly two dataframes, you can also use the looping capability of a list comprehension:

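    import pandas as pd

    # Rebuild both frames with one pipeline: parse and sort the date index,
    # relabel it as 'Mon YYYY', fill gaps, transpose, and convert to int.
    DF1, DF2 = [
        ( DF
        .rename(index=pd.to_datetime(DF.index.to_series()).to_dict())
        .sort_index()
        .pipe( lambda df: df.rename(index=df.index.to_series().dt.strftime('%b %Y').to_dict()) )
        .ffill().fillna('0').T.rename_axis(index='Sector')  # .ffill() replaces the deprecated fillna(method='ffill')
        .round(0).astype(int) )
    for DF in [DF1, DF2]]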

Sample input:

DF1:
           auto   tech  util  agri
2/20/2023   7.7  10.10   NaN   NaN
3/21/2022   7.7    NaN  1.01   NaN
6/25/2021   7.7  11.11   NaN   NaN
5/24/2020   7.7    NaN  2.02   NaN
4/23/2019   7.7  12.12   NaN   1.0

DF2:
           auto   tech   util  agri
2/20/2023   7.7  10.10    NaN   NaN
3/21/2022   7.7    NaN   1.01   NaN
6/25/2021   7.7  11.11    NaN   NaN
5/24/2020   7.7    NaN   2.02   NaN
4/23/2019   7.7  12.12    NaN   1.0
4/23/2018   8.8  20.10    NaN   NaN
4/23/2017   8.8    NaN  11.01   NaN
4/23/2015   8.8  21.11    NaN   NaN
4/23/2016   8.8    NaN  22.02   NaN
4/23/2014   8.8  22.12    NaN   NaN

Output:

DF1:
        Apr 2019  May 2020  Jun 2021  Mar 2022  Feb 2023
Sector
auto           7         7         7         7         7
tech          12        12        11        11        10
util           0         2         2         1         1
agri           1         1         1         1         1

DF2:
        Apr 2014  Apr 2015  Apr 2016  Apr 2017  Apr 2018  Apr 2019  May 2020  Jun 2021  Mar 2022  Feb 2023
Sector
auto           8         8         8         8         8         7         7         7         7         7
tech          22        21        21        21        20        12        12        11        11        10
util           0         0        22        11        11        11         2         2         1         1
agri           0         0         0         0         0         1         1         1         1         1
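
The function-based approach mentioned at the start of this answer could look roughly like the sketch below. It is not part of the original answer. The name `standardize` and the `frames` dictionary are hypothetical, and the sketch assumes the dates are in the index as month/day/year strings, as in the sample input. It fills remaining gaps with a numeric 0 and casts straight to `int`, which truncates (7.7 becomes 7) and so agrees with the output shown above; add `.round(0)` before `.astype(int)` if rounding is preferred.

```python
import pandas as pd


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the same clean-up steps to one date-indexed DataFrame."""
    out = df.copy()
    out.index = pd.to_datetime(out.index)    # parse the date labels
    out = out.sort_index()                   # chronological order
    out = out.ffill().fillna(0)              # forward-fill, then 0 for leading gaps
    out.index = out.index.strftime('%b %Y')  # e.g. 'Apr 2019'
    out = out.T.rename_axis(index='Sector')  # sectors become rows
    return out.astype(int)                   # truncates toward zero


# Sample data copied from DF1 above.
nan = float('nan')
DF1 = pd.DataFrame(
    {
        'auto': [7.7, 7.7, 7.7, 7.7, 7.7],
        'tech': [10.10, nan, 11.11, nan, 12.12],
        'util': [nan, 1.01, nan, 2.02, nan],
        'agri': [nan, nan, nan, nan, 1.0],
    },
    index=['2/20/2023', '3/21/2022', '6/25/2021', '5/24/2020', '4/23/2019'],
)

# Any number of frames can be processed by looping over a dict.
frames = {'DF1': DF1}  # add 'DF2': DF2 here
results = {name: standardize(df) for name, df in frames.items()}
print(results['DF1'])
```

Keeping the frames in a dictionary means a third or fourth DataFrame only needs one more entry, whereas the tuple-unpacking form has to be edited on both sides of the assignment.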

huangapple
  • Posted on 2023-02-19 02:01:40
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75495332.html