Question

I have two separate DataFrames, Df1 and Df2. They have the same layout, except that one covers a date range of 1 year and the other covers 5 years.

I want to apply the same steps to both DataFrames. Some of those steps are below:

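    import numpy as np
    import pandas as pd

    DF = DF.ffill()                  # same as fillna(method='ffill'), which is deprecated since pandas 2.1
    DF = DF.fillna(0)                # numeric 0 (not the string '0') keeps the columns numeric for round()
    DF = DF.T
    DF.index.name = 'Sector'
    DF = DF.round(0).astype(int)

    DF.columns = pd.to_datetime(DF.columns)
    DF = DF.sort_index(axis=1)       # sort after parsing so the columns are in date order
    list_month = np.array(pd.DatetimeIndex(DF.columns).month)
    DF.columns = pd.Series(DF.columns[:]).apply(lambda x: x.strftime('%b %Y')).to_list()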

Is there a way to apply these steps to both DataFrames without writing the same code block twice, once per DataFrame? I think a for loop might help here, but I am not sure how to use it.
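
For the date-handling steps at the end of the snippet, a `DatetimeIndex` already exposes `.month` and `.strftime`, so the `pd.Series(...).apply(...)` round trip is not strictly needed. Parsing before sorting also matters: `sort_index(axis=1)` on the raw strings compares them as text, which puts `'2/20/2023'` before `'4/23/2019'`. A minimal sketch, assuming month/day/year labels like those in the sample data; the small frame below is hypothetical:

```python
import pandas as pd

# Hypothetical transposed frame: sectors as rows, date strings as columns (unsorted).
DF = pd.DataFrame(
    [[7, 10, 12], [0, 1, 2]],
    index=pd.Index(['auto', 'util'], name='Sector'),
    columns=['2/20/2023', '4/23/2019', '3/21/2022'],
)

# Parse the labels first (assumed month/day/year format), then sort chronologically.
DF.columns = pd.to_datetime(DF.columns, format='%m/%d/%Y')
DF = DF.sort_index(axis=1)

list_month = DF.columns.month.to_numpy()    # month numbers straight from the DatetimeIndex
DF.columns = DF.columns.strftime('%b %Y')   # 'Apr 2019', 'Mar 2022', 'Feb 2023'
print(DF)
```

The `format='%m/%d/%Y'` argument is an assumption about the label format; without it pandas tries to infer the format.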

Answer 1

Score: 0

The most general way to make code reusable is to write a function. However, if you know in advance that you have exactly two DataFrames, you can also loop over them with a list comprehension and unpack the result:

    import pandas as pd

    DF1, DF2 = [
        ( DF
        .rename(index=pd.to_datetime(DF.index.to_series()).to_dict())  # parse the date labels
        .sort_index()                                                   # chronological order
        .pipe( lambda df: df.rename(index=df.index.to_series().dt.strftime('%b %Y').to_dict()) )
        .ffill().fillna('0').T.rename_axis(index='Sector')             # ffill() replaces fillna(method='ffill')
        .round(0).astype(int) )
    for DF in [DF1, DF2]]
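
The same chain can be wrapped in a function, the more general option mentioned above, and then applied with either a list comprehension or a for loop. In a for loop the result has to be stored somewhere (a dict here), because `DF = ...` inside the loop only rebinds the loop variable and leaves `DF1` and `DF2` unchanged. A minimal sketch; the function name `standardize`, the dict keys and the small input frames are hypothetical, and gaps are filled with the number `0` so that `round(0)` rounds rather than truncates:

```python
import numpy as np
import pandas as pd


def standardize(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: apply the shared cleanup steps to one date-indexed frame."""
    out = df.copy()
    out.index = pd.to_datetime(out.index, format='%m/%d/%Y')  # parse '2/20/2023'-style labels
    out = out.sort_index()                                     # chronological order before filling
    out = out.ffill().fillna(0)                                # carry values forward, then zero-fill
    out = out.T.rename_axis(index='Sector')                    # sectors become rows
    out = out.round(0).astype(int)
    out.columns = out.columns.strftime('%b %Y')                # 'Apr 2019'-style labels
    return out


# Hypothetical inputs in the same layout as the sample data.
DF1 = pd.DataFrame({'auto': [7.7, 7.7], 'util': [np.nan, 1.01]},
                   index=['2/20/2023', '4/23/2019'])
DF2 = pd.DataFrame({'auto': [7.7, 8.8, 8.8], 'util': [1.01, np.nan, 22.02]},
                   index=['2/20/2023', '4/23/2015', '4/23/2016'])

# Option 1: list comprehension with unpacking, as in the answer.
DF1_out, DF2_out = [standardize(DF) for DF in [DF1, DF2]]

# Option 2: for loop that stores each result instead of rebinding the loop variable.
results = {}
for name, DF in {'DF1': DF1, 'DF2': DF2}.items():
    results[name] = standardize(DF)
```

Both options produce the same frames. The dict version avoids the fixed two-name unpacking, so it extends to any number of inputs without changes.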

Sample input:

DF1:

               auto   tech  util  agri
    2/20/2023   7.7  10.10   NaN   NaN
    3/21/2022   7.7    NaN  1.01   NaN
    6/25/2021   7.7  11.11   NaN   NaN
    5/24/2020   7.7    NaN  2.02   NaN
    4/23/2019   7.7  12.12   NaN   1.0

DF2:

               auto   tech   util  agri
    2/20/2023   7.7  10.10    NaN   NaN
    3/21/2022   7.7    NaN   1.01   NaN
    6/25/2021   7.7  11.11    NaN   NaN
    5/24/2020   7.7    NaN   2.02   NaN
    4/23/2019   7.7  12.12    NaN   1.0
    4/23/2018   8.8  20.10    NaN   NaN
    4/23/2017   8.8    NaN  11.01   NaN
    4/23/2015   8.8  21.11    NaN   NaN
    4/23/2016   8.8    NaN  22.02   NaN
    4/23/2014   8.8  22.12    NaN   NaN

Output:

DF1:

            Apr 2019  May 2020  Jun 2021  Mar 2022  Feb 2023
    Sector
    auto           7         7         7         7         7
    tech          12        12        11        11        10
    util           0         2         2         1         1
    agri           1         1         1         1         1

DF2:

            Apr 2014  Apr 2015  Apr 2016  Apr 2017  Apr 2018  Apr 2019  May 2020  Jun 2021  Mar 2022  Feb 2023
    Sector
    auto           8         8         8         8         8         7         7         7         7         7
    tech          22        21        21        21        20        12        12        11        11        10
    util           0         0        22        11        11        11         2         2         1         1
    agri           0         0         0         0         0         1         1         1         1         1
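
In the output, the `auto` row shows 7 for 7.7 and 8 for 8.8, so those values were truncated rather than rounded. That follows from `fillna('0')`: the string turns the affected columns into object dtype, transposing a mixed-dtype frame makes every column object dtype, `DataFrame.round` leaves non-numeric columns untouched, and `astype(int)` then truncates each float. A small check with a hypothetical two-row frame:

```python
import numpy as np
import pandas as pd

# Hypothetical frame: one column with a gap, one without.
raw = pd.DataFrame({'auto': [7.7, 7.7], 'util': [np.nan, 2.02]})

# String fill: 'util' becomes object dtype, so the transposed frame is all object.
with_str = raw.fillna('0').T
print(with_str.dtypes.unique())                 # a single object dtype
truncated = with_str.round(0).astype(int)       # round(0) skips object columns; int() truncates

# Numeric fill: everything stays float64, so round(0) really rounds.
with_num = raw.fillna(0).T
rounded = with_num.round(0).astype(int)

print(truncated.loc['auto'].tolist())   # [7, 7]
print(rounded.loc['auto'].tolist())     # [8, 8]
```

With `fillna(0)` in the chain instead, the `auto` values in this sample would become 8 (from 7.7) and 9 (from 8.8); every other value has a fractional part below 0.5, so rounding and truncation agree for them.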
