March 9, 2023, 23:30:46

Reading an Excel file with a multi-index header using pandas

Question

I have an Excel file whose first 3 rows contain the header names. I want to read it with pandas, but I am having difficulty with the multi-index header.

              PLAN 2023
              Traffic per channel                  Traffic Share per Channel
month  week   All Traffic  red  green  orange      red  green  orange
jan    1      100          50   30     20          50%  30%    20%

For 'month' and 'week', the header names are stored in row 3, but for the other columns they are spread across rows 1, 2, and 3. Also, the row number is not fixed, so I need to read by headers.

The final expected output should look like this:

month   week   plan_2023_Traffic_per_channel_All  .....plan_2023_Traffic_Share_per_channel_orange
jan     1                     100                                            20%

My script is below. For simplicity, I am only printing one value.

import pandas as pd
# Load the Excel file
df = pd.read_excel('test_3.xlsx', sheet_name='WEEK - 2023', header=None)
# Set the first 3 rows as the header
header = df.iloc[:3,:].fillna(method='ffill', axis=1)
df.columns = pd.MultiIndex.from_arrays(header.values)
df = df.iloc[3:,:]
# Select only the specified columns
df = df.loc[:, ('month', 'week', ('PLAN 2023', 'Traffic per channel', 'red'))]
# Rename the columns to remove the multi-level header
df.columns = ['month', 'week', 'P_traffic_red']
# Print the final data frame
print(df)

Thank you in advance

Answer 1

Score: 2

You can try:

import pandas as pd

# Read the sheet without a header so the first 3 rows arrive as ordinary data
df = pd.read_excel('test_3.xlsx', header=None)
# Forward-fill merged header cells across each row, then join the levels of each column
cols = (df.iloc[:3].ffill(axis=1)
          .apply(lambda x: '_'.join(x.dropna().str.replace(' ', '_'))))
df = df.iloc[3:].set_axis(cols, axis=1)

Output:

>>> df
  statMonthName statWeek Plan_2023_Traffic_per_channel_All_Traffic  ... Plan_2023_Traffic_Share_per_Chanel_red Plan_2023_Traffic_Share_per_Chanel_green Plan_2023_Traffic_Share_per_Chanel_orange
3           jan        1                                       100  ...                                    50%                                      30%                                       20%
[1 rows x 9 columns]
>>> df.columns
Index(['statMonthName', 'statWeek',
       'Plan_2023_Traffic_per_channel_All_Traffic',
       'Plan_2023_Traffic_per_channel_red',
       'Plan_2023_Traffic_per_channel_green',
       'Plan_2023_Traffic_per_channel_orange',
       'Plan_2023_Traffic_Share_per_Chanel_red',
       'Plan_2023_Traffic_Share_per_Chanel_green',
       'Plan_2023_Traffic_Share_per_Chanel_orange'],
      dtype='object')
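
If you want to try the idea without the Excel file, here is a minimal sketch on an in-memory frame. The `raw` frame is only my guess at what `pd.read_excel(..., header=None)` would return for the layout in the question (merged header cells showing up as NaN), and the names `raw`, `header`, `data`, and `subset` are hypothetical. It flattens the three header rows the same way as above, then picks the three columns the question asks for.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for pd.read_excel('test_3.xlsx', header=None):
# merged header cells appear as NaN everywhere except their first column.
raw = pd.DataFrame(
    [
        [np.nan, np.nan, "PLAN 2023", np.nan, np.nan, np.nan, np.nan, np.nan, np.nan],
        [np.nan, np.nan, "Traffic per channel", np.nan, np.nan, np.nan,
         "Traffic Share per Channel", np.nan, np.nan],
        ["month", "week", "All Traffic", "red", "green", "orange", "red", "green", "orange"],
        ["jan", 1, 100, 50, 30, 20, "50%", "30%", "20%"],
    ],
    dtype=object,
)

# Forward-fill each header row to the right so every column carries its group labels.
header = raw.iloc[:3].ffill(axis=1)

# Join the non-missing levels of each column into one flat name.
cols = header.apply(
    lambda col: "_".join(col.dropna().astype(str).str.replace(" ", "_", regex=False))
)

# Drop the header rows and attach the flat names.
data = raw.iloc[3:].set_axis(cols.tolist(), axis=1).reset_index(drop=True)

# Select by name and rename, as in the question's script.
subset = data[["month", "week", "PLAN_2023_Traffic_per_channel_red"]].rename(
    columns={"PLAN_2023_Traffic_per_channel_red": "P_traffic_red"}
)
print(subset)
```

With flat names the selection is a plain list of strings, so it does not depend on column positions. The `astype(str)` call is an extra guard I added in case a header cell is read as a number rather than text.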
