使用pandas读取基于多索引标题的Excel文件

huangapple go评论188阅读模式
英文:

reading multi-index header based excel file using pandas

问题

以下是您提供的代码的翻译部分:

  1. import pandas as pd
  2. # 加载Excel文件
  3. df = pd.read_excel('test_3.xlsx', sheet_name='WEEK - 2023', header=None)
  4. # 将前3行设置为标题
  5. header = df.iloc[:3, :].fillna(method='ffill', axis=1)
  6. df.columns = pd.MultiIndex.from_arrays(header.values)
  7. df = df.iloc[3:, :]
  8. # 选择指定的列
  9. df = df.loc[:, ('month', 'week', ('PLAN 2023', 'Traffic per channel', 'red'))]
  10. # 重命名列以删除多级标题
  11. df.columns = ['month', 'week', 'P_traffic_red']
  12. # 打印最终数据框
  13. print(df)

希望这可以帮助您读取Excel文件并处理多级标题。

英文:

I have an excel file where first 3 rows have header names, I want to read it in pandas but facing difficulty in the multi-index header.

  1. PLAN 2023
  2. Traffic per channel Traffic Share per Channel
  3. month week All Traffic red green orange red green orange
  4. jan 1 100 50 30 20 50% 30% 20%

for 'month' and 'week', I have the header names stored in row 3 but for others, it's distributed in row 1,2,3. Also, the row number is not fixed, therefore, I need to read by headers.

The final expected output should look like this

  1. month week plan_2023_Traffic_per_channel_All .....plan_2023_Traffic_Share_per_channel_orange
  2. jan 1 100 20%

my script is below, for simplicity, I am just printing 1 value

  1. import pandas as pd
  2. # Load the Excel file
  3. df = pd.read_excel('test_3.xlsx', sheet_name='WEEK - 2023', header=None)
  4. # Set the first 3 rows as the header
  5. header = df.iloc[:3,:].fillna(method='ffill', axis=1)
  6. df.columns = pd.MultiIndex.from_arrays(header.values)
  7. df = df.iloc[3:,:]
  8. # Select only the specified columns
  9. df = df.loc[:, ('month', 'week', ('PLAN 2023', 'Traffic per channel', 'red'))]
  10. # Rename the columns to remove the multi-level header
  11. df.columns = ['month', 'week', 'P_traffic_red']
  12. # Print the final data frame
  13. print(df)

picture for reference

使用pandas读取基于多索引标题的Excel文件

Thank you in advance

答案1

得分: 2

你可以尝试以下代码:

  1. df = pd.read_excel('test_3.xlsx', header=None)
  2. cols = (df.iloc[:3].ffill(axis=1)
  3. .apply(lambda x: '_'.join(x.dropna().str.replace(' ', '_'))))
  4. df = df.iloc[3:].set_axis(cols, axis=1)

输出结果:

  1. >>> df
  2. statMonthName statWeek Plan_2023_Traffic_per_channel_All_Traffic ... Plan_2023_Traffic_Share_per_Chanel_red Plan_2023_Traffic_Share_per_Chanel_green Plan_2023_Traffic_Share_per_Chanel_orange
  3. 3 jan 1 100 ... 50% 30% 20%
  4. [1 rows x 9 columns]
  5. >>> df.columns
  6. Index(['statMonthName', 'statWeek',
  7. 'Plan_2023_Traffic_per_channel_All_Traffic',
  8. 'Plan_2023_Traffic_per_channel_red',
  9. 'Plan_2023_Traffic_per_channel_green',
  10. 'Plan_2023_Traffic_per_channel_orange',
  11. 'Plan_2023_Traffic_Share_per_Chanel_red',
  12. 'Plan_2023_Traffic_Share_per_Chanel_green',
  13. 'Plan_2023_Traffic_Share_per_Chanel_orange'],
  14. dtype='object')
英文:

You can try:

  1. df = pd.read_excel('test_3.xlsx', header=None)
  2. cols = (df.iloc[:3].ffill(axis=1)
  3. .apply(lambda x: '_'.join(x.dropna().str.replace(' ', '_'))))
  4. df = df.iloc[3:].set_axis(cols, axis=1)

Output:

  1. >>> df
  2. statMonthName statWeek Plan_2023_Traffic_per_channel_All_Traffic ... Plan_2023_Traffic_Share_per_Chanel_red Plan_2023_Traffic_Share_per_Chanel_green Plan_2023_Traffic_Share_per_Chanel_orange
  3. 3 jan 1 100 ... 50% 30% 20%
  4. [1 rows x 9 columns]
  5. >>> df.columns
  6. Index(['statMonthName', 'statWeek',
  7. 'Plan_2023_Traffic_per_channel_All_Traffic',
  8. 'Plan_2023_Traffic_per_channel_red',
  9. 'Plan_2023_Traffic_per_channel_green',
  10. 'Plan_2023_Traffic_per_channel_orange',
  11. 'Plan_2023_Traffic_Share_per_Chanel_red',
  12. 'Plan_2023_Traffic_Share_per_Chanel_green',
  13. 'Plan_2023_Traffic_Share_per_Chanel_orange'],
  14. dtype='object')

huangapple
  • 本文由 发表于 2023年3月9日 23:30:46
  • 转载请务必保留本文链接:https://go.coder-hub.com/75686755.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定