How to handle headers with merged cells in Excel in pandas?
Question
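I have this Excel file with merged cells for the species.
I would like a DataFrame with columns named Specie_1_Point_1, Specie_1_Point_2, and so on. How can I do this?
I tried the following, but it's not what I want: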
import pandas as pd

# Read with the default header (first row) only to collect the species names from the merged cells
df = pd.read_excel("/content/drive/MyDrive/Pollens.xlsx", sheet_name="Jun")
species_pattern = "Specie_"
species_columns = [col for col in df.columns[2:] if species_pattern in str(col)]
species_columns
# Read again using the second row (Point_1, Point_2, ...) as the header
dfPollensJun = pd.read_excel("/content/drive/MyDrive/Pollens.xlsx", sheet_name="Jun", header=1)
for i, species in enumerate(species_columns):
    columns = dfPollensJun.columns[i*6+2:(i+2)*6+1]
    novas_colunas = [f"{species}_{coluna}" for coluna in columns]
    dfPollensJun.rename(columns=dict(zip(columns, novas_colunas)), inplace=True)
dfPollensJun
And I got this:
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 360 entries, 0 to 359
Data columns (total 20 columns):
 #  Column                       Non-Null Count  Dtype
--- ------                       --------------  -----
 0  Data                         360 non-null    datetime64[ns]
 1  Hour                         360 non-null    object
 2  Specie_1_Point_1             360 non-null    int64
 3  Specie_1_Point_2             360 non-null    int64
 4  Specie_1_Point_3             360 non-null    int64
 5  Specie_1_Point_4             360 non-null    int64
 6  Specie_1_Média               360 non-null    float64
 7  Specie_1_Total               360 non-null    int64
 8  Specie_2_Specie_1_Point_1.1  360 non-null    int64
 9  Specie_2_Specie_1_Point_2.1  360 non-null    int64
10  Specie_2_Specie_1_Point_3.1  360 non-null    int64
11  Specie_2_Specie_1_Point_4.1  360 non-null    int64
12  Specie_2_Specie_1_Média.1    360 non-null    float64
13  Specie_2_Total.1             360 non-null    int64
14  Specie_3_Specie_2_Point_1.2  360 non-null    int64
15  Specie_3_Specie_2_Point_2.2  360 non-null    int64
16  Specie_3_Specie_2_Point_3.2  360 non-null    int64
17  Specie_3_Specie_2_Point_4.2  360 non-null    int64
18  Specie_3_Specie_2_Média.2    360 non-null    float64
19  Specie_3_Total.2             360 non-null    int64
dtypes: datetime64[ns](1), float64(3), int64(15), object(1)
memory usage: 56.4+ KB
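The doubled prefixes such as `Specie_2_Specie_1_Point_1.1` come from the slice `columns[i*6+2:(i+2)*6+1]`: for `i = 0` it selects columns 2 to 12 (eleven columns), so each pass also renames the first five columns of the next species block, which the following pass then prefixes a second time. Below is a minimal sketch of the same rename approach with non-overlapping slices, assuming every species block has exactly six columns (`Point_1` to `Point_4`, `Média`, `Total`) starting at column index 2. The helper `rename_species_columns` and the in-memory demo frame are hypothetical; the regex strips the `.1`/`.2` suffixes pandas adds to repeated header names.

```python
import re
from typing import Sequence

import pandas as pd


def rename_species_columns(df: pd.DataFrame, species: Sequence[str],
                           first_col: int = 2, block_size: int = 6) -> pd.DataFrame:
    """Prefix each consecutive block of `block_size` columns with its species name.

    Hypothetical helper: assumes the blocks start at `first_col` and never overlap.
    """
    mapping = {}
    for i, name in enumerate(species):
        start = first_col + i * block_size
        # Take exactly this species' block: [start, start + block_size)
        for col in df.columns[start:start + block_size]:
            # pandas renames repeated headers to "Point_1.1", "Point_1.2", ...;
            # strip that numeric suffix before adding the species prefix
            base = re.sub(r"\.\d+$", "", str(col))
            mapping[col] = f"{name}_{base}"
    return df.rename(columns=mapping)


# Hypothetical stand-in for the header=1 read, with the column layout seen in the question
points = ["Point_1", "Point_2", "Point_3", "Point_4", "Média", "Total"]
cols = (["Data", "Hour"] + points
        + [f"{p}.1" for p in points] + [f"{p}.2" for p in points])
demo = pd.DataFrame([list(range(len(cols)))], columns=cols)

renamed = rename_species_columns(demo, ["Specie_1", "Specie_2", "Specie_3"])
print(list(renamed.columns))
```

With the real file, this would be called as `rename_species_columns(dfPollensJun, species_columns)`.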
Answer 1
Score: 2
Assuming your table starts at cell A1, you can try this:
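import pandas as pd

# header=[0, 1]: the species row and the point row become a two-level column index
# index_col=[0, 1]: the first two columns (date and hour) become the row index
df = pd.read_excel(
    "/content/drive/MyDrive/Pollens.xlsx",
    sheet_name="Jun", index_col=[0, 1], header=[0, 1]
)
df = df.rename_axis(index=["Data", "Hour"])
# Join each (species, point) pair into a single name such as "Specie_1_Point_1"
df.columns = df.columns.map(lambda x: f"{x[0]}_{x[1]}")
df = df.reset_index()  # optional: turns Data and Hour back into ordinary columns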
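To see what the flattening step does without the workbook, the sketch below builds a small hypothetical frame shaped like what the `read_excel(..., index_col=[0, 1], header=[0, 1])` call above is expected to return: a two-level column index of species and points, with `Data` and `Hour` as the row index. The species, point names, and values are made up for illustration.

```python
import pandas as pd

# Hypothetical stand-in for the two-header-row read: (species, point) column pairs
columns = pd.MultiIndex.from_product(
    [["Specie_1", "Specie_2"], ["Point_1", "Point_2", "Média", "Total"]]
)
index = pd.MultiIndex.from_tuples(
    [("2023-06-01", "00:00"), ("2023-06-01", "01:00")], names=["Data", "Hour"]
)
df = pd.DataFrame(0, index=index, columns=columns)

# Join the two header levels, e.g. ("Specie_1", "Point_1") -> "Specie_1_Point_1"
df.columns = df.columns.map(lambda x: f"{x[0]}_{x[1]}")
# Move Data and Hour from the row index back into ordinary columns
df = df.reset_index()
print(list(df.columns))
```

When every header level is a string, `df.columns.map("_".join)` is an equivalent shorthand for the lambda.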