如何在pandas中处理Excel中合并单元格的标题?

huangapple go评论125阅读模式
英文:

How to handle headers with merged cells in excel in pandas?

问题

我有这个包含物种合并单元格的Excel文件。
我想要一个数据表,其中列的名称为Specie_1_Poitn1、Specie_1_Poitn2,以此类推。

我尝试了以下方法,但这不是我想要的结果:

  1. df = pd.read_excel("/content/drive/MyDrive/Pollens.xlsx", sheet_name="Jun")
  2. species_pattern = "Specie_"
  3. species_columns = [col for col in df.columns[2:] if species_pattern in str(col)]
  4. species_columns
  5. dfPollensJun = pd.read_excel("/content/drive/MyDrive/Pollens.xlsx",sheet_name="Jun",header = 1)
  6. for i, species in enumerate(species_columns):
  7. columns = dfPollensJun.columns[i*6+2:(i+2)*6+1]
  8. novas_colunas = [f"{species}_{coluna}" for coluna in columns]
  9. dfPollensJun.rename(columns=dict(zip(columns, novas_colunas)), inplace=True)
  10. dfPollensJun

结果如下:

  1. <class 'pandas.core.frame.DataFrame'>
  2. RangeIndex: 360 entries, 0 to 359
  3. Data columns (total 20 columns):
  4. # Column Non-Null Count Dtype
  5. --- ------ -------------- -----
  6. 0 Data 360 non-null datetime64[ns]
  7. 1 Hour 360 non-null object
  8. 2 Specie_1_Point_1 360 non-null int64
  9. 3 Specie_1_Point_2 360 non-null int64
  10. 4 Specie_1_Point_3 360 non-null int64
  11. 5 Specie_1_Point_4 360 non-null int64
  12. 6 Specie_1_M&#233;dia 360 non-null float64
  13. 7 Specie_1_Total 360 non-null int64
  14. 8 Specie_2_Specie_1_Point_1.1 360 non-null int64
  15. 9 Specie_2_Specie_1_Point_2.1 360 non-null int64
  16. 10 Specie_2_Specie_1_Point_3.1 360 non-null int64
  17. 11 Specie_2_Specie_1_Point_4.1 360 non-null int64
  18. 12 Specie_2_Specie_1_M&#233;dia.1 360 non-null float64
  19. 13 Specie_2_Total.1 360 non-null int64
  20. 14 Specie_3_Specie_2_Point_1.2 360 non-null int64
  21. 15 Specie_3_Specie_2_Point_2.2 360 non-null int64
  22. 16 Specie_3_Specie_2_Point_3.2 360 non-null int64
  23. 17 Specie_3_Specie_2_Point_4.2 360 non-null int64
  24. 18 Specie_3_Specie_2_M&#233;dia.2 360 non-null float64
  25. 19 Specie_3_Total.2 360 non-null int64
  26. dtypes: datetime64[ns](1), float64(3), int64(15), object(1)
  27. memory usage: 56.4+ KB
英文:

I Have this excel with merged cells for species.
如何在pandas中处理Excel中合并单元格的标题?

I would like to have a data table with columns named Specie_1_Poitn1, Specie_1_Poitn2, .....

How can I do this?

I tried this, but it's not what I want

  1. df = pd.read_excel(&quot;/content/drive/MyDrive/Pollens.xlsx&quot;, sheet_name=&quot;Jun&quot;)
  2. species_pattern = &quot;Specie_&quot;
  3. species_columns = [col for col in df.columns[2:] if species_pattern in str(col)]
  4. species_columns
  5. dfPollensJun = pd.read_excel(&quot;/content/drive/MyDrive/Pollens.xlsx&quot;,sheet_name=&quot;Jun&quot;,header = 1)
  6. for i, species in enumerate(species_columns):
  7. columns = dfPollensJun.columns[i*6+2:(i+2)*6+1]
  8. novas_colunas = [f&quot;{species}_{coluna}&quot; for coluna in columns]
  9. dfPollensJun.rename(columns=dict(zip(columns, novas_colunas)), inplace=True)
  10. dfPollensJun

And I got this

  1. &lt;class &#39;pandas.core.frame.DataFrame&#39;&gt;
  2. RangeIndex: 360 entries, 0 to 359
  3. Data columns (total 20 columns):
  4. # Column Non-Null Count Dtype
  5. --- ------ -------------- -----
  6. 0 Data 360 non-null datetime64[ns]
  7. 1 Hour 360 non-null object
  8. 2 Specie_1_Point_1 360 non-null int64
  9. 3 Specie_1_Point_2 360 non-null int64
  10. 4 Specie_1_Point_3 360 non-null int64
  11. 5 Specie_1_Point_4 360 non-null int64
  12. 6 Specie_1_M&#233;dia 360 non-null float64
  13. 7 Specie_1_Total 360 non-null int64
  14. 8 Specie_2_Specie_1_Point_1.1 360 non-null int64
  15. 9 Specie_2_Specie_1_Point_2.1 360 non-null int64
  16. 10 Specie_2_Specie_1_Point_3.1 360 non-null int64
  17. 11 Specie_2_Specie_1_Point_4.1 360 non-null int64
  18. 12 Specie_2_Specie_1_M&#233;dia.1 360 non-null float64
  19. 13 Specie_2_Total.1 360 non-null int64
  20. 14 Specie_3_Specie_2_Point_1.2 360 non-null int64
  21. 15 Specie_3_Specie_2_Point_2.2 360 non-null int64
  22. 16 Specie_3_Specie_2_Point_3.2 360 non-null int64
  23. 17 Specie_3_Specie_2_Point_4.2 360 non-null int64
  24. 18 Specie_3_Specie_2_M&#233;dia.2 360 non-null float64
  25. 19 Specie_3_Total.2 360 non-null int64
  26. dtypes: datetime64[ns](1), float64(3), int64(15), object(1)
  27. memory usage: 56.4+ KB

答案1

得分: 2

Assuming your table starts at the cell A0, you can try this:

  1. df = pd.read_excel(
  2. "/content/drive/MyDrive/Pollens.xlsx",
  3. sheet_name="Jun", index_col=[0, 1], header=[0, 1]
  4. )
  5. df = df.rename_axis(index=["Data", "Hour"])
  6. df.columns = df.columns.map(lambda x: f"{x[0]}_{x[1]}")
  7. df = df.reset_index() # optional ?
英文:

Assuming your table starts at the cell A0, you can try this :

  1. df = pd.read_excel(
  2. &quot;/content/drive/MyDrive/Pollens.xlsx&quot;,
  3. sheet_name=&quot;Jun&quot;, index_col=[0, 1], header=[0, 1]
  4. )
  5. df = df.rename_axis(index=[&quot;Data&quot;, &quot;Hour&quot;])
  6. df.columns = df.columns.map(lambda x: f&quot;{x[0]}_{x[1]}&quot;)
  7. df = df.reset_index() # optional ?

huangapple
  • 本文由 发表于 2023年6月15日 01:37:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76476222.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定