Transform pandas df into multiple columns

Question

I apologize if this is a possible duplicate. I have a dataframe like this:

{'Country Name': {0: 'Argentina', 1: 'Argentina', 2: 'Argentina'},
 'Series Name': {0: 'CO2 emissions (metric tons per capita)',
  1: 'Electric power consumption (kWh per capita)',
  2: 'Energy use (kg of oil equivalent per capita)'},
 '2010': {0: '4.0998122679475', 1: '2877.65265331343', 2: '1928.65235658729'},
 '2011': {0: '4.28094332027273', 1: '2929.07502855568', 2: '1952.05105293095'},
 '2012': {0: '4.26422362148416', 1: '3000.60352326565', 2: '1936.80353979442'},
 '2013': {0: '4.34212454655109', 1: '2967.37655805218', 2: '1967.02167752077'},
 '2014': {0: '4.20905330505396', 1: '3074.70207056563', 2: '2029.92282543737'},
 '2015': {0: '4.30185120706067', 1: '..', 2: '..'},
 '2016': {0: '4.20180210453832', 1: '..', 2: '..'},
 '2017': {0: '4.07139674183186', 1: '..', 2: '..'},
 '2018': {0: '3.9756664767256', 1: '..', 2: '..'},
 '2019': {0: '3.74054556792816', 1: '..', 2: '..'},
 '2020': {0: '..', 1: '..', 2: '..'},
 '2021': {0: '..', 1: '..', 2: '..'},
 '2022': {0: '..', 1: '..', 2: '..'}}

I have a column for Country Name, one for Series Name, and one for each year. I want to reshape this so that all the years are in a single Year column and each unique value of Series Name becomes its own column holding the corresponding values. (Series Name has around 10 categories; I have only shown 3 in the example for reference.)

The expected df would be like this:

Country Name    Year    CO2 emissions    Electric power consumption    Energy use
...      

I am not sure how to do this; any suggestions would be greatly appreciated.

Answer 1

Score: 3

You can achieve the results you want by first melting the dataframe to get the years as a column, then pivoting that result to get the Series Name as the columns:

m = df.melt(id_vars=['Country Name', 'Series Name'], var_name='Year')
out = m.pivot(columns=['Series Name'], index=['Country Name', 'Year'], values=['value'])
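The two calls above can be traced step by step on a trimmed copy of the question's data (only the 2010 and 2011 columns are kept here, purely for brevity). This sketch passes `values="value"` as a plain string rather than the answer's `values=['value']`; with a list, `pivot` returns a two-level column index, which is why the answer follows up with `droplevel()`.

```python
import pandas as pd

# Trimmed copy of the question's data: only two of the year columns are kept.
df = pd.DataFrame({
    "Country Name": ["Argentina", "Argentina", "Argentina"],
    "Series Name": [
        "CO2 emissions (metric tons per capita)",
        "Electric power consumption (kWh per capita)",
        "Energy use (kg of oil equivalent per capita)",
    ],
    "2010": ["4.0998122679475", "2877.65265331343", "1928.65235658729"],
    "2011": ["4.28094332027273", "2929.07502855568", "1952.05105293095"],
})

# Step 1: melt the year columns into rows. The old column labels go into
# "Year" and the cell contents into melt's default "value" column.
m = df.melt(id_vars=["Country Name", "Series Name"], var_name="Year")
print(m.shape)              # (6, 4): 3 series x 2 years, 4 columns
print(m.columns.tolist())   # ['Country Name', 'Series Name', 'Year', 'value']

# Step 2: pivot so each distinct Series Name becomes its own column,
# with one row per (Country Name, Year) pair.
out = m.pivot(index=["Country Name", "Year"], columns="Series Name", values="value")
print(out.shape)            # (2, 3): one row per year, one column per series
```

Note that the values stay strings throughout, because the source columns hold strings (including the `..` placeholders).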

You can then tidy up the column index and names and reset the index:

out.columns = out.columns.droplevel().str.replace(r'\s+\(.*$', '', regex=True)
out = out.reset_index()

Output:

   Country Name  Year     CO2 emissions Electric power consumption        Energy use
0     Argentina  2010   4.0998122679475           2877.65265331343  1928.65235658729
1     Argentina  2011  4.28094332027273           2929.07502855568  1952.05105293095
2     Argentina  2012  4.26422362148416           3000.60352326565  1936.80353979442
3     Argentina  2013  4.34212454655109           2967.37655805218  1967.02167752077
4     Argentina  2014  4.20905330505396           3074.70207056563  2029.92282543737
5     Argentina  2015  4.30185120706067                         ..                ..
6     Argentina  2016  4.20180210453832                         ..                ..
7     Argentina  2017  4.07139674183186                         ..                ..
8     Argentina  2018   3.9756664767256                         ..                ..
9     Argentina  2019  3.74054556792816                         ..                ..
10    Argentina  2020                ..                         ..                ..
11    Argentina  2021                ..                         ..                ..
12    Argentina  2022                ..                         ..                ..
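The column cleanup step can be checked on its own. This sketch rebuilds a two-level column index shaped like the one `pivot(..., values=['value'])` leaves behind (using only the three series shown in the question), then applies the same `droplevel()` and regex as the answer:

```python
import pandas as pd

# The three series names from the question (the real data has about ten).
names = [
    "CO2 emissions (metric tons per capita)",
    "Electric power consumption (kWh per capita)",
    "Energy use (kg of oil equivalent per capita)",
]

# Mimic the column index left by pivot(..., values=['value']):
# level 0 holds the values label, level 1 holds the series names.
cols = pd.MultiIndex.from_product([["value"], names], names=[None, "Series Name"])

# droplevel() with no argument drops level 0, leaving just the series names.
flat = cols.droplevel()

# r"\s+\(.*$" matches the whitespace before "(" plus everything up to the
# end of the label (the unit suffix) and replaces it with nothing.
short = flat.str.replace(r"\s+\(.*$", "", regex=True)
print(short.tolist())  # ['CO2 emissions', 'Electric power consumption', 'Energy use']
```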

Answer 2

Score: 2

Here is one option:

out = (
    pd.DataFrame(data) # `data` is the question's dict; you can replace this line with `df`
      .melt(id_vars=["Country Name", "Series Name"], var_name="Year")
      .pivot(index=["Country Name", "Year"], columns="Series Name", values="value")
      .pipe(lambda x: x.set_axis(x.columns.str.extract(r"(.+)\s+\(.+\)", expand=False),
                                 axis=1)).reset_index().rename_axis(None, axis=1)
)

Output:

print(out)

   Country Name  Year     CO2 emissions Electric power consumption        Energy use
0     Argentina  2010   4.0998122679475           2877.65265331343  1928.65235658729
1     Argentina  2011  4.28094332027273           2929.07502855568  1952.05105293095
2     Argentina  2012  4.26422362148416           3000.60352326565  1936.80353979442
..          ...   ...               ...                        ...               ...
10    Argentina  2020                ..                         ..                ..
11    Argentina  2021                ..                         ..                ..
12    Argentina  2022                ..                         ..                ..

[13 rows x 5 columns]
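The `.pipe(...)` step in the chain above renames the columns with `str.extract` instead of `str.replace`: with `expand=False` and a single capture group, extracting from an `Index` returns an `Index` that `set_axis` can use directly. A minimal sketch on a one-row stand-in for the pivoted frame (the single row is an assumption made for brevity):

```python
import pandas as pd

# One-row stand-in whose column labels mimic the pivoted series names.
wide = pd.DataFrame(
    [["4.0998122679475", "2877.65265331343", "1928.65235658729"]],
    columns=pd.Index([
        "CO2 emissions (metric tons per capita)",
        "Electric power consumption (kWh per capita)",
        "Energy use (kg of oil equivalent per capita)",
    ], name="Series Name"),
)

# The capture group (.+) keeps the text before the whitespace and the "(...)" unit.
renamed = wide.set_axis(
    wide.columns.str.extract(r"(.+)\s+\(.+\)", expand=False), axis=1
)

# rename_axis(None, axis=1) clears any leftover name on the column index.
renamed = renamed.rename_axis(None, axis=1)
print(renamed.columns.tolist())  # ['CO2 emissions', 'Electric power consumption', 'Energy use']
```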

Answer 3

Score: 0

Library needed: pandas

import pandas as pd

Preparing data:

d = {'Country Name': {0: 'Argentina', 1: 'Argentina', 2: 'Argentina'},
 'Series Name': {0: 'CO2 emissions (metric tons per capita)',
  1: 'Electric power consumption (kWh per capita)',
  2: 'Energy use (kg of oil equivalent per capita)'},
 '2010': {0: '4.0998122679475', 1: '2877.65265331343', 2: '1928.65235658729'},
 '2011': {0: '4.28094332027273', 1: '2929.07502855568', 2: '1952.05105293095'},
 '2012': {0: '4.26422362148416', 1: '3000.60352326565', 2: '1936.80353979442'},
 '2013': {0: '4.34212454655109', 1: '2967.37655805218', 2: '1967.02167752077'},
 '2014': {0: '4.20905330505396', 1: '3074.70207056563', 2: '2029.92282543737'},
 '2015': {0: '4.30185120706067', 1: '..', 2: '..'},
 '2016': {0: '4.20180210453832', 1: '..', 2: '..'},
 '2017': {0: '4.07139674183186', 1: '..', 2: '..'},
 '2018': {0: '3.9756664767256', 1: '..', 2: '..'},
 '2019': {0: '3.74054556792816', 1: '..', 2: '..'},
 '2020': {0: '..', 1: '..', 2: '..'},
 '2021': {0: '..', 1: '..', 2: '..'},
 '2022': {0: '..', 1: '..', 2: '..'}}

Processing:

df = pd.DataFrame(data=d).T
df
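For reference, `.T` transposes the frame: the dictionary keys (including 'Country Name' and 'Series Name') become row labels and the row labels 0 to 2 become column labels, so no separate Year column is created. A trimmed version of the question's dictionary (only 2010 and 2011 kept, for brevity) shows the resulting shapes:

```python
import pandas as pd

# Trimmed version of the question's dictionary: two year columns only.
d = {
    "Country Name": {0: "Argentina", 1: "Argentina", 2: "Argentina"},
    "Series Name": {0: "CO2 emissions (metric tons per capita)",
                    1: "Electric power consumption (kWh per capita)",
                    2: "Energy use (kg of oil equivalent per capita)"},
    "2010": {0: "4.0998122679475", 1: "2877.65265331343", 2: "1928.65235658729"},
    "2011": {0: "4.28094332027273", 1: "2929.07502855568", 2: "1952.05105293095"},
}

df = pd.DataFrame(data=d)
t = df.T  # transpose: former column labels become the row index

print(df.shape, t.shape)   # (3, 4) (4, 3)
print(t.index.tolist())    # ['Country Name', 'Series Name', '2010', '2011']
```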

Output: (posted as an image in the original answer; not available here)

huangapple
  • Published on 2023-05-14 07:08:30
  • If reposting, please keep the link to this article: https://go.coder-hub.com/76245208.html