Transform pandas df into multiple columns
Question
I apologize if this is a possible duplicate. I have a dataframe like this:
{'Country Name': {0: 'Argentina', 1: 'Argentina', 2: 'Argentina'},
'Series Name': {0: 'CO2 emissions (metric tons per capita)',
1: 'Electric power consumption (kWh per capita)',
2: 'Energy use (kg of oil equivalent per capita)'},
'2010': {0: '4.0998122679475', 1: '2877.65265331343', 2: '1928.65235658729'},
'2011': {0: '4.28094332027273', 1: '2929.07502855568', 2: '1952.05105293095'},
'2012': {0: '4.26422362148416', 1: '3000.60352326565', 2: '1936.80353979442'},
'2013': {0: '4.34212454655109', 1: '2967.37655805218', 2: '1967.02167752077'},
'2014': {0: '4.20905330505396', 1: '3074.70207056563', 2: '2029.92282543737'},
'2015': {0: '4.30185120706067', 1: '..', 2: '..'},
'2016': {0: '4.20180210453832', 1: '..', 2: '..'},
'2017': {0: '4.07139674183186', 1: '..', 2: '..'},
'2018': {0: '3.9756664767256', 1: '..', 2: '..'},
'2019': {0: '3.74054556792816', 1: '..', 2: '..'},
'2020': {0: '..', 1: '..', 2: '..'},
'2021': {0: '..', 1: '..', 2: '..'},
'2022': {0: '..', 1: '..', 2: '..'}}
I have a column for Country Name, a column for Series Name, and one column per year. I want to transform this so that all the years go into a single Year column and each unique value in Series Name becomes its own column holding the corresponding values. (Series Name has around 10 categories; I have only shown 3 in the example for reference.)
The expected df would be like this:
Country Name Year CO2 emissions Electric power consumption Energy use
...
I am not sure how I could do this; any suggestions would be greatly appreciated.
Answer 1
Score: 3
You can achieve the results you want by first melting the dataframe (melt) to get the years as a column, then pivoting that result (pivot) to get the Series Name values as the columns:
m = df.melt(id_vars=['Country Name', 'Series Name'], var_name='Year')
out = m.pivot(columns=['Series Name'], index=['Country Name', 'Year'], values=['value'])
You can then tidy up the column index and names and reset the index:
out.columns = out.columns.droplevel().str.replace(r'\s+\(.*$', '', regex=True)
out = out.reset_index()
Output:
Country Name Year CO2 emissions Electric power consumption Energy use
0 Argentina 2010 4.0998122679475 2877.65265331343 1928.65235658729
1 Argentina 2011 4.28094332027273 2929.07502855568 1952.05105293095
2 Argentina 2012 4.26422362148416 3000.60352326565 1936.80353979442
3 Argentina 2013 4.34212454655109 2967.37655805218 1967.02167752077
4 Argentina 2014 4.20905330505396 3074.70207056563 2029.92282543737
5 Argentina 2015 4.30185120706067 .. ..
6 Argentina 2016 4.20180210453832 .. ..
7 Argentina 2017 4.07139674183186 .. ..
8 Argentina 2018 3.9756664767256 .. ..
9 Argentina 2019 3.74054556792816 .. ..
10 Argentina 2020 .. .. ..
11 Argentina 2021 .. .. ..
12 Argentina 2022 .. .. ..
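To make the melt-then-pivot steps above easier to follow, here is a minimal self-contained sketch. It assumes a trimmed copy of the question's data (two years only), and the names long_df and wide are illustrative rather than taken from the answer. It passes values="value" as a plain string, so no droplevel call is needed.

```python
import pandas as pd

# Trimmed copy of the question's data (two years only); an assumption for brevity.
data = {
    "Country Name": ["Argentina"] * 3,
    "Series Name": [
        "CO2 emissions (metric tons per capita)",
        "Electric power consumption (kWh per capita)",
        "Energy use (kg of oil equivalent per capita)",
    ],
    "2010": ["4.0998122679475", "2877.65265331343", "1928.65235658729"],
    "2011": ["4.28094332027273", "2929.07502855568", "1952.05105293095"],
}
df = pd.DataFrame(data)

# Step 1: melt turns the year columns into rows,
# giving one row per (country, series, year) combination.
long_df = df.melt(id_vars=["Country Name", "Series Name"], var_name="Year")

# Step 2: pivot spreads the Series Name values back out into columns.
wide = long_df.pivot(
    index=["Country Name", "Year"], columns="Series Name", values="value"
)

# Step 3: strip the parenthesised unit from each column name,
# then move the index levels back into ordinary columns.
wide.columns = wide.columns.str.replace(r"\s+\(.*$", "", regex=True)
wide = wide.reset_index().rename_axis(None, axis=1)

print(long_df.shape)       # (6, 4)
print(list(wide.columns))
```

The intermediate long_df has four columns (Country Name, Series Name, Year, value), which is the shape pivot needs in order to build one column per series.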
Answer 2
Score: 2
Here is one option:
out = (
    pd.DataFrame(data)  # you can replace this line with `df`
    .melt(id_vars=["Country Name", "Series Name"], var_name="Year")
    .pivot(index=["Country Name", "Year"], columns="Series Name", values="value")
    .pipe(lambda x: x.set_axis(x.columns.str.extract(r"(.+)\s+\(.+\)", expand=False), axis=1))
    .reset_index()
    .rename_axis(None, axis=1)
)
Output:
print(out)
Country Name Year CO2 emissions Electric power consumption Energy use
0 Argentina 2010 4.0998122679475 2877.65265331343 1928.65235658729
1 Argentina 2011 4.28094332027273 2929.07502855568 1952.05105293095
2 Argentina 2012 4.26422362148416 3000.60352326565 1936.80353979442
.. ... ... ... ... ...
10 Argentina 2020 .. .. ..
11 Argentina 2021 .. .. ..
12 Argentina 2022 .. .. ..
[13 rows x 5 columns]
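The values in the reshaped frame are still strings, and missing observations appear as "..". If numeric columns are wanted, one possible follow-up is sketched below. This is not part of either answer: the small out frame is a hypothetical stand-in for the reshaped result, and value_cols is an illustrative name.

```python
import pandas as pd

# Hypothetical stand-in for the reshaped frame: values are strings,
# and ".." marks a missing observation, as in the question's data.
out = pd.DataFrame({
    "Country Name": ["Argentina", "Argentina"],
    "Year": ["2014", "2015"],
    "CO2 emissions": ["4.20905330505396", "4.30185120706067"],
    "Energy use": ["2029.92282543737", ".."],
})

# Every column except the two identifier columns holds measurements.
value_cols = out.columns.difference(["Country Name", "Year"])

# errors="coerce" converts anything unparseable, such as "..", to NaN.
out[value_cols] = out[value_cols].apply(pd.to_numeric, errors="coerce")

# The year labels came from column names, so they are strings too.
out["Year"] = out["Year"].astype(int)

print(out.dtypes)
```

Coercing silently hides any other malformed values as well, so it may be worth checking which cells became NaN before relying on the result.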
Answer 3
Score: 0
Library needed: pandas
import pandas as pd
Preparing data:
d = {'Country Name': {0: 'Argentina', 1: 'Argentina', 2: 'Argentina'},
'Series Name': {0: 'CO2 emissions (metric tons per capita)',
1: 'Electric power consumption (kWh per capita)',
2: 'Energy use (kg of oil equivalent per capita)'},
'2010': {0: '4.0998122679475', 1: '2877.65265331343', 2: '1928.65235658729'},
'2011': {0: '4.28094332027273', 1: '2929.07502855568', 2: '1952.05105293095'},
'2012': {0: '4.26422362148416', 1: '3000.60352326565', 2: '1936.80353979442'},
'2013': {0: '4.34212454655109', 1: '2967.37655805218', 2: '1967.02167752077'},
'2014': {0: '4.20905330505396', 1: '3074.70207056563', 2: '2029.92282543737'},
'2015': {0: '4.30185120706067', 1: '..', 2: '..'},
'2016': {0: '4.20180210453832', 1: '..', 2: '..'},
'2017': {0: '4.07139674183186', 1: '..', 2: '..'},
'2018': {0: '3.9756664767256', 1: '..', 2: '..'},
'2019': {0: '3.74054556792816', 1: '..', 2: '..'},
'2020': {0: '..', 1: '..', 2: '..'},
'2021': {0: '..', 1: '..', 2: '..'},
'2022': {0: '..', 1: '..', 2: '..'}}
Processing:
df = pd.DataFrame(data = d).T
df
Output: