2023-05-14 07:08:30

Transform pandas df into multiple columns

Question

I apologize if this is a possible duplicate. I have a dataframe like this:

{'Country Name': {0: 'Argentina', 1: 'Argentina', 2: 'Argentina'},
 'Series Name': {0: 'CO2 emissions (metric tons per capita)',
  1: 'Electric power consumption (kWh per capita)',
  2: 'Energy use (kg of oil equivalent per capita)'},
 '2010': {0: '4.0998122679475', 1: '2877.65265331343', 2: '1928.65235658729'},
 '2011': {0: '4.28094332027273', 1: '2929.07502855568', 2: '1952.05105293095'},
 '2012': {0: '4.26422362148416', 1: '3000.60352326565', 2: '1936.80353979442'},
 '2013': {0: '4.34212454655109', 1: '2967.37655805218', 2: '1967.02167752077'},
 '2014': {0: '4.20905330505396', 1: '3074.70207056563', 2: '2029.92282543737'},
 '2015': {0: '4.30185120706067', 1: '..', 2: '..'},
 '2016': {0: '4.20180210453832', 1: '..', 2: '..'},
 '2017': {0: '4.07139674183186', 1: '..', 2: '..'},
 '2018': {0: '3.9756664767256', 1: '..', 2: '..'},
 '2019': {0: '3.74054556792816', 1: '..', 2: '..'},
 '2020': {0: '..', 1: '..', 2: '..'},
 '2021': {0: '..', 1: '..', 2: '..'},
 '2022': {0: '..', 1: '..', 2: '..'}}

I have a column for Country Name, one for Series Name, and one column per year. I want to reshape this so that all the years go into a single Year column and each unique value of Series Name becomes its own column holding the corresponding values. (Series Name has around 10 categories; I have shown only 3 in the example for reference.)

The expected df would be like this:

Country Name    Year    CO2 emissions    Electric power consumption   Energy use
...

I am not sure how to do this; any suggestions would be greatly appreciated.

Answer 1

Score: 3

You can achieve the results you want by first melting the dataframe to get the years as a column, then pivoting that result to get the Series Name as the columns:

m = df.melt(id_vars=['Country Name', 'Series Name'], var_name='Year')
out = m.pivot(columns=['Series Name'], index=['Country Name', 'Year'], values=['value'])

You can then tidy up the column index and names and reset the index:

out.columns = out.columns.droplevel().str.replace(r'\s+\(.*$', '', regex=True)
out = out.reset_index()

Output:

   Country Name  Year     CO2 emissions Electric power consumption        Energy use
0     Argentina  2010   4.0998122679475           2877.65265331343  1928.65235658729
1     Argentina  2011  4.28094332027273           2929.07502855568  1952.05105293095
2     Argentina  2012  4.26422362148416           3000.60352326565  1936.80353979442
3     Argentina  2013  4.34212454655109           2967.37655805218  1967.02167752077
4     Argentina  2014  4.20905330505396           3074.70207056563  2029.92282543737
5     Argentina  2015  4.30185120706067                         ..                ..
6     Argentina  2016  4.20180210453832                         ..                ..
7     Argentina  2017  4.07139674183186                         ..                ..
8     Argentina  2018   3.9756664767256                         ..                ..
9     Argentina  2019  3.74054556792816                         ..                ..
10    Argentina  2020                ..                         ..                ..
11    Argentina  2021                ..                         ..                ..
12    Argentina  2022                ..                         ..                ..
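
The melt-then-pivot steps above can be checked end to end on a trimmed copy of the question's data. This is only a minimal sketch: keeping just two series and two years is my own shortcut for brevity, and it passes scalar `columns`/`values` to `pivot` so no `droplevel()` is needed.

```python
import pandas as pd

# Trimmed copy of the question's data (two series, two years), for illustration only.
df = pd.DataFrame({
    "Country Name": ["Argentina", "Argentina"],
    "Series Name": [
        "CO2 emissions (metric tons per capita)",
        "Energy use (kg of oil equivalent per capita)",
    ],
    "2010": ["4.0998122679475", "1928.65235658729"],
    "2015": ["4.30185120706067", ".."],
})

# Step 1: wide -> long, one row per (country, series, year).
m = df.melt(id_vars=["Country Name", "Series Name"], var_name="Year")

# Step 2: long -> wide, one column per series.
out = m.pivot(index=["Country Name", "Year"], columns="Series Name", values="value")

# Strip the unit suffix in parentheses, then turn the index back into columns.
out.columns = out.columns.str.replace(r"\s+\(.*$", "", regex=True)
out = out.reset_index().rename_axis(None, axis=1)
print(out)
```

Note that `pivot` raises an error if a (Country Name, Year, Series Name) combination appears more than once, so this assumes each series is listed once per country.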

Answer 2

Score: 2

Here is one option:

out = (
    pd.DataFrame(data)  # you can replace this line with `df`
      .melt(id_vars=["Country Name", "Series Name"], var_name="Year")
      .pivot(index=["Country Name", "Year"], columns="Series Name", values="value")
      .pipe(lambda x: x.set_axis(x.columns.str.extract(r"(.+)\s+\(.+\)", expand=False),
                                 axis=1)).reset_index().rename_axis(None, axis=1)
)

Output:

print(out)

   Country Name  Year     CO2 emissions Electric power consumption        Energy use
0     Argentina  2010   4.0998122679475           2877.65265331343  1928.65235658729
1     Argentina  2011  4.28094332027273           2929.07502855568  1952.05105293095
2     Argentina  2012  4.26422362148416           3000.60352326565  1936.80353979442
..          ...   ...               ...                        ...               ...
10    Argentina  2020                ..                         ..                ..
11    Argentina  2021                ..                         ..                ..
12    Argentina  2022                ..                         ..                ..

[13 rows x 5 columns]
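
In the outputs above the values are still strings and the `..` placeholder is kept as-is. If numeric columns are needed afterwards, one possible follow-up (my assumption, not something the answers do) is to coerce the value columns with `pd.to_numeric`. The small `out` frame below is a hypothetical stand-in for the reshaped result.

```python
import pandas as pd

# Hypothetical stand-in for the reshaped frame `out` shown above.
out = pd.DataFrame({
    "Country Name": ["Argentina", "Argentina"],
    "Year": ["2014", "2015"],
    "CO2 emissions": ["4.20905330505396", "4.30185120706067"],
    "Energy use": ["2029.92282543737", ".."],
})

# Every column except the two identifier columns holds measurements.
value_cols = out.columns.difference(["Country Name", "Year"])

# errors="coerce" turns anything unparseable, including "..", into NaN.
out[value_cols] = out[value_cols].apply(pd.to_numeric, errors="coerce")

# The year labels came from column names, so they are strings until converted.
out["Year"] = out["Year"].astype(int)
print(out.dtypes)
```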

Answer 3

Score: 0

Library needed: pandas

import pandas as pd

Preparing data:

d = {'Country Name': {0: 'Argentina', 1: 'Argentina', 2: 'Argentina'},
 'Series Name': {0: 'CO2 emissions (metric tons per capita)',
  1: 'Electric power consumption (kWh per capita)',
  2: 'Energy use (kg of oil equivalent per capita)'},
 '2010': {0: '4.0998122679475', 1: '2877.65265331343', 2: '1928.65235658729'},
 '2011': {0: '4.28094332027273', 1: '2929.07502855568', 2: '1952.05105293095'},
 '2012': {0: '4.26422362148416', 1: '3000.60352326565', 2: '1936.80353979442'},
 '2013': {0: '4.34212454655109', 1: '2967.37655805218', 2: '1967.02167752077'},
 '2014': {0: '4.20905330505396', 1: '3074.70207056563', 2: '2029.92282543737'},
 '2015': {0: '4.30185120706067', 1: '..', 2: '..'},
 '2016': {0: '4.20180210453832', 1: '..', 2: '..'},
 '2017': {0: '4.07139674183186', 1: '..', 2: '..'},
 '2018': {0: '3.9756664767256', 1: '..', 2: '..'},
 '2019': {0: '3.74054556792816', 1: '..', 2: '..'},
 '2020': {0: '..', 1: '..', 2: '..'},
 '2021': {0: '..', 1: '..', 2: '..'},
 '2022': {0: '..', 1: '..', 2: '..'}}

Processing:

df = pd.DataFrame(data = d).T
df

Output:
