How to manipulate a dataframe in Python?

Question


I have the following sample dataframe:

[Image: the sample dataframe (not preserved)]

I need to transform it to look like this:

[Image: the desired dataframe (not preserved)]

Please note that:

  • In the new dataframe, the last two columns of the original dataframe have been replaced by new columns, one for each distinct value in the original's second-to-last column.
  • The original dataframe repeats the values in the columns up to the 'Year' column, so the new dataframe should have fewer rows, because the new variables are now arranged in a 'transposed' manner.

What I thought would work did not, because the rows are still duplicated:

import pandas as pd

# PopRatesC is the original long-format dataframe (not shown)
print(PopRatesC.shape)
PopRates = PopRatesC.copy()
PopRates.rename(columns={'Value': 'PopRate'}, inplace=True)
# One indicator column per distinct value of 'Series'; the row count is unchanged
PopRates = pd.get_dummies(data=PopRates, columns=['Series'])
PopRates['Rural_PopRate'] = PopRates['Series_Rural_Rate'] * PopRates['PopRate']
PopRates['Urban_PopRate'] = PopRates['Series_Urban_Rate'] * PopRates['PopRate']
print(PopRates.shape)
PopRates[416:422]

[Image: output of the code above (not preserved)]

Kindly explain the right approach I should use.

Answer 1

Score: 2


It looks like you want to pivot the data on a group of index columns, so simply use pandas.pivot().

Example
import pandas as pd

df = pd.DataFrame({
    'code': [108,108,108,108,132,132],
    'country': ['burundi','burundi','burundi','burundi','cabo verde','cabo verde'],
    'year': [2010,2010,2015,2015,2005,2005],
    'series': ['urban_rate','rural_rate','urban_rate','rural_rate','urban_rate','rural_rate'],
    'value': [5.9,3.0,5.6,2.7,3.3,-0.2]
})

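# Each (code, country, year) combination becomes one row;
# each distinct 'series' value becomes a column filled from 'value'.
pd.pivot(df, index=['code','country','year'], columns='series', values='value').reset_index()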
Output
code     country  year  rural_rate  urban_rate
 108     burundi  2010         3.0         5.9
 108     burundi  2015         2.7         5.6
 132  cabo verde  2005        -0.2         3.3
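
As a rough sketch, the same idea can be applied to the column names used in the question's code. Only 'Series', 'Value' and 'Year' are named in the question; the 'Code' and 'Country' columns, the sample values, and the 'Rural_Rate' / 'Urban_Rate' labels (inferred from the 'Series_Rural_Rate' and 'Series_Urban_Rate' dummy columns) are assumptions, so adjust them to match the real data.

```python
import pandas as pd

# Hypothetical stand-in for the question's PopRatesC dataframe.
# 'Code' and 'Country' are assumed identifier columns, not taken from the question.
PopRatesC = pd.DataFrame({
    'Code': [108, 108, 108, 108, 132, 132],
    'Country': ['Burundi', 'Burundi', 'Burundi', 'Burundi', 'Cabo Verde', 'Cabo Verde'],
    'Year': [2010, 2010, 2015, 2015, 2005, 2005],
    'Series': ['Urban_Rate', 'Rural_Rate', 'Urban_Rate', 'Rural_Rate', 'Urban_Rate', 'Rural_Rate'],
    'Value': [5.9, 3.0, 5.6, 2.7, 3.3, -0.2],
})

# Columns that identify one output row (assumed).
id_cols = ['Code', 'Country', 'Year']

# Each distinct 'Series' value becomes its own column filled from 'Value'.
PopRates = (
    PopRatesC
    .pivot(index=id_cols, columns='Series', values='Value')
    .rename(columns={'Rural_Rate': 'Rural_PopRate', 'Urban_Rate': 'Urban_PopRate'})
    .rename_axis(columns=None)  # drop the leftover 'Series' label on the column axis
    .reset_index()
)

print(PopRatesC.shape)  # (6, 5)
print(PopRates.shape)   # (3, 5)
```

Unlike get_dummies, which only adds indicator columns and keeps every row, pivot collapses the rows that share the same identifier values. If the same identifier and 'Series' combination appears more than once, pivot raises a ValueError; pivot_table with an explicit aggfunc is one possible fallback, at the cost of aggregating the duplicates.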

  • Posted by huangapple on 2023-05-14 17:41:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76246796.html