How to manipulate a dataframe in Python?
Question
I have the following sample dataframe:
I need to transform it to look like this:
Please note that:
- In the new dataframe, the last two columns of the original dataframe have been replaced by one column for each distinct value of the original second-to-last column.
- The original dataframe repeats the values in every column up to and including 'Year', so the new dataframe should have fewer rows, because the new variables are now laid out in a 'transposed' manner.
What I thought would work did not, because the rows are still duplicated:
print(PopRatesC.shape)
PopRates = PopRatesC.copy()
PopRates.rename(columns={'Value': 'PopRate'}, inplace=True)
PopRates = pd.get_dummies(data=PopRates, columns=['Series'])
PopRates['Rural_PopRate'] = PopRates['Series_Rural_Rate'] * PopRates['PopRate']
PopRates['Urban_PopRate'] = PopRates['Series_Urban_Rate'] * PopRates['PopRate']
print(PopRates.shape)
PopRates[416:422]
Kindly explain the right approach I should use.
Answer 1
Score: 2
It looks like you want to pivot the data on a group of columns, so you can simply use pandas.pivot().
Example
import pandas as pd

# Small long-format sample: one row per (code, country, year, series)
df = pd.DataFrame({
    'code': [108, 108, 108, 108, 132, 132],
    'country': ['burundi', 'burundi', 'burundi', 'burundi', 'cabo verde', 'cabo verde'],
    'year': [2010, 2010, 2015, 2015, 2005, 2005],
    'series': ['urban_rate', 'rural_rate', 'urban_rate', 'rural_rate', 'urban_rate', 'rural_rate'],
    'value': [5.9, 3.0, 5.6, 2.7, 3.3, -0.2]
})

# Keep code/country/year as row keys and spread each 'series' value into its own column
pd.pivot(df, index=['code', 'country', 'year'], columns='series', values='value').reset_index()
Output
code | country | year | rural_rate | urban_rate |
---|---|---|---|---|
108 | burundi | 2010 | 3.0 | 5.9 |
108 | burundi | 2015 | 2.7 | 5.6 |
132 | cabo verde | 2005 | -0.2 | 3.3 |
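If your real data has more than one row for the same combination of key columns and series, pivot() raises a ValueError because it cannot decide which value to keep. A rough sketch of one way around that, using pivot_table() and the column names from your attempt, might look like the following. Note the assumptions: only 'Year', 'Series' and 'Value' appear in the question, so the 'Code' and 'Country' columns and all of the sample values here are made up for illustration, and aggfunc='first' is just one possible choice for resolving duplicates.

```python
import pandas as pd

# Hypothetical stand-in for PopRatesC. Only 'Year', 'Series' and 'Value' are
# named in the question; 'Code', 'Country' and the values are assumptions.
PopRatesC = pd.DataFrame({
    'Code': [108, 108, 108, 108, 132, 132],
    'Country': ['Burundi', 'Burundi', 'Burundi', 'Burundi', 'Cabo Verde', 'Cabo Verde'],
    'Year': [2010, 2010, 2015, 2015, 2005, 2005],
    'Series': ['Urban_Rate', 'Rural_Rate', 'Urban_Rate', 'Rural_Rate', 'Urban_Rate', 'Rural_Rate'],
    'Value': [5.9, 3.0, 5.6, 2.7, 3.3, -0.2],
})

# pivot_table aggregates duplicate (keys, Series) pairs instead of raising;
# aggfunc='first' keeps the first value seen for each pair.
PopRates = (
    PopRatesC
    .pivot_table(index=['Code', 'Country', 'Year'], columns='Series',
                 values='Value', aggfunc='first')
    .rename(columns={'Rural_Rate': 'Rural_PopRate', 'Urban_Rate': 'Urban_PopRate'})
    .rename_axis(None, axis=1)  # drop the leftover 'Series' label on the columns
    .reset_index()
)

print(PopRatesC.shape)  # (6, 5)
print(PopRates.shape)   # (3, 5)
```

The row count drops because each (Code, Country, Year) group collapses into a single row, which get_dummies() never does: it only adds indicator columns next to the existing rows.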