May 14, 2023 17:41:02

How to manipulate a dataframe in Python?

Question

I have the following sample dataframe:

I need to transform it to look like :

Please note that:

In the new dataframe, the last two columns of the original dataframe have been replaced by one column per distinct value of the original's second-to-last column.
The original dataframe repeated the values of every column up to 'Year', so the new dataframe should have fewer rows, because the new variables are now laid out in a 'transposed' manner.

What I thought would work did not, because the rows are still duplicated:

print(PopRatesC.shape)
PopRates = PopRatesC.copy()
PopRates.rename(columns={'Value': 'PopRate'}, inplace=True)
PopRates = pd.get_dummies(data=PopRates, columns=['Series'])
PopRates['Rural_PopRate'] = PopRates['Series_Rural_Rate'] * PopRates['PopRate']
PopRates['Urban_PopRate'] = PopRates['Series_Urban_Rate'] * PopRates['PopRate']
print(PopRates.shape)
PopRates[416:422]

Kindly explain the right approach I should use.

Answer 1

Score: 2

It looks like you want to pivot the data on a group of columns, so simply use pandas.pivot().

Example

import pandas as pd

df = pd.DataFrame({
    'code': [108, 108, 108, 108, 132, 132],
    'country': ['burundi', 'burundi', 'burundi', 'burundi', 'cabo verde', 'cabo verde'],
    'year': [2010, 2010, 2015, 2015, 2005, 2005],
    'series': ['urban_rate', 'rural_rate', 'urban_rate', 'rural_rate', 'urban_rate', 'rural_rate'],
    'value': [5.9, 3.0, 5.6, 2.7, 3.3, -0.2]
})

pd.pivot(df, index=['code', 'country', 'year'], columns='series', values='value').reset_index()

Output

code	country	year	rural_rate	urban_rate
108	burundi	2010	3.0	5.9
108	burundi	2015	2.7	5.6
132	cabo verde	2005	-0.2	3.3
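
To connect this back to the question's own code, here is a minimal sketch of the same reshape using the column names that appear there ('Series', 'Value', and the labels 'Rural_Rate' and 'Urban_Rate'). The 'Code', 'Country', and 'Year' columns and all of the data are assumptions for illustration, since the original frame was not posted as text. It also shows why get_dummies leaves the row count unchanged, and uses pivot_table rather than pivot as a hedge in case the real data repeats a key combination, which pivot would reject with a ValueError.

```python
import pandas as pd

# Hypothetical stand-in for the question's PopRatesC. The key columns
# ('Code', 'Country', 'Year') and every value here are assumed.
PopRatesC = pd.DataFrame({
    'Code': [108, 108, 108, 108, 132, 132],
    'Country': ['Burundi', 'Burundi', 'Burundi', 'Burundi', 'Cabo Verde', 'Cabo Verde'],
    'Year': [2010, 2010, 2015, 2015, 2005, 2005],
    'Series': ['Urban_Rate', 'Rural_Rate', 'Urban_Rate', 'Rural_Rate', 'Urban_Rate', 'Rural_Rate'],
    'Value': [5.9, 3.0, 5.6, 2.7, 3.3, -0.2],
})

# get_dummies only adds indicator columns; it never merges rows,
# so there is still one row per (Code, Country, Year, Series).
dummies = pd.get_dummies(PopRatesC, columns=['Series'])

# pivot_table reshapes long to wide: one row per (Code, Country, Year).
# If a key combination is repeated, it keeps the first value instead of raising.
PopRates = (
    PopRatesC
    .pivot_table(index=['Code', 'Country', 'Year'], columns='Series',
                 values='Value', aggfunc='first')
    .rename(columns={'Rural_Rate': 'Rural_PopRate', 'Urban_Rate': 'Urban_PopRate'})
    .rename_axis(columns=None)  # drop the leftover 'Series' label on the columns
    .reset_index()
)

print(dummies.shape)   # same number of rows as the input
print(PopRates.shape)  # one row per Code/Country/Year combination
```

With this sample data the get_dummies result still has six rows, while the pivoted frame has three. Whether 'first' is the right aggregation depends on what repeated keys mean in the real data; if they should never occur, plain pivot is the stricter choice because it fails loudly.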
