Sorting a pandas pivot table while keeping the multiple index levels matched
Question
Is it possible to sort a DataFrame while keeping the index levels matched, so that each state's rows stay together?
My df:
           budget  population
state fu
acre  ac1     600          50
      ac2      25         110
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
I would like to get the output below, since the bahia index has the greatest total budget:
           budget  population
state fu
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110
But after using sort_values() I get the output below:
           budget  population
state fu
bahia ba1    2300          80
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110
bahia ba2       1          10
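For anyone who wants to reproduce this, here is a minimal sketch. The question does not show how the frame was built or the exact sort_values() call, so both the MultiIndex construction and the sort_values("budget", ascending=False) call below are assumptions.

```python
import pandas as pd

# Assumed reconstruction of the frame shown in the question
index = pd.MultiIndex.from_tuples(
    [
        ("acre", "ac1"), ("acre", "ac2"),
        ("bahia", "ba1"), ("bahia", "ba2"),
        ("paulo", "sp1"), ("paulo", "sp2"),
    ],
    names=["state", "fu"],
)
df = pd.DataFrame(
    {
        "budget": [600, 25, 2300, 1, 1000, 1000],
        "population": [50, 110, 80, 10, 100, 230],
    },
    index=index,
)

# Sorting on the column alone ranks individual rows,
# so the rows of one state can end up far apart
by_row = df.sort_values("budget", ascending=False)
print(by_row)
```

The bahia rows land first and last because sort_values only looks at each row's own budget and knows nothing about the state grouping.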
I updated the question to give more context
Answer 1
Score: 1
Here is a way to sort without calculating the total budget. IIUC, this should return what you need even if some states have a larger total budget than others but smaller individual fu budgets. Note that it ranks states by their largest single fu budget rather than by their total; for the sample data both rankings agree.
First, group budget by state.
Second, calculate the max budget for each state.
Third, sort those values in descending order.
Fourth, take the index of this new Series, which holds the state names.
Lastly, reindex the appropriate level of the original df with the new order.
new_index = (
    df["budget"]
    .groupby("state")               # group the budgets by the state index level
    .max()                          # largest fu budget within each state
    .sort_values(ascending=False)   # biggest first
    .index                          # just return the index
)
df.reindex(new_index, level=0)
Output:
           budget  population
state fu
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110
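If the ranking should really follow each state's total budget, as the question words it, the same reindex idea seems to carry over by swapping max() for sum(). A minimal sketch: the frame construction is assumed, the names state_order and out are only illustrative, and groupby(level="state") is used here instead of groupby("state") to make the index-level grouping explicit.

```python
import pandas as pd

# Assumed reconstruction of the frame from the question
index = pd.MultiIndex.from_tuples(
    [
        ("acre", "ac1"), ("acre", "ac2"),
        ("bahia", "ba1"), ("bahia", "ba2"),
        ("paulo", "sp1"), ("paulo", "sp2"),
    ],
    names=["state", "fu"],
)
df = pd.DataFrame(
    {
        "budget": [600, 25, 2300, 1, 1000, 1000],
        "population": [50, 110, 80, 10, 100, 230],
    },
    index=index,
)

# Total budget per state, largest first; only the state labels are kept
state_order = (
    df["budget"]
    .groupby(level="state")
    .sum()
    .sort_values(ascending=False)
    .index
)

# Reorder the outer index level; rows inside each state keep their order
out = df.reindex(state_order, level=0)
print(out)
```

On this data the totals are 2301 for bahia, 2000 for paulo and 625 for acre, so the order matches the max-based version.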
Answer 2
Score: 0
There are multiple ways of doing this. One way is to calculate the metric that you want to sort on (total budget), sort the DataFrame on it, and then remove the newly created variable.
We have to reset the index of the original DataFrame in order to be able to merge properly.
# Create the total budget variable
gp = df.groupby('state')['budget'].sum().reset_index()
gp.columns = ['state', 'total_budget']
# Merge with the total budget variable
out = df.reset_index().merge(gp, on='state')
# Sort on total_budget; a stable sort keeps each state's rows together and in their original order
out = out.sort_values('total_budget', ascending=False, kind='mergesort')
# Drop the helper column and restore the original index
out = out.drop(columns='total_budget')
out = out.set_index(['state', 'fu'])
The final output looks like:
           budget  population
state fu
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110
In addition to this, a more compact solution (assuming numpy and pandas are imported as np and pd) would be
groups = sorted(df.reset_index().groupby('state'), key=lambda x: -np.sum(x[1].budget))
out = pd.concat([g for _, g in groups]).set_index(['state', 'fu'])
Here too, out gives the same output as before.
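Another variant, offered as a sketch rather than as part of the answer above, avoids the reset_index/merge round trip by broadcasting each state's total back onto its rows with groupby(...).transform("sum"). The frame construction is assumed, and the total_budget helper column name is just borrowed from the merge version.

```python
import pandas as pd

# Assumed reconstruction of the frame from the question
index = pd.MultiIndex.from_tuples(
    [
        ("acre", "ac1"), ("acre", "ac2"),
        ("bahia", "ba1"), ("bahia", "ba2"),
        ("paulo", "sp1"), ("paulo", "sp2"),
    ],
    names=["state", "fu"],
)
df = pd.DataFrame(
    {
        "budget": [600, 25, 2300, 1, 1000, 1000],
        "population": [50, 110, 80, 10, 100, 230],
    },
    index=index,
)

# Each row receives the total budget of its own state
total = df.groupby(level="state")["budget"].transform("sum")

out = (
    df.assign(total_budget=total)
    # mergesort is stable, so rows with equal totals keep their original order
    .sort_values("total_budget", ascending=False, kind="mergesort")
    .drop(columns="total_budget")
)
print(out)
```

Because the index is never reset, the state and fu levels stay in place and no set_index call is needed at the end.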