Sorting pandas pivot table keeping the multiple indexes match

Question

Is it possible to sort a DataFrame with a two-level index while keeping the rows of each outer index value (state) together?

My df:

           budget  population
state fu
acre  ac1     600          50
      ac2      25         110
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230

I would like to get the output below, since state bahia has the greatest total budget (2301, versus 2000 for paulo and 625 for acre):

           budget  population
state fu
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110

But after using sort_values() I get the output below:

           budget  population
state fu
bahia ba1    2300          80
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110
bahia ba2       1          10
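To reproduce the problem, the frame can be rebuilt by hand. This is a sketch that assumes plain integer columns and a two-level MultiIndex named state and fu; the data behind the original pivot table is not shown, so the construction is only an approximation of it:

```python
import pandas as pd

# Rebuild the example frame by hand (values and level names copied from above).
df = pd.DataFrame(
    {"budget": [600, 25, 2300, 1, 1000, 1000],
     "population": [50, 110, 80, 10, 100, 230]},
    index=pd.MultiIndex.from_tuples(
        [("acre", "ac1"), ("acre", "ac2"), ("bahia", "ba1"),
         ("bahia", "ba2"), ("paulo", "sp1"), ("paulo", "sp2")],
        names=["state", "fu"],
    ),
)

# Sorting on the column alone orders individual rows, not states,
# so bahia's two rows end up at opposite ends of the result.
naive = df.sort_values("budget", ascending=False)
print(naive)
```

The row-level sort is the reason the states get split: nothing in sort_values knows that rows sharing a state should stay adjacent.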

I updated the question to give more context.

Answer 1

Score: 1

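Here is a way to sort without calculating the total budget: it orders the states by their largest single fu budget instead. If I understand correctly, this is what you need, even when a state has a larger total budget than another but smaller individual fu budgets. For the data above, both orderings give bahia, paulo, acre.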

First, we group budget by state.
Second, we calculate the max budget of each state.
Third, we sort those values in descending order.
Fourth, we take the index of this new Series, which holds the state names.
Lastly, we reindex the appropriate level (level 0, state) of the original df with the new order.

new_index = (
    df["budget"]
    .groupby("state")
    .max()
    .sort_values(ascending=False)
    .index  # just return the index
)

df.reindex(new_index, level=0)

Output:

           budget  population
state fu                     
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110
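A self-contained sketch of the five steps, using the question's data rebuilt by hand (an assumption, since the original pivot table's source is not shown). It uses groupby(level="state") to make the grouping key explicit, and also computes a .sum() variant, which is not part of the answer, to check that ranking by total budget gives the same order here:

```python
import pandas as pd

# Example data from the question, rebuilt by hand.
df = pd.DataFrame(
    {"budget": [600, 25, 2300, 1, 1000, 1000],
     "population": [50, 110, 80, 10, 100, 230]},
    index=pd.MultiIndex.from_tuples(
        [("acre", "ac1"), ("acre", "ac2"), ("bahia", "ba1"),
         ("bahia", "ba2"), ("paulo", "sp1"), ("paulo", "sp2")],
        names=["state", "fu"],
    ),
)

# Steps 1-4: group budget by state, take each state's max,
# sort descending, and keep only the resulting order of state names.
by_max = df["budget"].groupby(level="state").max().sort_values(ascending=False).index

# Step 5: reorder the outer level; rows inside each state keep their order.
result = df.reindex(by_max, level=0)

# Variant (not from the answer): rank states by total budget instead of max.
by_sum = df["budget"].groupby(level="state").sum().sort_values(ascending=False).index
result_sum = df.reindex(by_sum, level=0)

print(result)
```

On this data the two rankings agree; they can diverge when a state has many small fu budgets that add up to more than another state's single large one.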

Answer 2

Score: 0

There are multiple ways of doing this. One of them is to calculate the metric you want to sort on (total budget), sort the DataFrame by it, and then drop the newly created column.

We have to reset the index of the original DataFrame so that state becomes an ordinary column and the merge works properly.

# Create the total budget variable
gp = df.groupby('state')['budget'].sum().reset_index()
gp.columns = ['state', 'total_budget']

# Merge the total budget back onto every row
out = df.reset_index().merge(gp, on='state')

# Sort on total_budget; a stable sort keeps each state's rows in their original order
out = out.sort_values('total_budget', ascending=False, kind='mergesort')
out = out.drop(columns='total_budget')
out = out.set_index(['state', 'fu'])

The final output looks like this:

           budget  population
state fu                     
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110
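The same ordering can be sketched without the merge and without resetting the index. This swaps in groupby(...).transform("sum"), a technique the answer does not use, to broadcast each state's total onto its own rows; total_budget is again only a temporary helper column, and the data is the question's, rebuilt by hand:

```python
import pandas as pd

# Example data from the question, rebuilt by hand.
df = pd.DataFrame(
    {"budget": [600, 25, 2300, 1, 1000, 1000],
     "population": [50, 110, 80, 10, 100, 230]},
    index=pd.MultiIndex.from_tuples(
        [("acre", "ac1"), ("acre", "ac2"), ("bahia", "ba1"),
         ("bahia", "ba2"), ("paulo", "sp1"), ("paulo", "sp2")],
        names=["state", "fu"],
    ),
)

# Broadcast each state's total budget onto every row of that state.
totals = df.groupby(level="state")["budget"].transform("sum")

# Stable descending sort on the totals keeps rows of one state adjacent
# and in their original order; then drop the helper column.
out = (
    df.assign(total_budget=totals)
    .sort_values("total_budget", ascending=False, kind="mergesort")
    .drop(columns="total_budget")
)
print(out)
```

Because transform returns a Series aligned with the original index, no join key is needed, which is why the reset_index/set_index round trip disappears.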

In addition to this, a more compact solution (assuming numpy is imported as np and pandas as pd) would be:

groups = sorted(df.reset_index().groupby('state'), key=lambda x: -np.sum(x[1].budget))
out = pd.concat([x[1] for x in groups]).set_index(['state', 'fu'])

Here too, out gives the same output as before.
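To see how the compact version works: iterating a GroupBy yields (key, sub-DataFrame) pairs, sorted() reorders those pairs by each group's total budget, and pd.concat stitches the groups back together. A self-contained sketch, with the question's data rebuilt by hand as an assumption:

```python
import numpy as np
import pandas as pd

# Example data from the question, rebuilt by hand.
df = pd.DataFrame(
    {"budget": [600, 25, 2300, 1, 1000, 1000],
     "population": [50, 110, 80, 10, 100, 230]},
    index=pd.MultiIndex.from_tuples(
        [("acre", "ac1"), ("acre", "ac2"), ("bahia", "ba1"),
         ("bahia", "ba2"), ("paulo", "sp1"), ("paulo", "sp2")],
        names=["state", "fu"],
    ),
)

flat = df.reset_index()  # state and fu become ordinary columns

# Iterating a GroupBy yields (state, sub-DataFrame) pairs, in sorted key order.
pairs = list(flat.groupby("state"))
print([name for name, _ in pairs])  # ['acre', 'bahia', 'paulo']

# Reorder the pairs by each group's total budget, largest first.
ordered = sorted(pairs, key=lambda pair: -np.sum(pair[1]["budget"]))

# Glue the groups back together and restore the two-level index.
out = pd.concat([group for _, group in ordered]).set_index(["state", "fu"])
print(out)
```

Since each group is moved as a whole, rows of one state can never be separated, and Python's sorted() is stable, so states with equal totals keep their groupby order.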

huangapple
  • This article was posted on 4 January 2020 at 12:14:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59587822.html