Sorting a pandas pivot table while keeping the multiple index levels matched





Is it possible to sort a DataFrame while keeping the rows that share the same outer index value together?

My df:

           budget  population
state fu
acre  ac1     600          50
      ac2      25         110
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
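For anyone who wants to reproduce this, a frame with the same shape can be rebuilt directly. This is only a sketch: the original pivot_table call is not shown, so the construction with `pd.MultiIndex.from_tuples` is an assumption. It also prints the total budget per state, which is the quantity the desired order is based on.

```python
import pandas as pd

# Assumed reconstruction of the frame shown above; the original
# pivot_table call is not part of the question.
index = pd.MultiIndex.from_tuples(
    [("acre", "ac1"), ("acre", "ac2"),
     ("bahia", "ba1"), ("bahia", "ba2"),
     ("paulo", "sp1"), ("paulo", "sp2")],
    names=["state", "fu"],
)
df = pd.DataFrame(
    {"budget": [600, 25, 2300, 1, 1000, 1000],
     "population": [50, 110, 80, 10, 100, 230]},
    index=index,
)

# Total budget per state: acre 625, bahia 2301, paulo 2000.
totals = df.groupby(level="state")["budget"].sum()
print(totals)
```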

I would like to get the output below, since index bahia has the greatest total budget:

           budget  population
state fu
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110

But after using sort_values() I get the output below:

           budget  population
state fu
bahia ba1    2300          80
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110
bahia ba2       1          10
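The exact call is not shown here. Assuming it was `df.sort_values("budget", ascending=False)` on a frame rebuilt as below (both the call and the construction are assumptions), this sketch reproduces the split: every row is ranked by its own budget, independently of its state.

```python
import pandas as pd

# Assumed reconstruction of the sample frame.
index = pd.MultiIndex.from_tuples(
    [("acre", "ac1"), ("acre", "ac2"),
     ("bahia", "ba1"), ("bahia", "ba2"),
     ("paulo", "sp1"), ("paulo", "sp2")],
    names=["state", "fu"],
)
df = pd.DataFrame(
    {"budget": [600, 25, 2300, 1, 1000, 1000],
     "population": [50, 110, 80, 10, 100, 230]},
    index=index,
)

# Assumed call: sort each row by its own budget, largest first.
flat = df.sort_values("budget", ascending=False)

# bahia appears both first (ba1 = 2300) and last (ba2 = 1).
states = flat.index.get_level_values("state").tolist()
print(states)
```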

I updated the question to give more context


Score: 1




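Here is a way to sort without calculating the total budget: it ranks each state by its largest single fu budget instead. IIUC, this should return what you need for the data shown. Note that the two rankings can differ when a state has a larger total budget than another but smaller individual fu budgets.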

First, we group budget by state.
Second, calculate the max budget.
Third, sort those values in descending order.
Fourth, take the index of this new Series of state names.
Lastly, reindex the appropriate level of our original df with the new order.

new_index = df["budget"]\
    .groupby(level="state").max()\
    .sort_values(ascending=False)\
    .index  # just return the index

df.reindex(new_index, level=0)


           budget  population
state fu                     
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110
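The intermediate results of these steps can be inspected one at a time. The sketch below rebuilds the sample frame (the construction is an assumption) and also shows that swapping `.max()` for `.sum()` ranks the states by total budget, which is the ordering the question describes. For this data both rankings agree.

```python
import pandas as pd

# Assumed reconstruction of the sample frame.
index = pd.MultiIndex.from_tuples(
    [("acre", "ac1"), ("acre", "ac2"),
     ("bahia", "ba1"), ("bahia", "ba2"),
     ("paulo", "sp1"), ("paulo", "sp2")],
    names=["state", "fu"],
)
df = pd.DataFrame(
    {"budget": [600, 25, 2300, 1, 1000, 1000],
     "population": [50, 110, 80, 10, 100, 230]},
    index=index,
)

# Rank states by their largest single fu budget.
per_state_max = df["budget"].groupby(level="state").max()
by_max = per_state_max.sort_values(ascending=False).index

# Rank states by their total budget instead.
per_state_sum = df["budget"].groupby(level="state").sum()
by_sum = per_state_sum.sort_values(ascending=False).index

print(list(by_max))
print(list(by_sum))

# Reorder the outer level only; rows inside each state keep their order.
out = df.reindex(by_sum, level=0)
print(out)
```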


Score: 0






There are multiple ways of doing this. One is to calculate the metric you want to sort on (the total budget per state), sort the dataframe by it, and then drop the helper column.

We have to reset the index of the original dataframe so that state becomes a regular column we can merge on.

# Create the total budget per state
gp = df.groupby('state')['budget'].sum().reset_index()
gp.columns = ['state', 'total_budget']

# Merge the totals back onto the rows
out = df.reset_index().merge(gp, on='state')

# Sort on total_budget; mergesort is stable, so rows keep their
# original order within each state
out = out.sort_values('total_budget', ascending=False, kind='mergesort')
out = out.drop(columns='total_budget')
out = out.set_index(['state', 'fu'])

The final output looks like

           budget  population
state fu                     
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110

In addition to this, a more compact solution (with numpy imported as np and pandas as pd) would be

out = pd.concat([x[1] for x in sorted(df.reset_index().groupby('state'), key=lambda x: -np.sum(x[1].budget))]).set_index(['state', 'fu'])

Here too, out gives the same output as before.
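Another option, not taken from the answer above, avoids the merge by broadcasting each state's total back onto its rows with `groupby(...).transform("sum")` and then ordering row positions with a stable sort. This is a sketch under two assumptions: the sample frame is rebuilt as below, and rows of the same state are already adjacent in the input.

```python
import numpy as np
import pandas as pd

# Assumed reconstruction of the sample frame.
index = pd.MultiIndex.from_tuples(
    [("acre", "ac1"), ("acre", "ac2"),
     ("bahia", "ba1"), ("bahia", "ba2"),
     ("paulo", "sp1"), ("paulo", "sp2")],
    names=["state", "fu"],
)
df = pd.DataFrame(
    {"budget": [600, 25, 2300, 1, 1000, 1000],
     "population": [50, 110, 80, 10, 100, 230]},
    index=index,
)

# Broadcast each state's total budget onto every row of that state.
row_totals = df.groupby(level="state")["budget"].transform("sum")

# Stable argsort on the negated totals: largest total first, and rows
# with equal totals (the rows of one state) keep their original order.
order = np.argsort(-row_totals.to_numpy(), kind="stable")
out = df.iloc[order]
print(out)
```

No helper column is created, so there is nothing to drop afterwards and the MultiIndex never has to be reset.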

  • Published on 2020-01-04 12:14:34
  • When reposting, please keep the link to this article: https://go.coder-hub.com/59587822.html



