2020-01-04 12:14:34

Sorting pandas pivot table keeping the multiple indexes match

Question

Is it possible to sort a dataframe keeping the match between the indexes?

My df:

           budget  population
state fu
acre  ac1     600          50
      ac2      25         110
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230

I would like to get the output below, since the state bahia has the greatest total budget:

           budget  population
state fu
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110

But after using sort_values() I get the output below:

           budget  population
state fu
bahia ba1    2300          80
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110
bahia ba2       1          10

I updated the question to give more context
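
A minimal sketch for reproducing the behaviour described above. It assumes that state and fu form a two-level index and that the call was sort_values on the budget column; the question only names sort_values(), so the exact arguments are a guess.

```python
import pandas as pd

# Rebuild the frame from the question; the (state, fu) MultiIndex is an assumption.
df = pd.DataFrame(
    {
        "state": ["acre", "acre", "bahia", "bahia", "paulo", "paulo"],
        "fu": ["ac1", "ac2", "ba1", "ba2", "sp1", "sp2"],
        "budget": [600, 25, 2300, 1, 1000, 1000],
        "population": [50, 110, 80, 10, 100, 230],
    }
).set_index(["state", "fu"])

# Sorting on the column ranks every row on its own, ignoring which state it
# belongs to, so ba2 (budget 1) ends up separated from ba1 (budget 2300).
by_row = df.sort_values("budget", ascending=False)
print(by_row)
```

Because sort_values compares rows individually, keeping each state together requires ranking the states first and then reordering the outer index level.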

Answer 1

Score: 1

Here is a way to sort without calculating the total budget. If I understand correctly, this should return what you need, even if some states have a larger total budget than others but smaller individual fu budgets.

First, we group budget by state.
Second, calculate the max budget of each state.
Third, sort those values in descending order.
Fourth, take the index of this new Series, which holds the state names.
Lastly, reindex the appropriate level of our original df with the new order.

new_index = (
    df["budget"]
    .groupby("state")
    .max()
    .sort_values(ascending=False)
    .index  # just return the index
)
df.reindex(new_index, level=0)

Output:

           budget  population
state fu                     
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110
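
The approach above ranks states by their largest single fu budget. On this data that matches the ranking by total budget, but the two can differ: a state with budgets 900 and 900 has a larger total than one with 1000 and 1, yet a smaller maximum. A sketch of the same reindex technique using the sum instead, assuming the question's frame and a (state, fu) index:

```python
import pandas as pd

# Frame from the question; the (state, fu) MultiIndex is an assumption.
df = pd.DataFrame(
    {
        "state": ["acre", "acre", "bahia", "bahia", "paulo", "paulo"],
        "fu": ["ac1", "ac2", "ba1", "ba2", "sp1", "sp2"],
        "budget": [600, 25, 2300, 1, 1000, 1000],
        "population": [50, 110, 80, 10, 100, 230],
    }
).set_index(["state", "fu"])

# Total budget per state, largest first; only the resulting state order is kept.
order = (
    df["budget"]
    .groupby(level="state")
    .sum()
    .sort_values(ascending=False)
    .index
)

# Reorder the outer index level only; rows inside each state keep their order.
by_total = df.reindex(order, level=0)
print(by_total)
```

Swapping .sum() for .max(), .mean() or another aggregation changes the ranking criterion without touching the rest of the chain.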

Answer 2

Score: 0

There are multiple ways of doing this. One way is to calculate the metric that you want to sort on (total budget), sort the dataframe by it, and then remove the newly created column.

We have to reset the index of the original dataframe in order to be able to merge properly.

# Create the total budget variable (one row per state)
gp = df.groupby('state')['budget'].sum().reset_index()
gp.columns = ['state', 'total_budget']
# Merge the totals back onto every row
out = df.reset_index().merge(gp, on='state')
# Sort on total_budget; mergesort is stable, so rows of states with equal totals are not interleaved
out = out.sort_values('total_budget', ascending=False, kind='mergesort')
# Drop the helper column and restore the original index
out = out.drop(columns='total_budget')
out = out.set_index(['state', 'fu'])

The final output looks like this:

           budget  population
state fu                     
bahia ba1    2300          80
      ba2       1          10
paulo sp1    1000         100
      sp2    1000         230
acre  ac1     600          50
      ac2      25         110

In addition to this, a more compact solution would be:

groups = sorted(df.reset_index().groupby('state'), key=lambda x: -x[1].budget.sum())
out = pd.concat([g for _, g in groups]).set_index(['state', 'fu'])

Here too, out gives the same output as before.
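
Another possible variant, not taken from the answers above, skips the reset_index/merge round trip by broadcasting each state's total onto its rows with transform and applying a stable sort. This is only a sketch; it assumes the (state, fu) index from the question and that the rows of each state are already adjacent.

```python
import numpy as np
import pandas as pd

# Frame from the question; the (state, fu) MultiIndex is an assumption.
df = pd.DataFrame(
    {
        "state": ["acre", "acre", "bahia", "bahia", "paulo", "paulo"],
        "fu": ["ac1", "ac2", "ba1", "ba2", "sp1", "sp2"],
        "budget": [600, 25, 2300, 1, 1000, 1000],
        "population": [50, 110, 80, 10, 100, 230],
    }
).set_index(["state", "fu"])

# Each row receives the total budget of its state (same index as df).
total = df.groupby(level="state")["budget"].transform("sum")

# Stable argsort on the negated totals: largest total first, and rows with
# equal totals keep their original relative order.
positions = np.argsort(-total.to_numpy(), kind="stable")
out = df.iloc[positions]
print(out)
```

The index is never dropped here, so no set_index call is needed at the end.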
