June 29, 2023 16:35:09

Sorting pandas dataframe based on two columns

Question

This is my dataframe:

   Level Parent   Type
0     A1   None   Saga
1     B4     A2   Epic
2      C     B1  Story
3      C     B1  Story
4     B3     A2   Epic
5      C     B2  Story
6      C     B2  Story
7      C     B1  Story
8      C     B2  Story
9     A2   None   Saga
10    B1     A1   Epic
11     C     B2  Story
12     C     B2  Story
13     C     B3  Story
14    B2     A1   Epic
15     C     B4  Story
16     C     B4  Story
17     C     B4  Story

I want to sort the dataframe.

Level is the highest category by which to sort.

If a row's Parent value matches the Level value of another row, that row should appear directly below the row it matches.

Lower levels have priority over the middle ones, so something like this:

Does anyone know how I can solve this? I already tried sort_values, but it does not work.

Answer 1

Score: 1

This is a graph problem, and networkx can be very helpful here. The idea is to build your own ranking of the (Parent, Level) edges before sorting your dataframe:

# pip install networkx
import networkx as nx
import numpy as np
import pandas as pd

# Create a directed graph with one edge per (Parent, Level) pair
G = nx.from_pandas_edgelist(df, source='Parent', target='Level', create_using=nx.DiGraph)
root = np.nan  # <- assuming you have only one root (missing parents stored as NaN)
# Leaves are the nodes with no outgoing edges
leaves = [node for node, degree in G.out_degree() if degree == 0]

# Find all edge paths from the root to each leaf
paths = []
for leaf in leaves:
    for path in nx.all_simple_edge_paths(G, root, leaf):
        paths.append(pd.DataFrame(path, columns=['Parent', 'Level']))

# Define the rank of each edge (order of first appearance along the paths)
rank = pd.concat(paths).drop_duplicates(ignore_index=True).reset_index(names='Rank')

# Broadcast the rank to all rows, then sort by it
df = df.merge(rank, on=['Level', 'Parent']).sort_values('Rank')

Output:

>>> df
   Level Parent   Type  Rank
0     A1    NaN   Saga     0
12    B1     A1   Epic     1
2      C     B1  Story     2
3      C     B1  Story     2
4      C     B1  Story     2
14    B2     A1   Epic     3
10     C     B2  Story     4
9      C     B2  Story     4
8      C     B2  Story     4
7      C     B2  Story     4
6      C     B2  Story     4
11    A2    NaN   Saga     5
1     B4     A2   Epic     6
15     C     B4  Story     7
16     C     B4  Story     7
17     C     B4  Story     7
5     B3     A2   Epic     8
13     C     B3  Story     9

Intermediate steps:

>>> rank
   Rank Parent Level
0     0    NaN    A1
1     1     A1    B1
2     2     B1     C
3     3     A1    B2
4     4     B2     C
5     5    NaN    A2
6     6     A2    B4
7     7     B4     C
8     8     A2    B3
9     9     B3     C
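
For readers who want to experiment without networkx, the same edge-ranking idea can be sketched with a plain depth-first walk. This is not the answerer's code: the helper name `rank_edges` and the `"ROOT"` sentinel are hypothetical, the sample frame is retyped from the question with missing parents assumed to be `None`, and children are visited alphabetically, so B3 comes before B4 here, unlike in the output above.

```python
import pandas as pd

# Sample data retyped from the question; missing parents assumed to be None.
df = pd.DataFrame({
    "Level":  ["A1", "B4", "C", "C", "B3", "C", "C", "C", "C",
               "A2", "B1", "C", "C", "C", "B2", "C", "C", "C"],
    "Parent": [None, "A2", "B1", "B1", "A2", "B2", "B2", "B1", "B2",
               None, "A1", "B2", "B2", "B3", "A1", "B4", "B4", "B4"],
    "Type":   ["Saga", "Epic", "Story", "Story", "Epic", "Story", "Story",
               "Story", "Story", "Saga", "Epic", "Story", "Story", "Story",
               "Epic", "Story", "Story", "Story"],
})

ROOT = "ROOT"  # hypothetical sentinel standing in for a missing parent
parent = df["Parent"].fillna(ROOT)

# Map each parent to the set of distinct Level values found beneath it.
children = {}
for p, c in zip(parent, df["Level"]):
    children.setdefault(p, set()).add(c)


def rank_edges(root):
    """Return {(parent, level): rank}, numbering edges in depth-first pre-order."""
    ranks = {}
    # Push in reverse alphabetical order so the smallest name is popped first.
    stack = [(root, c) for c in sorted(children.get(root, ()), reverse=True)]
    while stack:
        p, c = stack.pop()
        if (p, c) in ranks:  # each edge is ranked once, which also guards against cycles
            continue
        ranks[(p, c)] = len(ranks)
        stack.extend((c, g) for g in sorted(children.get(c, ()), reverse=True))
    return ranks


ranks = rank_edges(ROOT)
# Assumes every row is reachable from the root; otherwise this lookup raises KeyError.
df["Rank"] = [ranks[(p, c)] for p, c in zip(parent, df["Level"])]
out = df.sort_values("Rank", kind="stable")  # stable: ties keep their original row order
print(out)
```

Because the sort is stable, rows sharing a rank (for example the three stories under B1) stay in their original relative order, and the original index labels are preserved rather than being reset by a merge.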
