Sorting pandas dataframe based on two columns
Question
This is my dataframe:
   Level Parent   Type
0     A1   None   Saga
1     B4     A2   Epic
2      C     B1  Story
3      C     B1  Story
4     B3     A2   Epic
5      C     B2  Story
6      C     B2  Story
7      C     B1  Story
8      C     B2  Story
9     A2   None   Saga
10    B1     A1   Epic
11     C     B2  Story
12     C     B2  Story
13     C     B3  Story
14    B2     A1   Epic
15     C     B4  Story
16     C     B4  Story
17     C     B4  Story
I want to sort the dataframe.
Level is the highest category by which to sort.
If a row's Parent value matches another row's Level value, that row should appear directly below the row it refers to.
Lower levels have priority over the middle ones, so something like this:
Does anyone know how I can solve this? I already tried sort_values, but that does not work.
Answer 1
Score: 1
This is a graph problem, and networkx can be very helpful. The idea is to build your own rank of the (Parent, Level) edges before sorting the dataframe:
# pip install networkx
import networkx as nx
import numpy as np
import pandas as pd

# Create a directed graph with one edge per (Parent, Level) pair
G = nx.from_pandas_edgelist(df, source='Parent', target='Level', create_using=nx.DiGraph)

root = np.nan  # <- assuming you have only one root (the missing Parent value)
leaves = [node for node, degree in G.out_degree() if degree == 0]

# Find all edge paths from the root to each leaf
paths = []
for leaf in leaves:
    for path in nx.all_simple_edge_paths(G, root, leaf):
        paths.append(pd.DataFrame(path, columns=['Parent', 'Level']))

# Define the rank of each edge (first appearance along the paths)
rank = pd.concat(paths).drop_duplicates(ignore_index=True).reset_index(names='Rank')

# Broadcast the rank to all rows
df = df.merge(rank, on=['Level', 'Parent']).sort_values('Rank')
Output:
>>> df
   Level Parent   Type  Rank
0     A1    NaN   Saga     0
12    B1     A1   Epic     1
2      C     B1  Story     2
3      C     B1  Story     2
4      C     B1  Story     2
14    B2     A1   Epic     3
10     C     B2  Story     4
9      C     B2  Story     4
8      C     B2  Story     4
7      C     B2  Story     4
6      C     B2  Story     4
11    A2    NaN   Saga     5
1     B4     A2   Epic     6
15     C     B4  Story     7
16     C     B4  Story     7
17     C     B4  Story     7
5     B3     A2   Epic     8
13     C     B3  Story     9
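The final merge-and-sort step may be easier to follow in isolation. Below is a minimal sketch on a made-up four-row frame with a hand-written rank table; the names `small`, `small_rank` and `out` are only illustrative and do not come from the answer. Each (Parent, Level) pair gets one rank, and the merge copies that rank onto every matching row:

```python
import pandas as pd

# Hypothetical miniature of the data: two epics and one story under each.
small = pd.DataFrame({
    "Level":  ["C",  "B1", "C",  "B2"],
    "Parent": ["B2", "A1", "B1", "A1"],
})

# Hand-written rank table: one rank per (Parent, Level) edge.
small_rank = pd.DataFrame({
    "Rank":   [0,    1,    2,    3],
    "Parent": ["A1", "B1", "A1", "B2"],
    "Level":  ["B1", "C",  "B2", "C"],
})

# Attach the rank to every row, then order the rows by it.
out = small.merge(small_rank, on=["Level", "Parent"]).sort_values("Rank")
print(out)
```

Sorting on the pair's rank rather than on either column alone is what keeps each story directly under its own epic.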
Intermediate steps:
>>> rank
   Rank Parent Level
0     0    NaN    A1
1     1     A1    B1
2     2     B1     C
3     3     A1    B2
4     4     B2     C
5     5    NaN    A2
6     6     A2    B4
7     7     B4     C
8     8     A2    B3
9     9     B3     C
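If installing networkx is not an option, a similar ordering can probably be obtained with a plain depth-first walk over a parent-to-children mapping. This is an alternative sketch, not the method used above. It assumes the missing Parent values are stored as NaN, that the index is unique, and that siblings should be ordered alphabetically by Level (the question does not specify this). `ROOT` and `hierarchical_order` are hypothetical names introduced for the illustration:

```python
import numpy as np
import pandas as pd

rows = [
    ("A1", np.nan, "Saga"), ("B4", "A2", "Epic"), ("C", "B1", "Story"),
    ("C", "B1", "Story"), ("B3", "A2", "Epic"), ("C", "B2", "Story"),
    ("C", "B2", "Story"), ("C", "B1", "Story"), ("C", "B2", "Story"),
    ("A2", np.nan, "Saga"), ("B1", "A1", "Epic"), ("C", "B2", "Story"),
    ("C", "B2", "Story"), ("C", "B3", "Story"), ("B2", "A1", "Epic"),
    ("C", "B4", "Story"), ("C", "B4", "Story"), ("C", "B4", "Story"),
]
df = pd.DataFrame(rows, columns=["Level", "Parent", "Type"])

ROOT = "<root>"  # hypothetical sentinel replacing the missing parent


def hierarchical_order(frame, root=ROOT):
    """Return the rows of `frame` so that each row follows its parent row."""
    parent = frame["Parent"].fillna(root)
    # Index labels of the rows below each parent, siblings sorted by Level.
    children = {
        key: grp.sort_values("Level", kind="stable").index.tolist()
        for key, grp in frame.groupby(parent, sort=False)
    }
    order, seen = [], set()

    def visit(label):
        # Expand each label only once, so repeated labels such as "C"
        # do not emit the same rows twice.
        if label in seen:
            return
        seen.add(label)
        for idx in children.get(label, []):
            order.append(idx)
            visit(frame.at[idx, "Level"])

    visit(root)
    return frame.loc[order]


result = hierarchical_order(df)
print(result)
```

With alphabetical sibling order, B3 comes before B4 under A2, whereas the networkx output above lists B4 first; which of the two is wanted depends on how ties between siblings should be broken.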