Sorting pandas dataframe based on two columns.

Question

This is my dataframe:

   Level Parent   Type
0     A1   None   Saga
1     B4     A2   Epic
2      C     B1  Story
3      C     B1  Story
4     B3     A2   Epic
5      C     B2  Story
6      C     B2  Story
7      C     B1  Story
8      C     B2  Story
9     A2   None   Saga
10    B1     A1   Epic
11     C     B2  Story
12     C     B2  Story
13     C     B3  Story
14    B2     A1   Epic
15     C     B4  Story
16     C     B4  Story
17     C     B4  Story
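To make the example reproducible, here is one way to build the sample frame. It assumes the `None` entries in Parent are missing values and stores them as `np.nan`, because the answer below treats the top-level (Saga) rows as having a `NaN` parent.

```python
import numpy as np
import pandas as pd

# Sample data from the question; top-level (Saga) rows have no parent.
df = pd.DataFrame({
    "Level": ["A1", "B4", "C", "C", "B3", "C", "C", "C", "C",
              "A2", "B1", "C", "C", "C", "B2", "C", "C", "C"],
    "Parent": [np.nan, "A2", "B1", "B1", "A2", "B2", "B2", "B1", "B2",
               np.nan, "A1", "B2", "B2", "B3", "A1", "B4", "B4", "B4"],
    "Type": ["Saga", "Epic", "Story", "Story", "Epic", "Story", "Story", "Story", "Story",
             "Saga", "Epic", "Story", "Story", "Story", "Epic", "Story", "Story", "Story"],
})
print(df)
```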

I want to sort the dataframe.

Level is the highest category by which to sort.

If a row's Parent matches the Level of another row, that row should appear directly below it.

Lower levels take priority over middle ones (a row's children are listed before its next sibling), so the result should look something like this:

[Image: expected output, not available]

Does anyone know how I can solve this? I already tried sort_values, but that does not work.
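For illustration, this is why a plain column sort cannot produce the desired order. The exact sort_values call that was tried is not shown, so sorting on `["Level", "Parent"]` is an assumed example. A lexicographic sort only compares raw values, so rows are grouped by their Level label and every Story ends up at the bottom instead of under its Epic; the order has to come from the Parent/Level hierarchy itself.

```python
import numpy as np
import pandas as pd

# Only the two columns involved in the sort are needed here.
df = pd.DataFrame({
    "Level": ["A1", "B4", "C", "C", "B3", "C", "C", "C", "C",
              "A2", "B1", "C", "C", "C", "B2", "C", "C", "C"],
    "Parent": [np.nan, "A2", "B1", "B1", "A2", "B2", "B2", "B1", "B2",
               np.nan, "A1", "B2", "B2", "B3", "A1", "B4", "B4", "B4"],
})

# Lexicographic sort on the raw values: rows are grouped by Level label,
# so the six A/B rows come first and all twelve "C" rows end up at the bottom.
naive = df.sort_values(["Level", "Parent"])
print(naive)
```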

Answer 1

Score: 1

This is a graph problem, and networkx can be very helpful. The idea is to build your own rank of edges (Parent, Level) before sorting your dataframe:

# pip install networkx
import networkx as nx
import numpy as np
import pandas as pd

# Create the graph: directed edges Parent -> Level
G = nx.from_pandas_edgelist(df, source='Parent', target='Level', create_using=nx.DiGraph)
root = np.nan  # <- assuming you have only one root (all top-level rows have a NaN Parent)
leaves = [node for node, degree in G.out_degree() if degree == 0]

# Find all paths from the root to every leaf
paths = []
for leaf in leaves:
    for path in nx.all_simple_edge_paths(G, root, leaf):
        paths.append(pd.DataFrame(path, columns=['Parent', 'Level']))

# Define the rank of each edge
rank = pd.concat(paths).drop_duplicates(ignore_index=True).reset_index(names='Rank')

# Broadcast the rank to all rows
df = df.merge(rank, on=['Level', 'Parent']).sort_values('Rank')

Output:

>>> df
   Level Parent   Type  Rank
0     A1    NaN   Saga     0
12    B1     A1   Epic     1
2      C     B1  Story     2
3      C     B1  Story     2
4      C     B1  Story     2
14    B2     A1   Epic     3
10     C     B2  Story     4
9      C     B2  Story     4
8      C     B2  Story     4
7      C     B2  Story     4
6      C     B2  Story     4
11    A2    NaN   Saga     5
1     B4     A2   Epic     6
15     C     B4  Story     7
16     C     B4  Story     7
17     C     B4  Story     7
5     B3     A2   Epic     8
13     C     B3  Story     9

Intermediate steps:

>>> rank
   Rank Parent Level
0     0    NaN    A1
1     1     A1    B1
2     2     B1     C
3     3     A1    B2
4     4     B2     C
5     5    NaN    A2
6     6     A2    B4
7     7     B4     C
8     8     A2    B3
9     9     B3     C
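If adding networkx as a dependency is undesirable, the same depth-first ordering can be sketched in plain pandas. This is a swapped-in technique, not the answer's method, and `hierarchical_order` is a hypothetical helper name. It assumes the Parent/Level links form a tree (no cycles), that top-level rows have a missing Parent, and that siblings keep their original row order, which matches the order shown in the output above.

```python
import pandas as pd


def hierarchical_order(df: pd.DataFrame) -> pd.DataFrame:
    """Reorder df so every row is followed directly by its descendants (depth-first)."""
    # Map each parent label to its child rows, keeping the original row order.
    children = {}
    for idx, parent in df["Parent"].items():
        key = None if pd.isna(parent) else parent  # None marks top-level rows
        children.setdefault(key, []).append(idx)

    order = []
    expanded = set()  # Level labels whose children have already been emitted

    def visit(idx):
        order.append(idx)
        label = df.at[idx, "Level"]
        if label in expanded:
            return  # a repeated label (e.g. several "C" rows) expands only once
        expanded.add(label)
        for child in children.get(label, []):
            visit(child)

    for top in children.get(None, []):
        visit(top)
    return df.loc[order]


df = pd.DataFrame({
    "Level": ["A1", "B4", "C", "C", "B3", "C", "C", "C", "C",
              "A2", "B1", "C", "C", "C", "B2", "C", "C", "C"],
    "Parent": [None, "A2", "B1", "B1", "A2", "B2", "B2", "B1", "B2",
               None, "A1", "B2", "B2", "B3", "A1", "B4", "B4", "B4"],
    "Type": ["Saga", "Epic", "Story", "Story", "Epic", "Story", "Story", "Story", "Story",
             "Saga", "Epic", "Story", "Story", "Story", "Epic", "Story", "Story", "Story"],
})
result = hierarchical_order(df)
print(result)
```

Because missing parents are detected with `pd.isna`, this sketch works whether the top-level rows hold `None` or `NaN`, whereas the networkx version relies on `np.nan` matching the parent node in the graph. As with the inner merge above, rows whose Parent never appears as a Level are never reached and are left out.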

huangapple
  • Posted on June 29, 2023 at 16:35:09
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76579375.html