How do you sort a DataFrame by the integers embedded in a column whose values mix text and numbers?

Question

How would you sort the following dataframe:

import pandas as pd
df = pd.DataFrame({'a': ['abc_1.2.6', 'abc_1.2.60', 'abc_1.2.7', 'abc_1.2.9', 'abc_1.3.0', 'abc_1.3.10', 'abc_1.3.100', 'abc_1.3.11'], 'b': [1, 2, 3, 4, 5, 6, 7, 8]})

>>> df
             a  b
0    abc_1.2.6  1
1   abc_1.2.60  2
2    abc_1.2.7  3
3    abc_1.2.9  4
4    abc_1.3.0  5
5   abc_1.3.10  6
6  abc_1.3.100  7
7   abc_1.3.11  8

to achieve this output?

             a  b
0    abc_1.2.6  1
1    abc_1.2.7  3
2    abc_1.2.9  4
3   abc_1.2.60  2
4    abc_1.3.0  5
5   abc_1.3.10  6
6   abc_1.3.11  8
7  abc_1.3.100  7

I understand that the integers inside the strings can be extracted with string operations, but I'm unsure how to do this within a DataFrame. A plain df.sort_values(by=['a'], ignore_index=True) doesn't help here: it compares the strings character by character, so, for example, 'abc_1.2.60' sorts before 'abc_1.2.7'.
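A minimal sketch reproducing the problem with the question's data. The values here happen to already be in lexicographic (character-by-character) order, so the plain sort returns them unchanged, with 'abc_1.2.60' still ahead of 'abc_1.2.7' and 'abc_1.3.100' still ahead of 'abc_1.3.11':

```python
import pandas as pd

df = pd.DataFrame({'a': ['abc_1.2.6', 'abc_1.2.60', 'abc_1.2.7', 'abc_1.2.9',
                         'abc_1.3.0', 'abc_1.3.10', 'abc_1.3.100', 'abc_1.3.11'],
                   'b': [1, 2, 3, 4, 5, 6, 7, 8]})

# Plain string sort: at the first differing character '6' < '7',
# so '60' lands before '7', and '100' before '11'
lexical = df.sort_values(by=['a'], ignore_index=True)
print(lexical['a'].tolist())
# ['abc_1.2.6', 'abc_1.2.60', 'abc_1.2.7', 'abc_1.2.9',
#  'abc_1.3.0', 'abc_1.3.10', 'abc_1.3.100', 'abc_1.3.11']
```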

Answer 1

Score: 2

One way is to use natsorted with iloc:

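# pip install natsort
from natsort import natsorted

# natsorted orders the row positions by the natural order of column "a"; iloc then selects the rows in that order
out = df.iloc[natsorted(range(len(df)), key=lambda i: df["a"].iloc[i])]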

Or, shorter still, as suggested by @Stef, pass natsort_key as the key argument of sort_values:

from natsort import natsort_key

out = df.sort_values(by="a", key=natsort_key, ignore_index=True)

Output:

print(out)
             a  b
0    abc_1.2.6  1
1    abc_1.2.7  3
2    abc_1.2.9  4
3   abc_1.2.60  2
4    abc_1.3.0  5
5   abc_1.3.10  6
6   abc_1.3.11  8
7  abc_1.3.100  7
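If installing natsort isn't an option, a rough dependency-free approximation of natural sorting can be passed as the same key argument. This is a swapped-in sketch, not natsort's implementation: natural_key is a hypothetical helper that splits each string into text and digit runs and compares the digit runs as integers, while natsort itself covers more cases (for example floats and locale-aware ordering) through its options.

```python
import re

import pandas as pd

df = pd.DataFrame({'a': ['abc_1.2.6', 'abc_1.2.60', 'abc_1.2.7', 'abc_1.2.9',
                         'abc_1.3.0', 'abc_1.3.10', 'abc_1.3.100', 'abc_1.3.11'],
                   'b': [1, 2, 3, 4, 5, 6, 7, 8]})


def natural_key(s: str) -> tuple:
    """Hypothetical helper: a simple natural-sort key (not natsort's own code)."""
    # re.split with a capturing group keeps the digit runs:
    # 'abc_1.2.60' -> ['abc_', '1', '.', '2', '.', '60', '']
    parts = re.split(r'(\d+)', s)
    # Odd positions are always the captured digit runs; compare those as ints
    return tuple(int(p) if i % 2 else p for i, p in enumerate(parts))


# key receives column "a" as a Series and must return a same-length Series
out = df.sort_values(by='a', key=lambda col: col.map(natural_key), ignore_index=True)
print(out)
```

Because text and digit runs always alternate at the same positions, two keys never compare a str against an int, so the tuple comparison is safe.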

Answer 2

Score: 1

You can pass a key function to sort_values; it is applied to the values before they are compared:

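# Key: drop the 4-character "abc_" prefix, split the rest on '.', compare the pieces as a tuple of ints
df = (df.sort_values(by=['a'], ignore_index=True,
                     key=lambda x: x.map(lambda v:
                                         tuple(map(int, v[4:].split('.'))))))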

             a  b
0    abc_1.2.6  1
1    abc_1.2.7  3
2    abc_1.2.9  4
3   abc_1.2.60  2
4    abc_1.3.0  5
5   abc_1.3.10  6
6   abc_1.3.11  8
7  abc_1.3.100  7
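The v[4:] slice only works while every prefix is exactly four characters long. A sketch of a variant that splits on the last underscore instead, assuming every value looks like '<prefix>_<dot-separated integers>' (version_tuple is a hypothetical helper name):

```python
import pandas as pd

df = pd.DataFrame({'a': ['abc_1.2.6', 'abc_1.2.60', 'abc_1.2.7', 'abc_1.2.9',
                         'abc_1.3.0', 'abc_1.3.10', 'abc_1.3.100', 'abc_1.3.11'],
                   'b': [1, 2, 3, 4, 5, 6, 7, 8]})


def version_tuple(v: str) -> tuple:
    # Assumes values look like '<prefix>_<dot-separated integers>'.
    # rsplit on the last '_' keeps only the version part ('1.2.60'),
    # so the prefix may have any length (unlike the fixed v[4:] slice).
    version = v.rsplit('_', 1)[-1]
    return tuple(int(part) for part in version.split('.'))


out = df.sort_values(by=['a'], ignore_index=True,
                     key=lambda col: col.map(version_tuple))
print(out)
```

For example, version_tuple('longer_prefix_2.10.1') gives (2, 10, 1), so a longer prefix no longer shifts the slice into the numbers.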

Answer 3

Score: 1

Here is another way, using str.findall() and explode():

df.sort_values('a', key=lambda x: x.str.findall(r'\d+').explode().astype(int).groupby(level=0).agg(tuple))

Output:

             a  b
0    abc_1.2.6  1
2    abc_1.2.7  3
3    abc_1.2.9  4
1   abc_1.2.60  2
4    abc_1.3.0  5
5   abc_1.3.10  6
7   abc_1.3.11  8
6  abc_1.3.100  7
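The explode/groupby round trip can likely be skipped: Series.map can turn each list returned by str.findall into a tuple of ints directly. A sketch of that variant (digit_tuples is an illustrative name). Like the answer above, it ignores the non-digit text entirely, and since ignore_index is not set, the original index labels are kept:

```python
import pandas as pd

df = pd.DataFrame({'a': ['abc_1.2.6', 'abc_1.2.60', 'abc_1.2.7', 'abc_1.2.9',
                         'abc_1.3.0', 'abc_1.3.10', 'abc_1.3.100', 'abc_1.3.11'],
                   'b': [1, 2, 3, 4, 5, 6, 7, 8]})


def digit_tuples(col: pd.Series) -> pd.Series:
    # findall returns, per row, the list of digit runs, e.g. ['1', '2', '60'];
    # converting each list to a tuple of ints makes rows compare numerically
    return col.str.findall(r'\d+').map(lambda parts: tuple(int(p) for p in parts))


out = df.sort_values('a', key=digit_tuples)
print(out)
```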

huangapple
  • Posted on April 19, 2023 at 21:24:27
  • If you repost this article, please keep the link to it: https://go.coder-hub.com/76055072.html