How do you sort a DataFrame by the integers embedded in a string column?
Question
How would you sort the following dataframe:
import pandas as pd
df = pd.DataFrame({'a': ['abc_1.2.6', 'abc_1.2.60', 'abc_1.2.7', 'abc_1.2.9', 'abc_1.3.0', 'abc_1.3.10', 'abc_1.3.100', 'abc_1.3.11'], 'b': [1, 2, 3, 4, 5, 6, 7, 8]})
>>>
             a  b
0    abc_1.2.6  1
1   abc_1.2.60  2
2    abc_1.2.7  3
3    abc_1.2.9  4
4    abc_1.3.0  5
5   abc_1.3.10  6
6  abc_1.3.100  7
7   abc_1.3.11  8
to achieve this output?
>>>
             a  b
0    abc_1.2.6  1
1    abc_1.2.7  3
2    abc_1.2.9  4
3   abc_1.2.60  2
4    abc_1.3.0  5
5   abc_1.3.10  6
6   abc_1.3.11  8
7  abc_1.3.100  7
I understand that the integers inside the strings can be pulled out with string operations, but I'm unsure how to apply this to a DataFrame column. Obviously df.sort_values(by=['a'], ignore_index=True) is unhelpful here, because it compares the strings character by character, so abc_1.2.60 ends up before abc_1.2.7.
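The problem can be checked in plain Python: strings compare character by character (pandas sorts an object column of strings the same way), while tuples of ints compare element by element numerically. The values below are a subset of column a; no pandas is needed for this illustration.

```python
versions = ['abc_1.2.7', 'abc_1.2.60', 'abc_1.2.9', 'abc_1.2.6']

# String comparison is character by character: 'abc_1.2.60' vs 'abc_1.2.7'
# is decided by '6' < '7', so 60 lands before 7.
lexicographic = sorted(versions)
print(lexicographic)            # ['abc_1.2.6', 'abc_1.2.60', 'abc_1.2.7', 'abc_1.2.9']

# Tuples of ints compare element by element numerically, which is the order we want.
print((1, 2, 7) < (1, 2, 60))   # True
print('1.2.7' < '1.2.60')       # False
```

Every answer below is a way of turning each string into something that compares like the tuples.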
Answer 1
Score: 2
One way is to use natsorted with iloc:
# pip install natsort
from natsort import natsorted
# Natural-sort the row positions by their "a" value, then reorder the rows with iloc
out = df.iloc[natsorted(range(len(df)), key=lambda i: df["a"].iat[i])]
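Without natsort, the core idea can be sketched with the standard library: split each string into text runs and digit runs, and convert the digit runs to int so they compare numerically. natural_key is a hypothetical helper, and this sketch only handles plain non-negative integers; it is not natsort's implementation, which covers many more cases. Sorting positions and then reordering mirrors the iloc trick above.

```python
import re


def natural_key(s):
    """Hypothetical helper: 'abc_1.2.60' -> ['abc_', 1, '.', 2, '.', 60, '']."""
    # re.split with a capturing group puts the digit runs at the odd positions,
    # so text is always compared with text and ints with ints.
    return [int(part) if i % 2 else part for i, part in enumerate(re.split(r'(\d+)', s))]


values = ['abc_1.2.6', 'abc_1.2.60', 'abc_1.2.7', 'abc_1.2.9',
          'abc_1.3.0', 'abc_1.3.10', 'abc_1.3.100', 'abc_1.3.11']

# Sort row *positions* by the key of the value at each position
order = sorted(range(len(values)), key=lambda i: natural_key(values[i]))
print(order)  # [0, 2, 3, 1, 4, 5, 7, 6]
print([values[i] for i in order])
```

With a DataFrame, the same positions list could be passed to df.iloc.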
Or, even shorter, as suggested by @Stef, pass natsort_key as the key of sort_values:
from natsort import natsort_key
out = df.sort_values(by="a", key=natsort_key, ignore_index=True)
Output:
print(out)
             a  b
0    abc_1.2.6  1
1    abc_1.2.7  3
2    abc_1.2.9  4
3   abc_1.2.60  2
4    abc_1.3.0  5
5   abc_1.3.10  6
6   abc_1.3.11  8
7  abc_1.3.100  7
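The key parameter of sort_values (available since pandas 1.1) is called with the whole column as a Series and must return a Series of the same length, so a function that works on one value at a time has to be wrapped in .map. As a sketch under that contract, a dependency-free natural key can be plugged in the same way; natural_key is a hypothetical helper, not how natsort_key works internally.

```python
import re

import pandas as pd


def natural_key(s):
    # Hypothetical helper: text runs stay strings, digit runs become ints,
    # e.g. 'abc_1.2.60' -> ('abc_', 1, '.', 2, '.', 60, '')
    return tuple(int(p) if i % 2 else p for i, p in enumerate(re.split(r'(\d+)', s)))


df = pd.DataFrame({'a': ['abc_1.2.6', 'abc_1.2.60', 'abc_1.2.7', 'abc_1.2.9',
                         'abc_1.3.0', 'abc_1.3.10', 'abc_1.3.100', 'abc_1.3.11'],
                   'b': [1, 2, 3, 4, 5, 6, 7, 8]})

# The key receives column "a" as a Series; .map applies natural_key to each value
out = df.sort_values(by='a', key=lambda col: col.map(natural_key), ignore_index=True)
print(out)
```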
Answer 2
Score: 1
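You can apply a key function to the values before sorting. Here it drops the 4-character prefix abc_ and turns the rest into a tuple of ints, so abc_1.2.60 becomes (1, 2, 60):
df = df.sort_values(
    by=['a'],
    ignore_index=True,
    # the key gets the whole column; map converts each value, e.g. 'abc_1.2.60' -> (1, 2, 60)
    key=lambda col: col.map(lambda v: tuple(map(int, v[4:].split('.')))),
)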
             a  b
0    abc_1.2.6  1
1    abc_1.2.7  3
2    abc_1.2.9  4
3   abc_1.2.60  2
4    abc_1.3.0  5
5   abc_1.3.10  6
6   abc_1.3.11  8
7  abc_1.3.100  7
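The key in this answer relies on every value starting with exactly four characters (abc_), since v[4:] simply drops them. A sketch for data whose prefixes vary in length, assuming the version number is always whatever follows the last underscore (an assumption about the data, not something the question states), splits the prefix off instead. version_key is a hypothetical helper and the DataFrame is made-up test data.

```python
import pandas as pd


def version_key(v):
    # Hypothetical helper: keep only the text after the last '_',
    # e.g. 'abc_1.2.60' -> '1.2.60' -> (1, 2, 60), whatever the prefix length.
    return tuple(map(int, v.rsplit('_', 1)[-1].split('.')))


# Made-up values with prefixes of different lengths, where v[4:] would fail
df = pd.DataFrame({'a': ['abc_1.2.60', 'x_1.2.7', 'abcdef_1.2.9', 'abc_1.2.6']})

out = df.sort_values(by='a', key=lambda col: col.map(version_key), ignore_index=True)
print(out['a'].tolist())  # ['abc_1.2.6', 'x_1.2.7', 'abcdef_1.2.9', 'abc_1.2.60']
```

This key ignores the prefix entirely, so rows with different prefixes are interleaved by version; if the prefix should take part in the ordering, it could be put at the front of the returned tuple.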
Answer 3
Score: 1
Here is another way, using str.findall() and explode():
df.sort_values('a', key=lambda x: x.str.findall(r'\d+').explode().astype(int).groupby(level=0).agg(tuple))
Output:
             a  b
0    abc_1.2.6  1
2    abc_1.2.7  3
3    abc_1.2.9  4
1   abc_1.2.60  2
4    abc_1.3.0  5
5   abc_1.3.10  6
7   abc_1.3.11  8
6  abc_1.3.100  7
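To see what the key in this answer produces, its steps can be run one at a time on a few of the values. This is only an illustration; key_simple is an alternative added here for comparison and is not part of the answer.

```python
import pandas as pd

s = pd.Series(['abc_1.2.6', 'abc_1.2.60', 'abc_1.3.100'])

nums = s.str.findall(r'\d+')            # one list per row, e.g. ['1', '2', '60']
flat = nums.explode()                   # one row per number; each row's index label repeats
ints = flat.astype(int)                 # '60' -> 60
key = ints.groupby(level=0).agg(tuple)  # regroup by original label -> (1, 2, 60)

# A shorter equivalent that skips the explode/groupby round trip
key_simple = nums.map(lambda parts: tuple(int(p) for p in parts))

print(flat.index.tolist())  # [0, 0, 0, 1, 1, 1, 2, 2, 2]
print(key_simple.tolist())  # [(1, 2, 6), (1, 2, 60), (1, 3, 100)]
```

One difference between the two: a value with no digits at all explodes to NaN, and astype(int) then raises, whereas the map version simply yields an empty tuple.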