April 19, 2023, 21:24:27

How do you sort a dataframe with the integer in a column with strings and integers on every row?

Question

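# Keeps only the first run of digits in each value ('abc_1.2.6' -> 1), so it cannot order these versions
df['a'] = df['a'].str.extract(r'(\d+)').astype(int)
df = df.sort_values(by=['a'], ignore_index=True)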

How would you sort the following dataframe:

import pandas as pd
df = pd.DataFrame({'a':['abc_1.2.6','abc_1.2.60','abc_1.2.7','abc_1.2.9','abc_1.3.0','abc_1.3.10','abc_1.3.100','abc_1.3.11'], 'b':[1,2,3,4,5,6,7,8]})
>>>
             a  b
0    abc_1.2.6  1
1   abc_1.2.60  2
2    abc_1.2.7  3
3    abc_1.2.9  4
4    abc_1.3.0  5
5   abc_1.3.10  6
6  abc_1.3.100  7
7   abc_1.3.11  8

to achieve this output?

>>>
             a  b
0    abc_1.2.6  1
1    abc_1.2.7  3
2    abc_1.2.9  4
3   abc_1.2.60  2
4    abc_1.3.0  5
5   abc_1.3.10  6
6   abc_1.3.11  8
7  abc_1.3.100  7

I understand that the integers inside a string can be extracted with string operations, but I'm unsure how to apply this to a dataframe column. A plain df.sort_values(by=['a'], ignore_index=True) does not help here, because it compares the strings character by character and leaves abc_1.2.60 ahead of abc_1.2.7.
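
To make the problem concrete, here is a minimal sketch in plain Python (no pandas) of why string comparison gives the unwanted order. The variable names are illustrative only; the values are the ones from column a above.

```python
# The values of column "a", in the order shown in the question.
values = ["abc_1.2.6", "abc_1.2.60", "abc_1.2.7", "abc_1.2.9",
          "abc_1.3.0", "abc_1.3.10", "abc_1.3.100", "abc_1.3.11"]

# Sorting strings compares them character by character.
lexicographic = sorted(values)

# As strings, "60" sorts before "7" because "6" < "7" at the first differing character.
string_order = "abc_1.2.60" < "abc_1.2.7"

# As tuples of integers, 60 is compared with 7 as a number, so the order flips.
numeric_order = (1, 2, 60) < (1, 2, 7)

print(lexicographic == values)  # the input is already in string order
print(string_order, numeric_order)
```

So the task comes down to sorting on a key that turns each string into its numeric parts.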

Answer 1

Score: 2

One option is to use natsorted with iloc:

# pip install natsort
from natsort import natsorted
# Sort the row positions by the natural order of column "a", then select the rows in that order
out = df.iloc[natsorted(range(len(df)), key=lambda i: df["a"].iloc[i])]

Or, even shorter, as suggested by @Stef, pass natsort_key as the key argument of sort_values:

from natsort import natsort_key
out = df.sort_values(by="a", key=natsort_key, ignore_index=True)

Output:

print(out)
             a  b
0    abc_1.2.6  1
1    abc_1.2.7  3
2    abc_1.2.9  4
3   abc_1.2.60  2
4    abc_1.3.0  5
5   abc_1.3.10  6
6   abc_1.3.11  8
7  abc_1.3.100  7

Answer 2

Score: 1

You can pass a key function to sort_values so that it is applied to the values before sorting:

df = (df.sort_values(by=['a'], ignore_index=True,
                     key=lambda x: x.map(lambda v:
                                         tuple(map(int, v[4:].split('.'))))))

             a  b
0    abc_1.2.6  1
1    abc_1.2.7  3
2    abc_1.2.9  4
3   abc_1.2.60  2
4    abc_1.3.0  5
5   abc_1.3.10  6
6   abc_1.3.11  8
7  abc_1.3.100  7
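
Note that the key above relies on every value starting with a four-character prefix (v[4:] skips "abc_") followed only by dot-separated integers. If that cannot be guaranteed, a regex-based key may be more forgiving. The following is a sketch under that assumption, not part of the original answer; version_key is a hypothetical helper name.

```python
import re

def version_key(value):
    """Return every run of digits in `value` as a tuple of ints."""
    return tuple(int(part) for part in re.findall(r"\d+", value))

values = ["abc_1.2.6", "abc_1.2.60", "abc_1.2.7", "abc_1.2.9",
          "abc_1.3.0", "abc_1.3.10", "abc_1.3.100", "abc_1.3.11"]

# Tuples compare element by element, so (1, 2, 7) sorts before (1, 2, 60).
ordered = sorted(values, key=version_key)
print(ordered)
```

With a dataframe, the same helper could be passed as key=lambda s: s.map(version_key) in sort_values. One caveat: any digits in the prefix itself (for example "abc2_1.0") would also end up in the tuple.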

Answer 3

Score: 1

Here is another way, using str.findall() and explode():

df.sort_values('a', key=lambda x: x.str.findall(r'\d+').explode().astype(int).groupby(level=0).agg(tuple))

Output:

             a  b
0    abc_1.2.6  1
2    abc_1.2.7  3
3    abc_1.2.9  4
1   abc_1.2.60  2
4    abc_1.3.0  5
5   abc_1.3.10  6
7   abc_1.3.11  8
6  abc_1.3.100  7
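
The one-liner is dense, so here is a sketch that spells out its intermediate steps. The names digit_lists, numbers, keys and numeric_key are illustrative only, and ignore_index=True is an addition that the answer does not use: it renumbers the result from 0 instead of keeping the original row labels shown above.

```python
import pandas as pd

df = pd.DataFrame({
    "a": ["abc_1.2.6", "abc_1.2.60", "abc_1.2.7", "abc_1.2.9",
          "abc_1.3.0", "abc_1.3.10", "abc_1.3.100", "abc_1.3.11"],
    "b": [1, 2, 3, 4, 5, 6, 7, 8],
})

# Step 1: one list of digit strings per row, e.g. ['1', '2', '60'].
digit_lists = df["a"].str.findall(r"\d+")

# Step 2: one row per number; each original row label is repeated in the index.
numbers = digit_lists.explode().astype(int)

# Step 3: regroup by the original row label into one tuple of ints per row.
keys = numbers.groupby(level=0).agg(tuple)

def numeric_key(series):
    """Same three steps as above, packaged as a sort_values key."""
    return (series.str.findall(r"\d+").explode().astype(int)
            .groupby(level=0).agg(tuple))

# Sort on the tuples and renumber the index from 0.
out = df.sort_values("a", key=numeric_key, ignore_index=True)
print(out)
```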
