2023-01-09 14:32:37

Python pandas: keep the first column's order unchanged while sorting the second column in ascending order

Question

Hi, I want to keep the order of the infoid column unchanged but sort the dates within each infoid in increasing (ascending) order.
Is that possible?

statisticsdate  infoid 
20230108      46726004
20230106      46726004
20230108      46725082
20230107      46725082
20230108      46725081
20230108      46724162
20230108      46720662

should be like:

statisticsdate  infoid 
20230106      46726004
20230108      46726004
20230107      46725082
20230108      46725082
20230108      46725081
20230108      46724162
20230108      46720662

Answer 1

Score: 2

If it is acceptable to sort infoid in descending order and statisticsdate in ascending order (the infoid values in the sample are already descending), DataFrame.sort_values alone is enough:

df = df.sort_values(['infoid','statisticsdate'], ascending=[False, True], ignore_index=True)
print (df)
   statisticsdate    infoid
0        20230106  46726004
1        20230108  46726004
2        20230107  46725082
3        20230108  46725082
4        20230108  46725081
5        20230108  46724162
6        20230108  46720662
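
As a self-contained sketch of the shortcut above, the snippet below rebuilds the sample frame from the question and first checks the assumption the shortcut relies on, namely that infoid already appears in descending order. The variable names infoid_is_descending and result are only illustrative.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "statisticsdate": [20230108, 20230106, 20230108, 20230107,
                       20230108, 20230108, 20230108],
    "infoid": [46726004, 46726004, 46725082, 46725082,
               46725081, 46724162, 46720662],
})

# The shortcut only preserves the infoid order if that column is
# already non-increasing from top to bottom.
infoid_is_descending = df["infoid"].is_monotonic_decreasing

# Sort infoid descending, then statisticsdate ascending within each infoid.
result = df.sort_values(
    ["infoid", "statisticsdate"], ascending=[False, True], ignore_index=True
)

print(infoid_is_descending)
print(result)
```

If the check prints False, a plain two-column sort would reorder infoid as well, so one of the group-based approaches is the safer choice.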

A first idea is to sort within each group using a custom lambda function, passing sort=False so that the groups keep their original order. This solution is slow for larger DataFrames:

df['statisticsdate'] = (df.groupby('infoid', sort=False, group_keys=False)['statisticsdate']
                          .apply(lambda x: x.sort_values())
                          .to_numpy())
print (df)
   statisticsdate    infoid
0        20230106  46726004
1        20230108  46726004
2        20230107  46725082
3        20230108  46725082
4        20230108  46725081
5        20230108  46724162
6        20230108  46720662
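
The role of to_numpy() in the group-wise version may not be obvious, so here is a small sketch on the question's data. It compares assigning the sorted Series directly (pandas aligns on the row labels) with assigning its values positionally. The names per_group, aligned and positional are only illustrative, and the sketch assumes the rows of each infoid are adjacent.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "statisticsdate": [20230108, 20230106, 20230108, 20230107,
                       20230108, 20230108, 20230108],
    "infoid": [46726004, 46726004, 46725082, 46725082,
               46725081, 46724162, 46720662],
})

# Sort the dates inside each infoid group; sort=False keeps the groups
# in order of first appearance, group_keys=False keeps the row labels.
per_group = (
    df.groupby("infoid", sort=False, group_keys=False)["statisticsdate"]
      .apply(lambda s: s.sort_values())
)
print(per_group.index.tolist())  # row labels, reordered within each group

# Assigning the Series aligns on row labels, which undoes the sort.
aligned = df.assign(statisticsdate=per_group)

# Assigning the raw values is positional, which keeps the sorted order.
positional = df.assign(statisticsdate=per_group.to_numpy())

print(aligned)
print(positional)
```

Only the positional assignment produces the desired frame; the label-aligned one is identical to the input.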

Or you can convert infoid to an ordered Categorical, with the categories in order of first appearance, and sort by both columns. The result is assigned with to_numpy() so that it is not realigned to the original index:

df['statisticsdate'] = (df.assign(infoid = pd.Categorical(df['infoid'],
                                                          ordered=True,
                                                          categories=df['infoid'].unique()))
                          .sort_values(['infoid','statisticsdate'])['statisticsdate']
                          .to_numpy())
print (df)
   statisticsdate    infoid
0        20230106  46726004
1        20230108  46726004
2        20230107  46725082
3        20230108  46725082
4        20230108  46725081
5        20230108  46724162
6        20230108  46720662
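
To illustrate why the Categorical approach does not depend on infoid being descending, here is a minimal sketch on made-up data (the infoid values 30, 10, 20 and the name appearance_order are hypothetical, not from the question). It assumes the rows of each infoid are adjacent, and it assigns the values positionally with to_numpy(), since assigning a Series would realign on the index.

```python
import pandas as pd

# Hypothetical data: infoid is neither ascending nor descending.
df = pd.DataFrame({
    "statisticsdate": [20230108, 20230106, 20230108, 20230107, 20230105],
    "infoid": [30, 30, 10, 10, 20],
})

# Categories listed in order of first appearance define the sort order,
# so 30 sorts before 10, and 10 before 20.
appearance_order = pd.Categorical(
    df["infoid"], categories=df["infoid"].unique(), ordered=True
)

df["statisticsdate"] = (
    df.assign(infoid=appearance_order)
      .sort_values(["infoid", "statisticsdate"])["statisticsdate"]
      .to_numpy()  # positional assignment instead of index alignment
)

print(df)
```

The infoid column itself is never modified; the Categorical only exists in the temporary frame used for sorting.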

If the rows of some group are not contiguous, it is necessary to convert infoid to consecutive group numbers first and then sort by both columns:

print (df)
   statisticsdate    infoid
0        20230108  46726004
1        20230106  46726004
2        20230108  46725082
3        20230107  46725082
4        20230108  46725081
5        20230108  46724162
6        20230108  46720662
7        20230108  46726004 <- group 46726004 again, not contiguous
8        20230106  46726004

df['statisticsdate'] = (df.assign(infoid = df['infoid'].ne(df['infoid'].shift()).cumsum())
              .sort_values(['infoid','statisticsdate'], ignore_index=True)['statisticsdate'])
print (df)
   statisticsdate    infoid
0        20230106  46726004
1        20230108  46726004
2        20230107  46725082
3        20230108  46725082
4        20230108  46725081
5        20230108  46724162
6        20230108  46720662
7        20230106  46726004
8        20230108  46726004
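
The ne/shift/cumsum step can be easier to follow when the intermediate group numbers are printed. The sketch below uses the nine-row frame shown above and stores the numbers in a separate helper column, here called run_id (a hypothetical name), instead of overwriting infoid. It assumes the frame has the default 0..n-1 index, which is what makes the assignment after ignore_index=True line up row by row.

```python
import pandas as pd

# Nine-row sample from above: infoid 46726004 appears in two separate runs.
df = pd.DataFrame({
    "statisticsdate": [20230108, 20230106, 20230108, 20230107, 20230108,
                       20230108, 20230108, 20230108, 20230106],
    "infoid": [46726004, 46726004, 46725082, 46725082, 46725081,
               46724162, 46720662, 46726004, 46726004],
})

# A new run starts wherever infoid differs from the previous row;
# the cumulative sum of those flags numbers the runs 1, 2, 3, ...
run_id = df["infoid"].ne(df["infoid"].shift()).cumsum()
print(run_id.tolist())

# Sort by run number, then by date; ignore_index=True resets the labels
# to 0..n-1 so the result aligns with df row by row.
df["statisticsdate"] = (
    df.assign(run_id=run_id)
      .sort_values(["run_id", "statisticsdate"], ignore_index=True)["statisticsdate"]
)

print(df)
```

Because the two runs of 46726004 get different run numbers, each run is sorted on its own and neither is moved next to the other.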

Answer 2

Score: 2

This should do the trick (the columns are named date and id here instead of statisticsdate and infoid):

>>> df.groupby('id').transform(lambda x: x.sort_values()).join(df['id'])
       date        id
0  20230106  46726004
1  20230108  46726004
2  20230107  46725082
3  20230108  46725082
4  20230108  46725081
5  20230108  46724162
6  20230108  46720662

Or without a join:

>>> df.set_index('id').groupby('id').transform(lambda x: x.sort_values()).reset_index()
         id      date
0  46726004  20230106
1  46726004  20230108
2  46725082  20230107
3  46725082  20230108
4  46725081  20230108
5  46724162  20230108
6  46720662  20230108
