2023-05-21 01:03:11

Set custom sort (first and last) without knowing all values

Question

I have a pandas DataFrame with the following information:

         id          desc ammount
356 89 2521 ShouldBeFirst    0.00
356 89 2521  ShouldBeLast   19.00
356 89 2521   RandomValue   39.00
356 89 2521  RandomValue2   29.00
123 45 6789  RandomValue3   29.99
123 45 6789 ShouldBeFirst    0.00
123 45 6789  ShouldBeLast   99.00
123 45 6789  RandomValue3   39.00
123 45 6789  RandomValue2   19.00

What I would like is to sort the DataFrame based on the ID (which can easily be done with df.sort_values('id', ascending=True)) and after that, set the first row of that ID to always be ShouldBeFirst, and the last row with that ID to ShouldBeLast, like this:

         id          desc ammount
123 45 6789 ShouldBeFirst    0.00
123 45 6789  RandomValue2   19.00
123 45 6789  RandomValue3   29.99
123 45 6789  RandomValue3   39.00
123 45 6789  ShouldBeLast   99.00
356 89 2521 ShouldBeFirst    0.00
356 89 2521  RandomValue2   29.00
356 89 2521   RandomValue   39.00
356 89 2521  ShouldBeLast   19.00

I've seen several threads on how to custom sort a pandas DataFrame, but they all require listing every possible value. I do not know all the values in the desc column.
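To make the problem reproducible, the sample data can be rebuilt roughly as follows. This is only a sketch: the construction code is not part of the original post, and kind='stable' is added here so that the result is deterministic. It shows that sorting on id alone leaves the rows inside each id in their original order, so ShouldBeFirst does not necessarily come first.

```python
import pandas as pd

# Sample data copied from the table above (column name 'ammount' kept as posted)
df = pd.DataFrame({
    'id': ['356 89 2521'] * 4 + ['123 45 6789'] * 5,
    'desc': ['ShouldBeFirst', 'ShouldBeLast', 'RandomValue', 'RandomValue2',
             'RandomValue3', 'ShouldBeFirst', 'ShouldBeLast', 'RandomValue3',
             'RandomValue2'],
    'ammount': [0.00, 19.00, 39.00, 29.00, 29.99, 0.00, 99.00, 39.00, 19.00],
})

# Sorting on id only: a stable sort keeps the original row order inside each id
by_id = df.sort_values('id', kind='stable')

# First desc value seen for each id after that sort
first_per_id = by_id.groupby('id')['desc'].first()
print(first_per_id)
```

For id 123 45 6789 the first row is RandomValue3 rather than ShouldBeFirst, which is why an extra sort key for desc is needed.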

Answer 1

Score: 1

Create an intermediate desc_order column by mapping the known values to a predefined order (every other value gets the middle rank 1), then sort the DataFrame:

df['desc_order'] = df['desc'].map({'ShouldBeFirst': 0, 'ShouldBeLast': 2}).fillna(1)
df.sort_values(['id', 'desc_order', 'ammount']).drop(columns=['desc_order'])

            id           desc  ammount
5  123 45 6789  ShouldBeFirst     0.00
8  123 45 6789   RandomValue2    19.00
4  123 45 6789   RandomValue3    29.99
7  123 45 6789   RandomValue3    39.00
6  123 45 6789   ShouldBeLast    99.00
0  356 89 2521  ShouldBeFirst     0.00
3  356 89 2521   RandomValue2    29.00
2  356 89 2521    RandomValue    39.00
1  356 89 2521   ShouldBeLast    19.00
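The same idea can be written without leaving a desc_order column on the original frame. The sketch below is a variant of the snippet above, not the answerer's exact code: it uses DataFrame.assign to build the helper column on a temporary copy, and the variable names rank and out are illustrative.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'id': ['356 89 2521'] * 4 + ['123 45 6789'] * 5,
    'desc': ['ShouldBeFirst', 'ShouldBeLast', 'RandomValue', 'RandomValue2',
             'RandomValue3', 'ShouldBeFirst', 'ShouldBeLast', 'RandomValue3',
             'RandomValue2'],
    'ammount': [0.00, 19.00, 39.00, 29.00, 29.99, 0.00, 99.00, 39.00, 19.00],
})

# Known labels get the extreme ranks; anything unmapped becomes NaN, then 1
rank = {'ShouldBeFirst': 0, 'ShouldBeLast': 2}

out = (
    df.assign(desc_order=df['desc'].map(rank).fillna(1))  # helper column on a copy
      .sort_values(['id', 'desc_order', 'ammount'])       # id, then rank, then amount
      .drop(columns='desc_order')                         # remove the helper again
)
print(out)
```

Because assign returns a new DataFrame, df itself is left unchanged, which may be preferable when the frame is reused later.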

Answer 2

Score: 1

You can use a custom dictionary that maps the first and last values, with fillna assigning the middle rank to all other values:

order = {'ShouldBeFirst': 0, 'ShouldBeLast': 2}

def sorter(s):
    # sort_values calls the key once per column; only remap 'desc'
    if s.name == 'desc':
        return s.map(order).fillna(1)
    else:
        return s

out = df.sort_values(by=['id', 'desc', 'ammount'], key=sorter)

Or using numpy.lexsort (the last key in the list is the primary sort key):

import numpy as np

order = {'ShouldBeFirst': 0, 'ShouldBeLast': 2}

df.iloc[np.lexsort([df['ammount'], df['desc'].map(order).fillna(1), df['id']])]

Output:

            id           desc  ammount
5  123 45 6789  ShouldBeFirst     0.00
8  123 45 6789   RandomValue2    19.00
4  123 45 6789   RandomValue3    29.99
7  123 45 6789   RandomValue3    39.00
6  123 45 6789   ShouldBeLast    99.00
0  356 89 2521  ShouldBeFirst     0.00
3  356 89 2521   RandomValue2    29.00
2  356 89 2521    RandomValue    39.00
1  356 89 2521   ShouldBeLast    19.00
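If the same pattern is needed for several frames, the key-based approach can be wrapped in a small function. The following is a minimal sketch under stated assumptions: sort_pinned, its parameters, and the demo data are hypothetical names introduced here for illustration, and the key argument of sort_values requires pandas 1.1 or later.

```python
import pandas as pd


def sort_pinned(df, group_col, label_col, first, last, tiebreak_col):
    """Sort by group, pinning the `first`/`last` labels to each group's ends.

    Hypothetical helper: the name and signature are illustrative only.
    """
    rank = {first: 0, last: 2}

    def key(s):
        # The key is applied to each sort column in turn; remap only the labels
        if s.name == label_col:
            return s.map(rank).fillna(1)
        return s

    return df.sort_values([group_col, label_col, tiebreak_col], key=key)


# Small made-up frame with labels that are not listed anywhere in advance
demo = pd.DataFrame({
    'id': ['b', 'a', 'a', 'b', 'a', 'b'],
    'desc': ['ShouldBeLast', 'x', 'ShouldBeLast', 'y',
             'ShouldBeFirst', 'ShouldBeFirst'],
    'ammount': [5.0, 3.0, 1.0, 2.0, 9.0, 4.0],
})

out = sort_pinned(demo, 'id', 'desc', 'ShouldBeFirst', 'ShouldBeLast', 'ammount')
print(out)
```

Only the two pinned labels are named; every other value of desc falls into the middle rank and is ordered by the tiebreak column.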

Answer 3

Score: 0

Try this:

import pandas as pd

# Assuming the DataFrame is already available as 'df'
df_sorted = df.sort_values('id', ascending=True)

# Rank rows by label: 0 for the first label, 2 for the last, 1 for everything else
df_sorted['order'] = 1
df_sorted.loc[df_sorted['desc'] == 'ShouldBeFirst', 'order'] = 0
df_sorted.loc[df_sorted['desc'] == 'ShouldBeLast', 'order'] = 2

# Sort on the helper column within each id, then drop it
# (add 'ammount' to the sort keys to also order the middle rows by amount)
df_sorted = df_sorted.sort_values(['id', 'order']).drop('order', axis=1)

print(df_sorted)
