Set custom sort (first and last) without knowing all values

Question

I have a pandas DataFrame with the following information:

         id          desc ammount
356 89 2521 ShouldBeFirst    0.00
356 89 2521  ShouldBeLast   19.00
356 89 2521   RandomValue   39.00
356 89 2521  RandomValue2   29.00
123 45 6789  RandomValue3   29.99
123 45 6789 ShouldBeFirst    0.00
123 45 6789  ShouldBeLast   99.00
123 45 6789  RandomValue3   39.00
123 45 6789  RandomValue2   19.00

I would like to sort the DataFrame by id (easily done with df.sort_values('id', ascending=True)) and then, within each id, always place the ShouldBeFirst row first and the ShouldBeLast row last, like this:

         id          desc ammount
123 45 6789 ShouldBeFirst    0.00
123 45 6789  RandomValue2   19.00
123 45 6789  RandomValue3   29.99
123 45 6789  RandomValue3   39.00
123 45 6789  ShouldBeLast   99.00
356 89 2521 ShouldBeFirst    0.00
356 89 2521  RandomValue2   29.00
356 89 2521   RandomValue   39.00
356 89 2521  ShouldBeLast   19.00

I've seen several threads on how to custom sort a pandas DataFrame, but they all require listing every possible value. I do not know all the values in the desc column.

Answer 1

Score: 1

Create an intermediate desc_order column by mapping the known values to a predefined order (every other value gets the middle rank via fillna), then sort the DataFrame:

df['desc_order'] = df['desc'].map({'ShouldBeFirst': 0, 'ShouldBeLast': 2}).fillna(1)
df.sort_values(['id', 'desc_order', 'ammount']).drop(columns=['desc_order'])

            id           desc  ammount
5  123 45 6789  ShouldBeFirst     0.00
8  123 45 6789   RandomValue2    19.00
4  123 45 6789   RandomValue3    29.99
7  123 45 6789   RandomValue3    39.00
6  123 45 6789   ShouldBeLast    99.00
0  356 89 2521  ShouldBeFirst     0.00
3  356 89 2521   RandomValue2    29.00
2  356 89 2521    RandomValue    39.00
1  356 89 2521   ShouldBeLast    19.00
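
For readers who want to run this end to end, here is a self-contained sketch of the same idea. The sample frame is rebuilt from the question's data; the helper name sort_first_last and its parameters are hypothetical, not part of pandas. It uses assign so that the original df is left unmodified.

```python
import pandas as pd

# Sample data copied from the question (column name 'ammount' kept as-is).
df = pd.DataFrame({
    "id": ["356 89 2521"] * 4 + ["123 45 6789"] * 5,
    "desc": ["ShouldBeFirst", "ShouldBeLast", "RandomValue", "RandomValue2",
             "RandomValue3", "ShouldBeFirst", "ShouldBeLast", "RandomValue3",
             "RandomValue2"],
    "ammount": [0.00, 19.00, 39.00, 29.00, 29.99, 0.00, 99.00, 39.00, 19.00],
})


def sort_first_last(frame, first="ShouldBeFirst", last="ShouldBeLast"):
    """Hypothetical helper: sort by id, pinning `first`/`last` within each id."""
    # Known labels get rank 0 or 2; every other (unknown) label falls back to 1.
    rank = frame["desc"].map({first: 0, last: 2}).fillna(1)
    return (
        frame.assign(desc_order=rank)  # temporary column on a copy
             .sort_values(["id", "desc_order", "ammount"])
             .drop(columns="desc_order")
    )


out = sort_first_last(df)
print(out)
```

Because the rank is computed from only the two known labels, no list of the remaining desc values is needed.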

Answer 2

Score: 1

You can map the first and last values with a custom dictionary and use fillna to give every other value a middle rank:

order = {'ShouldBeFirst': 0, 'ShouldBeLast': 2}

def sorter(s):
    if s.name == 'desc':
        return s.map(order).fillna(1)
    else:
        return s

out = df.sort_values(by=['id', 'desc', 'ammount'], key=sorter)
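
The reason sorter checks s.name is that sort_values applies the key function separately to each column listed in by, passing that column as a Series. A minimal sketch on made-up toy data (the frame toy and the list seen are only for illustration):

```python
import pandas as pd

toy = pd.DataFrame({
    "id": ["b", "a", "a", "a"],
    "desc": ["ShouldBeLast", "zzz", "ShouldBeLast", "ShouldBeFirst"],
})

order = {"ShouldBeFirst": 0, "ShouldBeLast": 2}
seen = []  # records the name of each column the key function receives


def sorter(s):
    seen.append(s.name)
    # Only 'desc' is remapped; other columns sort by their own values.
    return s.map(order).fillna(1) if s.name == "desc" else s


result = toy.sort_values(by=["id", "desc"], key=sorter)
print(seen)
print(result)
```

The key only affects the comparison; the values stored in the returned frame are unchanged.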

Or using numpy.lexsort:

import numpy as np

order = {'ShouldBeFirst': 0, 'ShouldBeLast': 2}

df.iloc[np.lexsort([df['ammount'], df['desc'].map(order).fillna(1), df['id']])]

Output:

            id           desc  ammount
5  123 45 6789  ShouldBeFirst     0.00
8  123 45 6789   RandomValue2    19.00
4  123 45 6789   RandomValue3    29.99
7  123 45 6789   RandomValue3    39.00
6  123 45 6789   ShouldBeLast    99.00
0  356 89 2521  ShouldBeFirst     0.00
3  356 89 2521   RandomValue2    29.00
2  356 89 2521    RandomValue    39.00
1  356 89 2521   ShouldBeLast    19.00
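
One detail about np.lexsort that is easy to miss: the last key in the list is the primary sort key, which is why the call above passes ammount first and id last. A small sketch with made-up arrays (primary and secondary are illustrative names):

```python
import numpy as np

primary = np.array([2, 1, 1, 2])    # most significant key
secondary = np.array([0, 9, 3, 5])  # breaks ties among equal primary values

# Keys are listed least significant first: the LAST one is the primary key.
idx = np.lexsort([secondary, primary])
print(idx)
print(primary[idx], secondary[idx])
```

lexsort returns positions rather than labels, so the result is used with df.iloc, not df.loc.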

Answer 3

Score: 0

Try this:

import pandas as pd

# Assuming the DataFrame is already available as 'df'
df_sorted = df.sort_values('id', ascending=True)

# Rank rows within each id: 0 for ShouldBeFirst, 2 for ShouldBeLast, 1 for everything else
df_sorted['order'] = df_sorted['desc'].map({'ShouldBeFirst': 0, 'ShouldBeLast': 2}).fillna(1)

# Sort by id, then by rank, then by ammount so the middle rows come out in a fixed order
df_sorted = df_sorted.sort_values(['id', 'order', 'ammount']).drop('order', axis=1)

print(df_sorted)

huangapple
  • Published on 2023-05-21 01:03:11
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76296394.html