Set custom sort (first and last) without knowing all values
Question
I have a pandas DataFrame with the following information:
         id           desc  ammount
356 89 2521  ShouldBeFirst     0.00
356 89 2521   ShouldBeLast    19.00
356 89 2521    RandomValue    39.00
356 89 2521   RandomValue2    29.00
123 45 6789   RandomValue3    29.99
123 45 6789  ShouldBeFirst     0.00
123 45 6789   ShouldBeLast    99.00
123 45 6789   RandomValue3    39.00
123 45 6789   RandomValue2    19.00
What I would like is to sort the DataFrame by id (which can easily be done with df.sort_values('id', ascending=True)) and then, within each id, always place the ShouldBeFirst row first and the ShouldBeLast row last, like this:
         id           desc  ammount
123 45 6789  ShouldBeFirst     0.00
123 45 6789   RandomValue2    19.00
123 45 6789   RandomValue3    29.99
123 45 6789   RandomValue3    39.00
123 45 6789   ShouldBeLast    99.00
356 89 2521  ShouldBeFirst     0.00
356 89 2521   RandomValue2    29.00
356 89 2521    RandomValue    39.00
356 89 2521   ShouldBeLast    19.00
I've seen several threads on how to custom-sort a pandas DataFrame, but they all require listing every possible value. I do not know all the values in the desc column.
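For anyone who wants to reproduce the problem, here is a minimal sketch that rebuilds the sample frame from the table above. It assumes the id values are stored as strings (they contain spaces) and that ammount is a float column; neither dtype is stated in the question.

```python
import pandas as pd

# Sample data copied from the table in the question.
# Assumption: id is a string column, ammount is a float column.
df = pd.DataFrame({
    "id": ["356 89 2521"] * 4 + ["123 45 6789"] * 5,
    "desc": [
        "ShouldBeFirst", "ShouldBeLast", "RandomValue", "RandomValue2",
        "RandomValue3", "ShouldBeFirst", "ShouldBeLast", "RandomValue3",
        "RandomValue2",
    ],
    "ammount": [0.00, 19.00, 39.00, 29.00, 29.99, 0.00, 99.00, 39.00, 19.00],
})

print(df)
```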
Answer 1
Score: 1
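Create an intermediate desc_order column by mapping the known values to a predefined order, then sort the DataFrame:
# Known values get rank 0 or 2; every other value falls in between with rank 1.
df['desc_order'] = df['desc'].map({'ShouldBeFirst': 0, 'ShouldBeLast': 2}).fillna(1)
# Sort by id, then rank, then ammount, and drop the helper column from the result.
out = df.sort_values(['id', 'desc_order', 'ammount']).drop(columns=['desc_order'])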
            id           desc  ammount
5  123 45 6789  ShouldBeFirst     0.00
8  123 45 6789   RandomValue2    19.00
4  123 45 6789   RandomValue3    29.99
7  123 45 6789   RandomValue3    39.00
6  123 45 6789   ShouldBeLast    99.00
0  356 89 2521  ShouldBeFirst     0.00
3  356 89 2521   RandomValue2    29.00
2  356 89 2521    RandomValue    39.00
1  356 89 2521   ShouldBeLast    19.00
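Note that the mapping above writes a helper column into df itself. If that is unwanted, the same idea can be sketched with assign, which works on a copy. The function name sort_with_pinned, the _rank column, and the small sample frame are all hypothetical and only there for illustration.

```python
import pandas as pd


def sort_with_pinned(df, first="ShouldBeFirst", last="ShouldBeLast"):
    """Hypothetical helper: sort by id, keeping `first` and `last` pinned within each id."""
    # Known values get rank 0 or 2; anything unknown gets the middle rank 1.
    rank = df["desc"].map({first: 0, last: 2}).fillna(1)
    return (
        df.assign(_rank=rank)  # assign returns a new frame, so df is not modified
          .sort_values(["id", "_rank", "ammount"])
          .drop(columns="_rank")
    )


# Made-up sample data, not the frame from the question.
sample = pd.DataFrame({
    "id": ["b", "b", "b", "a", "a", "a"],
    "desc": ["ShouldBeLast", "x", "ShouldBeFirst", "y", "ShouldBeLast", "ShouldBeFirst"],
    "ammount": [1.0, 5.0, 9.0, 2.0, 0.5, 7.0],
})

result = sort_with_pinned(sample)
print(result)
```

Because the rank is computed from a map plus fillna, values that were never listed simply land in the middle, which is the property the question asks for.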
Answer 2
Score: 1
You can use a custom dictionary that maps the first and last values, with fillna for all other values:
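order = {'ShouldBeFirst': 0, 'ShouldBeLast': 2}

def sorter(s):
    # sort_values applies the key to each column in `by` separately,
    # so only the 'desc' column is remapped; the others pass through unchanged.
    if s.name == 'desc':
        return s.map(order).fillna(1)
    else:
        return s

out = df.sort_values(by=['id', 'desc', 'ammount'], key=sorter)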
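Or using numpy.lexsort:
import numpy as np
order = {'ShouldBeFirst': 0, 'ShouldBeLast': 2}
# lexsort uses the LAST key as the primary key: id, then the desc rank, then ammount.
df.iloc[np.lexsort([df['ammount'], df['desc'].map(order).fillna(1), df['id']])]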
Output:
            id           desc  ammount
5  123 45 6789  ShouldBeFirst     0.00
8  123 45 6789   RandomValue2    19.00
4  123 45 6789   RandomValue3    29.99
7  123 45 6789   RandomValue3    39.00
6  123 45 6789   ShouldBeLast    99.00
0  356 89 2521  ShouldBeFirst     0.00
3  356 89 2521   RandomValue2    29.00
2  356 89 2521    RandomValue    39.00
1  356 89 2521   ShouldBeLast    19.00
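As a quick sanity check, the two variants can be run side by side to confirm they produce the same row order. This is only a sketch: the sample frame is made up, and it uses integer ids (an assumption, unlike the string ids in the question) to keep the lexsort keys purely numeric.

```python
import numpy as np
import pandas as pd

order = {"ShouldBeFirst": 0, "ShouldBeLast": 2}

# Made-up sample data with integer ids (assumption for this sketch).
sample = pd.DataFrame({
    "id": [2, 2, 2, 1, 1, 1],
    "desc": ["ShouldBeLast", "other", "ShouldBeFirst",
             "ShouldBeLast", "ShouldBeFirst", "other"],
    "ammount": [3.0, 8.0, 1.0, 4.0, 6.0, 2.0],
})


def sorter(s):
    # Remap only the 'desc' column; leave 'id' and 'ammount' as they are.
    return s.map(order).fillna(1) if s.name == "desc" else s


# Variant 1: sort_values with a per-column key function.
by_key = sample.sort_values(by=["id", "desc", "ammount"], key=sorter)

# Variant 2: numpy.lexsort, where the last key is the primary sort key.
rank = sample["desc"].map(order).fillna(1)
by_lexsort = sample.iloc[np.lexsort([sample["ammount"], rank, sample["id"]])]

print(by_key)
print(by_key.equals(by_lexsort))
```

The key argument of sort_values needs pandas 1.1 or later, so the lexsort variant may be the more practical choice on older installations.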
Answer 3
Score: 0
Try this:
import pandas as pd

# Assuming the DataFrame is already available as 'df'
df_sorted = df.sort_values('id', ascending=True)
# Rank the rows: 1 by default, 0 for ShouldBeFirst, 2 for ShouldBeLast
df_sorted['order'] = 1
df_sorted.loc[df_sorted['desc'] == 'ShouldBeFirst', 'order'] = 0
df_sorted.loc[df_sorted['desc'] == 'ShouldBeLast', 'order'] = 2
# Sort by id and rank, then drop the helper column
df_sorted = df_sorted.sort_values(['id', 'order']).drop('order', axis=1)
print(df_sorted)