Sort column strings without numbers (and keep order when doing graphs)
Question
I have this DataFrame:
import pandas as pd
df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty': [7, 15, 8, 34]})
which yields:
       A  qty
0    0-5    7
1  18-23   15
2  12-17    8
3   6-11   34
I would like to sort the DataFrame by column 'A' without having to prefix the values in 'A' with numbers, so that the numbers do not show up later when I plot the data.
This is the desired output after sorting by column 'A':
       A  qty
0    0-5    7
3   6-11   34
2  12-17    8
1  18-23   15
To achieve a similar result, I currently do the following:
# add a category code
df['A'] = df['A'].astype('category').cat.codes + 1
# convert format
df['A'] = df['A'].astype('string')
# use a dictionary to rename (based on the former output)
dic = {
    '1': '1_0-5',
    '3': '3_18-23',
    '2': '2_12-17',
    '4': '4_6-11',
}
df['A'] = df['A'].replace(dic, regex=True)
# use a dictionary to rename again
dic = {
    '1_0-5': '1_0-5',
    '3_18-23': '4_18-23',
    '2_12-17': '3_12-17',
    '4_6-11': '2_6-11',
}
df['A'] = df['A'].replace(dic, regex=True)
By doing this, I get:
A qty
0 1_0-5 7
1 2_6-11 15
2 3_12-17 8
3 4_18-23 34
groupby does not work for me: it orders column 'A' as desired, but that order is not kept when I plot the result.
Answer 1
Score: 3
Don't reinvent the wheel; use natsort_key for natural sorting:
# pip install natsort
from natsort import natsort_key
out = df.sort_values(by='A', key=natsort_key)
Output:
       A  qty
0    0-5    7
3   6-11   34
2  12-17    8
1  18-23   15
Or, for fun, using numpy.lexsort:
import numpy as np
# lexsort treats the last row as the primary key, hence the column reversal
out = df.iloc[np.lexsort(df['A'].str.split('-', expand=True)
                                .astype(int).to_numpy()[:, ::-1].T)]
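To illustrate what natural sorting means here, the following is a minimal standard-library sketch. The helper name natural_key is hypothetical and is only a simplified stand-in for what natsort does; it assumes labels made of plain decimal digits and other characters.

```python
import re

def natural_key(text):
    """Split text into digit and non-digit chunks; digit chunks compare as ints."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r'(\d+)', text)]

labels = ['0-5', '18-23', '12-17', '6-11']

# Plain string sorting compares character by character, so '12-17' < '6-11'.
plain = sorted(labels)

# The natural key compares the leading numbers as integers instead.
natural = sorted(labels, key=natural_key)

print(plain)    # ['0-5', '12-17', '18-23', '6-11']
print(natural)  # ['0-5', '6-11', '12-17', '18-23']
```

For real data, natsort is probably the safer choice, since it also covers cases such as signs, floats and locale-aware ordering that this sketch ignores.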
Answer 2
Score: 1
If sorting by the first integer in each value is enough, use the key parameter of DataFrame.sort_values:
out = df.sort_values('A',
                     key=lambda x: x.str.extract(r'(\d+)', expand=False).astype(int),
                     ignore_index=True)
print(out)
       A  qty
0    0-5    7
1   6-11   34
2  12-17    8
3  18-23   15
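Or use natural sorting. Note that natsorted returns a sorted list rather than a sort key, so it cannot be passed as key; use natsort_keygen() instead:
from natsort import natsort_keygen
out = df.sort_values("A", key=natsort_keygen(), ignore_index=True)
print(out)
       A  qty
0    0-5    7
1   6-11   34
2  12-17    8
3  18-23   15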
EDIT: If you need a custom order for arbitrary strings, use an ordered Categorical:
df = pd.DataFrame({'A': ['mike', 'alice', 'john', 'brian'], 'qty': [7, 15, 8, 34]})
df['A'] = pd.Categorical(df['A'],
                         categories=['john', 'alice', 'mike', 'brian'],
                         ordered=True)
out = df.sort_values('A', ignore_index=True)
print(out)
       A  qty
0   john    8
1  alice   15
2   mike    7
3  brian   34
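The same ordered-Categorical idea can be applied to the age ranges from the question, which may also help with the concern about groupby and plotting. The sketch below assumes every label has the form "<int>-<int>" and derives the category order from the lower bound; the variable names ordered_labels and totals are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty': [7, 15, 8, 34]})

# Derive the category order from the lower bound of each range
# (assumption: every label looks like "<int>-<int>").
ordered_labels = sorted(df['A'].unique(), key=lambda s: int(s.split('-')[0]))

# The labels themselves are unchanged; only their sort order is redefined.
df['A'] = pd.Categorical(df['A'], categories=ordered_labels, ordered=True)

out = df.sort_values('A', ignore_index=True)
print(out)

# groupby on a categorical column follows the category order,
# not the alphabetical order of the labels.
totals = df.groupby('A', observed=True)['qty'].sum()
print(totals)
```

Because the order lives in the column's dtype rather than in the label text, a call such as totals.plot.bar() should draw the bars in index order, that is 0-5, 6-11, 12-17, 18-23, without any numeric prefixes in the tick labels.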
Answer 3
Score: -1
To sort the DataFrame df by column A in the desired order, you can extract the leading number from each value in A into a temporary column, sort the DataFrame by that column, and then drop it. Here's the code to accomplish this:
import pandas as pd
df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty':[7,15,8,34]})
# Extract numerical values from 'A' column
df['sort_key'] = df['A'].str.split('-', expand=True)[0].astype(int)
# Sort the dataframe based on the 'sort_key'
df = df.sort_values('sort_key')
# Drop the 'sort_key' column
df = df.drop('sort_key', axis=1)
print(df)
Output:
       A  qty
0    0-5    7
3   6-11   34
2  12-17    8
1  18-23   15
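A variant of the same approach that avoids adding and dropping a helper column is sketched below. It assumes every label has the form "<int>-<int>" and sorts on both bounds, so ranges sharing a lower bound are still ordered by their upper bound; the names bounds and order are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty': [7, 15, 8, 34]})

# Split each label into lower and upper bounds (integer columns 0 and 1).
bounds = df['A'].str.split('-', expand=True).astype(int)

# Sort the bounds and reuse the resulting index order on the original frame.
order = bounds.sort_values([0, 1]).index
out = df.loc[order]

print(out)
```

Here df itself is never modified, and the original row labels (0, 3, 2, 1) are kept; pass the result through reset_index(drop=True) if a fresh index is preferred.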