Sort column strings without numbers (and keep order when doing graphs)

Question

I have this DataFrame:

import pandas as pd

df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty': [7, 15, 8, 34]})

which yields

       A  qty
0    0-5    7
1  18-23   15
2  12-17    8
3   6-11   34

I would like to sort the DataFrame by column 'A' without having to prefix the values in 'A' with numbers, so that the numbers do not show up later when I plot graphs.

This is the desired output after sorting the DataFrame by column 'A':

       A  qty
0    0-5    7
3   6-11   34
2  12-17    8
1  18-23   15

To achieve a similar result, I currently do the following:

# add a category code (codes follow the lexicographic order of the labels)
df['A'] = df['A'].astype('category').cat.codes + 1
# convert the codes to strings
df['A'] = df['A'].astype('string')
# use a dictionary to rename (based on the former output)
dic = {
    '1': '1_0-5',
    '3': '3_18-23',
    '2': '2_12-17',
    '4': '4_6-11',
}
df['A'] = df['A'].replace(dic, regex=True)
# use a dictionary to rename again, this time with the desired numbering
dic = {
    '1_0-5': '1_0-5',
    '3_18-23': '4_18-23',
    '2_12-17': '3_12-17',
    '4_6-11': '2_6-11',
}
df['A'] = df['A'].replace(dic, regex=True)

By doing this and then sorting by column 'A', I get:

         A  qty
0    1_0-5    7
3   2_6-11   34
2  3_12-17    8
1  4_18-23   15

Groupby does not work for me: it would order column 'A' as desired, but the order is not kept when I plot graphs.

Answer 1

Score: 3

Don't reinvent the wheel, use natsort_key for natural sorting:

# pip install natsort
from natsort import natsort_key

out = df.sort_values(by='A', key=natsort_key)

Output:

       A  qty
0    0-5    7
3   6-11   34
2  12-17    8
1  18-23   15
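If installing natsort is not an option, the idea behind natural sorting can be sketched with the standard library. This is only an illustration, not natsort's actual implementation: the helper name natural_key is hypothetical, and the sketch assumes every label has the same "number-number" shape, so that text chunks are only ever compared with text and integers with integers.

```python
import re

import pandas as pd


def natural_key(text):
    """Split a label into text and integer chunks so digits compare numerically."""
    return tuple(int(part) if part.isdigit() else part
                 for part in re.split(r'(\d+)', text))


df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty': [7, 15, 8, 34]})

# sort_values passes the whole column to `key`, so map the helper over it
out = df.sort_values(by='A', key=lambda col: col.map(natural_key))
print(out)
```

Because '6' is compared as the integer 6 rather than as a character, '6-11' sorts before '12-17', which plain string sorting does not do.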

Or, for fun, using numpy.lexsort:

import numpy as np

out = df.iloc[np.lexsort(df['A'].str.split('-', expand=True)
                         .astype(int).to_numpy()[:, ::-1].T)]
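The lexsort one-liner is dense, so here is the same idea unpacked step by step. The intermediate names bounds and order are introduced only for this sketch; the one thing to remember is that np.lexsort treats the last key it receives as the primary sort key.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty': [7, 15, 8, 34]})

# Split '12-17' into two integer columns: range start and range end
bounds = df['A'].str.split('-', expand=True).astype(int).to_numpy()

# The last key is the primary one, so pass (end, start)
# to sort by start first and break ties on end
order = np.lexsort((bounds[:, 1], bounds[:, 0]))

out = df.iloc[order]
print(out)
```

Unlike sorting on the first number alone, this also breaks ties when two ranges share the same start.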

Answer 2

Score: 1

If sorting by the first integer value is sufficient, use the key parameter of DataFrame.sort_values:

out = df.sort_values('A',
                     key=lambda x: x.str.extract(r'(\d+)', expand=False).astype(int),
                     ignore_index=True)
print(out)
       A  qty
0    0-5    7
1   6-11   34
2  12-17    8
3  18-23   15

Or use natural sorting:

from natsort import natsort_keygen

out = df.sort_values("A", key=natsort_keygen(), ignore_index=True)
print(out)
       A  qty
0    0-5    7
1   6-11   34
2  12-17    8
3  18-23   15

EDIT: If you need a custom order for arbitrary strings, use ordered Categoricals:

df = pd.DataFrame({'A': ['mike', 'alice', 'john', 'brian'], 'qty': [7, 15, 8, 34]})

df['A'] = pd.Categorical(df['A'],
                         categories=['john', 'alice', 'mike', 'brian'],
                         ordered=True)

out = df.sort_values('A', ignore_index=True)
print(out)
       A  qty
0   john    8
1  alice   15
2   mike    7
3  brian   34
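The same ordered-Categorical idea can be applied to the age ranges from the question, which may also help with the "keep the order in graphs" part. The sketch below is one possible approach, assuming the labels always start with an integer; the names ordered_labels and totals are made up for the example.

```python
import pandas as pd

df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty': [7, 15, 8, 34]})

# Derive the category order from the number before the dash
ordered_labels = sorted(df['A'].unique(), key=lambda s: int(s.split('-')[0]))

# Store the order in the column itself instead of in a numeric prefix
df['A'] = pd.Categorical(df['A'], categories=ordered_labels, ordered=True)

out = df.sort_values('A', ignore_index=True)

# groupby on a categorical column follows the category order, not the alphabet
totals = df.groupby('A', observed=False)['qty'].sum()
print(out)
print(totals)
```

Since the order lives in the dtype, the labels stay unchanged, and plotting the grouped result (for instance with totals.plot.bar()) should follow the index order shown here.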

Answer 3

Score: -1

To sort the DataFrame df by column 'A' in the desired order, you can extract the leading number from each value in 'A' into a temporary column, sort the DataFrame by that column, and then drop it. Here's the code to accomplish this:

import pandas as pd

df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty':[7,15,8,34]})

# Extract numerical values from 'A' column
df['sort_key'] = df['A'].str.split('-', expand=True)[0].astype(int)

# Sort the dataframe based on the 'sort_key'
df = df.sort_values('sort_key')

# Drop the 'sort_key' column
df = df.drop('sort_key', axis=1)

print(df)

Output:

       A  qty
0    0-5    7
3   6-11   34
2  12-17    8
1  18-23   15

huangapple
  • Published on July 10, 2023, 17:16:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76652331.html