Sort column strings without numbers (and keep order when doing graphs)

Question

I have this DataFrame:

    import pandas as pd

    df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty': [7, 15, 8, 34]})

which yields:

           A  qty
    0    0-5    7
    1  18-23   15
    2  12-17    8
    3   6-11   34

I would like to sort the DataFrame by column 'A' without having to prefix the values in 'A' with numbers, so that the numbers do not show up later when I plot graphs.

This is the desired output after sorting by column 'A':

           A  qty
    0    0-5    7
    3   6-11   34
    2  12-17    8
    1  18-23   15

To achieve a similar result, I currently do this:

    # add a category code
    df['A'] = df['A'].astype('category').cat.codes + 1
    # convert format
    df['A'] = df['A'].astype('string')
    # use a dictionary to rename (based on former output)
    dic = {
        '1': '1_0-5',
        '3': '3_18-23',
        '2': '2_12-17',
        '4': '4_6-11',
    }
    df['A'] = df['A'].replace(dic, regex=True)
    # use a dictionary to rename again
    dic = {
        '1_0-5': '1_0-5',
        '3_18-23': '4_18-23',
        '2_12-17': '3_12-17',
        '4_6-11': '2_6-11',
    }
    df['A'] = df['A'].replace(dic, regex=True)

By doing this, I get:

             A  qty
    0    1_0-5    7
    1   2_6-11   15
    2  3_12-17    8
    3  4_18-23   34

groupby does not work for me: it orders column A as desired, but the order is not kept when I plot graphs.

Answer 1

Score: 3

Don't reinvent the wheel; use natsort_key for natural sorting:

    # pip install natsort
    from natsort import natsort_key

    out = df.sort_values(by='A', key=natsort_key)

Output:

           A  qty
    0    0-5    7
    3   6-11   34
    2  12-17    8
    1  18-23   15
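To see why a natural sort fixes the ordering, here is a minimal standard-library sketch of the idea. The helper name natural_key is made up for illustration, and this is not how natsort is implemented internally; it only shows the principle of comparing digit runs as integers instead of as text.

```python
import re


def natural_key(text):
    """Split text into digit and non-digit runs; digit runs compare as ints.

    Hypothetical helper for illustration only (not part of natsort).
    """
    return [int(tok) if tok.isdigit() else tok
            for tok in re.split(r'(\d+)', text)]


labels = ['0-5', '18-23', '12-17', '6-11']

# Plain string sort compares character by character, so '12-17' < '6-11'.
plain = sorted(labels)

# Natural sort compares 12 and 6 as numbers, so '6-11' comes first.
natural = sorted(labels, key=natural_key)

print(plain)    # ['0-5', '12-17', '18-23', '6-11']
print(natural)  # ['0-5', '6-11', '12-17', '18-23']
```

The plain sort is the ordering the question is trying to avoid; the keyed sort matches the desired output.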

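Or, for fun, using numpy.lexsort:

    import numpy as np

    # Split 'A' into its two integer bounds. lexsort treats the last key as
    # the primary one, so reverse the columns to sort by the lower bound first.
    bounds = df['A'].str.split('-', expand=True).astype(int).to_numpy()
    out = df.iloc[np.lexsort(bounds[:, ::-1].T)]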
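The column reversal in the lexsort version is easy to miss: numpy.lexsort uses the last key as the primary sort key. A small sketch with made-up arrays (starts and ends are illustrative names, and the extra 0-3 range is added only to show the tie-break):

```python
import numpy as np

# Lower and upper bounds of five hypothetical ranges:
# 0-5, 18-23, 12-17, 6-11, 0-3
starts = np.array([0, 18, 12, 6, 0])
ends = np.array([5, 23, 17, 11, 3])

# The LAST key passed to lexsort is the primary key, so this sorts by
# starts first and uses ends only to break ties.
order = np.lexsort((ends, starts))

print(order.tolist())  # [4, 0, 3, 2, 1]
```

Index 4 (0-3) lands before index 0 (0-5) because both start at 0 and the upper bound decides the tie.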

Answer 2

Score: 1

If sorting by the first integer in each value is enough, use the key parameter of DataFrame.sort_values:

    out = df.sort_values('A',
                         key=lambda x: x.str.extract(r'(\d+)', expand=False).astype(int),
                         ignore_index=True)
    print(out)

           A  qty
    0    0-5    7
    1   6-11   34
    2  12-17    8
    3  18-23   15

Or use natural sorting:

    from natsort import natsort_keygen

    out = df.sort_values("A", key=natsort_keygen(), ignore_index=True)
    print(out)

           A  qty
    0    0-5    7
    1   6-11   34
    2  12-17    8
    3  18-23   15

EDIT: If you need a custom order for arbitrary strings, use an ordered Categorical:

    df = pd.DataFrame({'A': ['mike', 'alice', 'john', 'brian'], 'qty': [7, 15, 8, 34]})
    df['A'] = pd.Categorical(df['A'],
                             categories=['john', 'alice', 'mike', 'brian'],
                             ordered=True)
    out = df.sort_values('A', ignore_index=True)
    print(out)

           A  qty
    0   john    8
    1  alice   15
    2   mike    7
    3  brian   34
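The same ordered-Categorical idea can be applied to the range labels from the question, which may also help with the "keep the order in graphs" part, since groupby results on an ordered categorical come out in category order. This is only a sketch under assumptions: the category order is derived from the lower bound of each label, and whether a given plotting library honours the categorical order is something to check for your setup.

```python
import pandas as pd

df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty': [7, 15, 8, 34]})

# Derive the category order from the integer lower bound of each label
# (assumes every label has the form '<int>-<int>').
order = sorted(df['A'].unique(), key=lambda s: int(s.split('-')[0]))

# Store 'A' as an ordered categorical; the labels themselves are unchanged.
df['A'] = pd.Categorical(df['A'], categories=order, ordered=True)

out = df.sort_values('A', ignore_index=True)

# Aggregations keep the category order instead of falling back to string order.
totals = out.groupby('A', observed=True)['qty'].sum()

print(out)
print(totals)
```

No numeric prefix is added to the labels, so nothing extra shows up on an axis.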

Answer 3

Score: -1

To sort df by column A in the desired order, extract the lower bound of each range into a helper column, sort by that column, and then drop it:

    import pandas as pd

    df = pd.DataFrame({'A': ['0-5', '18-23', '12-17', '6-11'], 'qty': [7, 15, 8, 34]})
    # Extract the lower bound of each range in 'A' as an integer
    df['sort_key'] = df['A'].str.split('-', expand=True)[0].astype(int)
    # Sort the dataframe based on 'sort_key'
    df = df.sort_values('sort_key')
    # Drop the 'sort_key' column
    df = df.drop('sort_key', axis=1)
    print(df)

Output:

           A  qty
    0    0-5    7
    3   6-11   34
    2  12-17    8
    1  18-23   15
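A possible variation on this answer, sketched with made-up data: sort on both bounds so that ranges sharing a lower bound are still ordered, and skip the temporary column by reordering with the sorted index. The extra label '0-3' is hypothetical and only there to show the tie-break.

```python
import pandas as pd

# Hypothetical data: '0-3' and '0-5' share the same lower bound.
df = pd.DataFrame({'A': ['0-5', '18-23', '0-3', '6-11'], 'qty': [7, 15, 8, 34]})

# Two integer columns (labelled 0 and 1): lower and upper bound of each range.
bounds = df['A'].str.split('-', expand=True).astype(int)

# Sort the bounds by lower then upper bound, and reuse the resulting index
# to reorder the original frame; df itself gets no helper column.
out = df.loc[bounds.sort_values([0, 1]).index]

print(out)
```

The original index labels are kept, as in the output above; pass the result through reset_index(drop=True) if a fresh 0..n-1 index is preferred.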

huangapple
  • Published on July 10, 2023, 17:16:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76652331.html