How do you sort a DataFrame by the integers embedded in a column whose rows mix strings and integers?

Question

    df['a'] = df['a'].str.extract(r'(\d+)').astype(int)
    df = df.sort_values(by=['a'], ignore_index=True)

How would you sort the following DataFrame:

    import pandas as pd

    df = pd.DataFrame({
        'a': ['abc_1.2.6', 'abc_1.2.60', 'abc_1.2.7', 'abc_1.2.9',
              'abc_1.3.0', 'abc_1.3.10', 'abc_1.3.100', 'abc_1.3.11'],
        'b': [1, 2, 3, 4, 5, 6, 7, 8],
    })

                 a  b
    0    abc_1.2.6  1
    1   abc_1.2.60  2
    2    abc_1.2.7  3
    3    abc_1.2.9  4
    4    abc_1.3.0  5
    5   abc_1.3.10  6
    6  abc_1.3.100  7
    7   abc_1.3.11  8

to achieve this output?

                 a  b
    0    abc_1.2.6  1
    1    abc_1.2.7  3
    2    abc_1.2.9  4
    3   abc_1.2.60  2
    4    abc_1.3.0  5
    5   abc_1.3.10  6
    6   abc_1.3.11  8
    7  abc_1.3.100  7

I understand that the integers inside a string can be extracted with string transformations; however, I'm unsure how to do this in a DataFrame. Obviously df.sort_values(by=['a'], ignore_index=True) is unhelpful here, because it compares the values as plain strings and so leaves abc_1.2.60 ahead of abc_1.2.7.
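To make the problem concrete, here is a minimal sketch in plain Python (no pandas) of why string comparison and numeric comparison disagree for these labels. The variable names are only illustrative.

```python
# Strings are compared character by character, so '6' < '7' decides the
# result before the trailing '0' of '60' is ever looked at.
string_order = 'abc_1.2.60' < 'abc_1.2.7'

# Tuples of ints are compared element by element, numerically,
# so the third element decides: 60 < 7 is False.
numeric_order = (1, 2, 60) < (1, 2, 7)

print(string_order, numeric_order)
```

Any working solution therefore has to turn each label into something that compares numerically, such as a tuple of ints, before the sort happens.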

Answer 1

Score: 2

One option is to use natsorted with iloc:

    # pip install natsort
    from natsort import natsorted

    # df.loc looks rows up by label, so this assumes the default 0..n-1 index
    out = df.iloc[natsorted(range(len(df)), key=lambda x: df.loc[x, "a"])]

Or, even shorter, as suggested by @Stef, pass natsort_key as the key of sort_values:

    from natsort import natsort_key

    out = df.sort_values(by="a", key=natsort_key, ignore_index=True)

Output (shown for the sort_values version; the iloc version returns the rows in the same order but keeps their original index labels):

    print(out)

                 a  b
    0    abc_1.2.6  1
    1    abc_1.2.7  3
    2    abc_1.2.9  4
    3   abc_1.2.60  2
    4    abc_1.3.0  5
    5   abc_1.3.10  6
    6   abc_1.3.11  8
    7  abc_1.3.100  7
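For readers who cannot install natsort, the general idea of a natural-order key can be sketched with the standard library alone. This is not natsort's implementation, only an illustration of the technique; natural_key is a hypothetical helper name.

```python
import re

def natural_key(text):
    """Hypothetical helper: split text into alternating text and integer chunks."""
    # A capturing group makes re.split keep the digit runs, so
    # 'abc_1.2.60' becomes ['abc_', '1', '.', '2', '.', '60', ''].
    chunks = re.split(r'(\d+)', text)
    # Digit runs always land at odd positions, so ints are only ever
    # compared with ints and strings with strings.
    return [int(c) if i % 2 else c for i, c in enumerate(chunks)]

labels = ['abc_1.2.60', 'abc_1.3.100', 'abc_1.2.7', 'abc_1.3.11']
ordered = sorted(labels, key=natural_key)
print(ordered)
```

The natsort library handles many more cases (signs, floats, locale), so it is likely the better choice whenever it is available.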

Answer 2

Score: 1

You can pass a key function to sort_values, which transforms the values before they are compared:

    df = df.sort_values(
        by=['a'], ignore_index=True,
        # drop the 4-character 'abc_' prefix, then compare the dotted parts as a tuple of ints
        key=lambda x: x.map(lambda v: tuple(map(int, v[4:].split('.')))),
    )

Output:

                 a  b
    0    abc_1.2.6  1
    1    abc_1.2.7  3
    2    abc_1.2.9  4
    3   abc_1.2.60  2
    4    abc_1.3.0  5
    5   abc_1.3.10  6
    6   abc_1.3.11  8
    7  abc_1.3.100  7
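The v[4:] slice in this answer relies on every value starting with a prefix of exactly four characters. If that is not guaranteed, one possible variation is to split at the last underscore instead. The sketch below assumes the version part always follows the final '_' and contains only dot-separated integers; version_key and its sep parameter are hypothetical names, not part of pandas.

```python
def version_key(label, sep='_'):
    """Hypothetical helper: 'abc_1.2.60' -> (1, 2, 60)."""
    # rpartition splits at the LAST separator, so the prefix may have any
    # length and may itself contain the separator.
    _prefix, _, version = label.rpartition(sep)
    return tuple(int(part) for part in version.split('.'))

labels = ['abc_1.2.60', 'long_prefix_1.2.7', 'x_1.3.100', 'abc_1.3.11']
ordered = sorted(labels, key=version_key)
print(ordered)
```

With a DataFrame, the same helper could be plugged into the pattern from the answer, for example df.sort_values(by='a', key=lambda s: s.map(version_key), ignore_index=True).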

Answer 3

Score: 1

Here is another way, using str.findall() and explode():

    df.sort_values('a', key=lambda x: x.str.findall(r'\d+').explode().astype(int).groupby(level=0).agg(tuple))

Output (the original index labels are kept because ignore_index is not set):

                 a  b
    0    abc_1.2.6  1
    2    abc_1.2.7  3
    3    abc_1.2.9  4
    1   abc_1.2.60  2
    4    abc_1.3.0  5
    5   abc_1.3.10  6
    7   abc_1.3.11  8
    6  abc_1.3.100  7
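The one-liner above is easier to follow when its chain is split into named steps. The sketch below does that on a shortened copy of the data; numeric_key is a hypothetical name. It assumes the frame still has its default ascending index: as far as I can tell, groupby(level=0) returns the groups in sorted index order and sort_values uses the key's values by position, so a shuffled or duplicated index could misalign the key.

```python
import pandas as pd

df = pd.DataFrame({
    'a': ['abc_1.2.6', 'abc_1.2.60', 'abc_1.2.7', 'abc_1.2.9'],
    'b': [1, 2, 3, 4],
})

def numeric_key(col):
    """Hypothetical helper: map each string to a tuple of the ints it contains."""
    digits = col.str.findall(r'\d+')          # each row -> list of digit strings
    flat = digits.explode().astype(int)       # one int per line, row label repeated
    return flat.groupby(level=0).agg(tuple)   # regroup the ints per original row

out = df.sort_values('a', key=numeric_key)
print(out)
```

Adding ignore_index=True to the sort_values call would renumber the result from 0, matching the output of the other answers.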

huangapple
  • Published on April 19, 2023, 21:24:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76055072.html