Sort a DataFrame that includes empty fields

Question

I have a DataFrame like this:

    import pandas as pd

    data = {'Test': [' ', ' ', 'K', ' '],
            'Name': ['A', 'B', 'B', 'B'],
            'value': ['D1', 'A1', ' ', 'C1'],
            'time': [227, 227, 227, 230]}
    df = pd.DataFrame(data)

which prints as:

      Test Name value  time
    0         A    D1   227
    1         B    A1   227
    2    K    B         227
    3         B    C1   230

I want to sort the DataFrame so that it looks like this:

      Test Name value  time
    0         A    D1   227
    1    K    B         227
    2         B    A1   227
    3         B    C1   230

I've tried using sort_values, but I still can't figure it out. Should I add an extra condition for the empty field ' ' (or NA) when sorting?

Answer 1

Score: 1

    import pandas as pd

    data = {'Test': [' ', ' ', 'K', ' '],
            'Name': ['A', 'B', 'B', 'B'],
            'value': ['D1', 'A1', ' ', 'C1'],
            'time': [227, 227, 227, 230]}
    df = pd.DataFrame(data)

    # Replace blanks with a placeholder that sorts after the real values
    df["Test"] = df["Test"].replace(" ", "Zzzzz")
    df = df.sort_values(by=["Name", "time", "Test"])
    # Put the blanks back
    df["Test"] = df["Test"].replace("Zzzzz", " ")

    print(df)  # the original index labels are kept
    #   Test Name value  time
    # 0         A    D1   227
    # 2    K    B         227
    # 1         B    A1   227
    # 3         B    C1   230
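
The placeholder trick only works while "Zzzzz" compares greater than every real value in Test. As a rough sketch of one way around that (not part of the answer above), a temporary boolean column can carry the "is blank" information instead. The helper name `sort_blanks_last` and the temporary column `_is_blank` are made up for illustration, and `ignore_index=True` assumes pandas 1.0 or later.

```python
import pandas as pd

df = pd.DataFrame({
    'Test': [' ', ' ', 'K', ' '],
    'Name': ['A', 'B', 'B', 'B'],
    'value': ['D1', 'A1', ' ', 'C1'],
    'time': [227, 227, 227, 230],
})


def sort_blanks_last(frame, by, blank_col, blank=' '):
    """Hypothetical helper: sort by `by`, putting `blank` entries of
    `blank_col` after non-blank ones whenever the earlier keys tie."""
    flag = '_is_blank'  # temporary column name, assumed not to clash
    keys = []
    for col in by:
        if col == blank_col:
            keys.append(flag)  # False sorts before True, so blanks go last
        keys.append(col)
    return (frame.assign(**{flag: frame[blank_col].eq(blank)})
                 .sort_values(by=keys, ignore_index=True)  # renumber rows
                 .drop(columns=flag))


result = sort_blanks_last(df, by=['Name', 'time', 'Test'], blank_col='Test')
print(result)
```

Because the data itself is never rewritten, a real value that happens to sort after "Zzzzz" cannot end up behind the blanks.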

Answer 2

Score: 1

Better not to use empty strings or spaces to denote empty cells. Use NA/NaN values, which sort_values handles directly:

    out = (df.replace(' ', pd.NA)
             .sort_values(by=['Name', 'time', 'Test'])
           )

This is equivalent to:

    out = (df.replace(' ', pd.NA)
             .sort_values(by=['Name', 'time', 'Test'], na_position='last')
           )

Output:

       Test Name value  time
    0  <NA>    A    D1   227
    2     K    B  <NA>   227
    1  <NA>    B    A1   227
    3  <NA>    B    C1   230
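
Note that `df.replace(' ', pd.NA)` converts blanks in every column, which is why value also shows `<NA>` in the output. If only Test should be treated as missing and the blanks are wanted back afterwards, a variation along these lines might work. It is a sketch under those assumptions rather than the answer's own code; `ignore_index=True` assumes pandas 1.0 or later, and the final `fillna` step is only for display.

```python
import pandas as pd

df = pd.DataFrame({
    'Test': [' ', ' ', 'K', ' '],
    'Name': ['A', 'B', 'B', 'B'],
    'value': ['D1', 'A1', ' ', 'C1'],
    'time': [227, 227, 227, 230],
})

out = (
    df.assign(Test=df['Test'].replace(' ', pd.NA))  # blank -> NA in Test only
      .sort_values(by=['Name', 'time', 'Test'],
                   na_position='last',              # NA after real values (the default)
                   ignore_index=True)               # renumber rows from 0
      .fillna({'Test': ' '})                        # assumption: restore blanks for display
)
print(out)
```

The sort itself still relies on `na_position`, so no placeholder string has to be chosen.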

huangapple
  • Published on June 13, 2023, 01:07:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76458875.html