Sort a DataFrame that includes empty fields

Question


I have a DataFrame like this:

import pandas as pd

data = {'Test': [' ', ' ', 'K', ' '],
        'Name': ['A', 'B', 'B', 'B'],
        'value': ['D1', 'A1', ' ', 'C1'],
        'time': [227, 227, 227, 230]}
df = pd.DataFrame(data)
  Test Name value  time
0         A    D1   227
1         B    A1   227
2    K    B         227
3         B    C1   230

I want the DataFrame to be sorted like this:

  Test Name value  time
0         A    D1   227
1    K    B         227
2         B    A1   227
3         B    C1   230

I've tried sort_values, but I still can't get this order. Do I need an extra condition in the sort for the empty field ' ' (or NA)?
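A quick way to see why a plain sort_values call does not produce this order: the blank string ' ' compares less than any letter, so on rows with the same Name and time a blank Test value sorts before 'K'. A minimal check, assuming the four-row data shown above (with a blank fourth Test entry):

```python
import pandas as pd

data = {'Test': [' ', ' ', 'K', ' '],
        'Name': ['A', 'B', 'B', 'B'],
        'value': ['D1', 'A1', ' ', 'C1'],
        'time': [227, 227, 227, 230]}
df = pd.DataFrame(data)

# ' ' (U+0020) is smaller than every letter, so blanks win ties on Name/time
print(' ' < 'K')  # True

naive = df.sort_values(by=['Name', 'time', 'Test'])
# Row 1 (blank Test) still comes before row 2 ('K'), so nothing moves
print(naive.index.tolist())  # [0, 1, 2, 3]
```

So the sort itself is fine; the blank value simply needs to compare as "larger" than real values, which is what both answers below arrange in different ways.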

Answer 1

Score: 1

import pandas as pd

data = {'Test': [' ', ' ', 'K', ' '],
        'Name': ['A', 'B', 'B', 'B'],
        'value': ['D1', 'A1', ' ', 'C1'],
        'time': [227, 227, 227, 230]}
df = pd.DataFrame(data)

df["Test"] = df["Test"].replace(" ", "Zzzzz")  # Replace blanks with "Zzzzz", which sorts after uppercase letters such as "K"
df = df.sort_values(by=["Name", "time", "Test"])
df["Test"] = df["Test"].replace("Zzzzz", " ")  # Restore the blanks
print(df)

#   Test Name value  time
# 0         A    D1   227
# 2    K    B         227
# 1         B    A1   227
# 3         B    C1   230
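One caveat with the sentinel trick: "Zzzzz" sorts after uppercase letters but before lowercase ones ('Z' is code point 90, 'a' is 97), so a lowercase Test value would land after the restored blanks. A sentinel-free alternative, sketched below under the assumption that blanks are whitespace-only strings, sorts on a temporary boolean column instead; the column name `_blank` is hypothetical.

```python
import pandas as pd

data = {'Test': [' ', ' ', 'K', ' '],
        'Name': ['A', 'B', 'B', 'B'],
        'value': ['D1', 'A1', ' ', 'C1'],
        'time': [227, 227, 227, 230]}
df = pd.DataFrame(data)

# Caveat of the sentinel: uppercase 'Z' sorts before lowercase letters
print(sorted(['Zzzzz', 'k']))  # ['Zzzzz', 'k']

# Flag blank Test cells (False sorts before True), sort on the flag, then drop it.
# '_blank' is a hypothetical helper column name.
out = (
    df.assign(_blank=df['Test'].str.strip().eq(''))
      .sort_values(by=['Name', 'time', '_blank', 'Test'])
      .drop(columns='_blank')
)
print(out.index.tolist())  # [0, 2, 1, 3]
```

The Test values are never rewritten, so there is no replace-and-restore step and no sentinel that could collide with real data.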

Answer 2

Score: 1


It's better not to use empty strings or spaces to denote empty cells. Use NA/NaN values instead, which sort_values handles directly:

out = (df.replace(' ', pd.NA)
         .sort_values(by=['Name', 'time', 'Test'])
       )

This is equivalent to:

out = (df.replace(' ', pd.NA)
         .sort_values(by=['Name', 'time', 'Test'], na_position='last')
       )

Output:

   Test Name value  time
0  <NA>    A    D1   227
2     K    B  <NA>   227
1  <NA>    B    A1   227
3  <NA>    B    C1   230
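If the blank strings have to stay in the data, the same NA-last behaviour can be applied only while sorting, via the `key` argument of `sort_values` (available since pandas 1.1). `key` is applied to each `by` column separately, so the sketch below, which assumes blanks are whitespace-only strings, masks blanks to NaN in the `Test` column and passes the other columns through unchanged. The helper name `blanks_as_na` is hypothetical.

```python
import pandas as pd

data = {'Test': [' ', ' ', 'K', ' '],
        'Name': ['A', 'B', 'B', 'B'],
        'value': ['D1', 'A1', ' ', 'C1'],
        'time': [227, 227, 227, 230]}
df = pd.DataFrame(data)

def blanks_as_na(col: pd.Series) -> pd.Series:
    # Only the Test column is transformed; Name and time are returned as-is
    if col.name == 'Test':
        return col.mask(col.str.strip() == '')  # blank -> NaN, placed by na_position
    return col

# na_position='last' is the default; spelled out here for clarity
out = df.sort_values(by=['Name', 'time', 'Test'], key=blanks_as_na, na_position='last')
print(out.index.tolist())  # [0, 2, 1, 3]
print(out['Test'].tolist())  # [' ', 'K', ' ', ' ']: the stored blanks are unchanged
```

Unlike `df.replace(' ', pd.NA)`, this leaves the `value` column's blank untouched as well, since the key only affects the values used for ordering.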

huangapple
  • Published on June 13, 2023 01:07:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76458875.html