Sort a DataFrame that includes empty fields
Question
I have a dataframe as below:
import pandas as pd

data = {'Test': [' ', ' ', 'K', ' '],
        'Name': ['A', 'B', 'B', 'B'],
        'value': ['D1', 'A1', ' ', 'C1'],
        'time': [227, 227, 227, 230]}
df = pd.DataFrame(data)
  Test Name value  time
0         A    D1   227
1         B    A1   227
2    K    B         227
3         B    C1   230
I want to sort the df so that it looks like this:
  Test Name value  time
0         A    D1   227
1    K    B         227
2         B    A1   227
3         B    C1   230
I've tried using sort_values, but I still can't figure it out. Should I add an extra condition to the sort for the empty field ' ' (or NA)?
Answer 1
Score: 1
import pandas as pd

data = {'Test': [' ', ' ', 'K', ' '],
        'Name': ['A', 'B', 'B', 'B'],
        'value': ['D1', 'A1', ' ', 'C1'],
        'time': [227, 227, 227, 230]}
df = pd.DataFrame(data)

# Replace blanks with a sentinel string that sorts after the real values here
df["Test"] = df["Test"].replace(" ", "Zzzzz")
# Sort, then renumber the index so it matches the output shown below
df = df.sort_values(by=["Name", "time", "Test"]).reset_index(drop=True)
# Put the blanks back
df["Test"] = df["Test"].replace("Zzzzz", " ")

#   Test Name value  time
# 0         A    D1   227
# 1    K    B         227
# 2         B    A1   227
# 3         B    C1   230
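If you would rather not depend on a sentinel string such as "Zzzzz" sorting after every real value, one possible variation is to sort on a temporary boolean flag instead. This is only a sketch: the helper column name `_test_empty` is made up for illustration, and it assumes that "empty" means a string containing nothing but whitespace.

```python
import pandas as pd

data = {'Test': [' ', ' ', 'K', ' '],
        'Name': ['A', 'B', 'B', 'B'],
        'value': ['D1', 'A1', ' ', 'C1'],
        'time': [227, 227, 227, 230]}
df = pd.DataFrame(data)

out = (
    # Hypothetical helper column: True where Test is blank or whitespace only
    df.assign(_test_empty=df["Test"].str.strip().eq(""))
      # False sorts before True, so non-empty Test values come first
      # within each (Name, time) group
      .sort_values(by=["Name", "time", "_test_empty", "Test"])
      # Drop the helper and renumber the rows
      .drop(columns="_test_empty")
      .reset_index(drop=True)
)
print(out)
```

The original blanks are never modified, so nothing has to be replaced back afterwards.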
Answer 2
Score: 1
It is better not to use empty strings or spaces to denote empty cells. Use NA/NaN values instead, which sort_values handles directly:
out = (df.replace(' ', pd.NA)
         .sort_values(by=['Name', 'time', 'Test'])
      )
This is equivalent to:
out = (df.replace(' ', pd.NA)
         .sort_values(by=['Name', 'time', 'Test'], na_position='last')
      )
Output:
   Test Name value  time
0  <NA>    A    D1   227
2     K    B  <NA>   227
1  <NA>    B    A1   227
3  <NA>    B    C1   230
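To see what na_position actually changes, here is a small self-contained sketch built on the question's data. The variable names (`na_df`, `last`, `first`, `restored`) are just illustrative, and the final step that fills the blanks back in is an optional extra for anyone who wants the original display, not part of the answer above.

```python
import pandas as pd

df = pd.DataFrame({'Test': [' ', ' ', 'K', ' '],
                   'Name': ['A', 'B', 'B', 'B'],
                   'value': ['D1', 'A1', ' ', 'C1'],
                   'time': [227, 227, 227, 230]})

# Mark blank cells as missing so sort_values can place them explicitly
na_df = df.replace(' ', pd.NA)

# Default behaviour: missing values go last within each (Name, time) group
last = na_df.sort_values(by=['Name', 'time', 'Test'])

# na_position='first' puts missing values ahead of real ones instead
first = na_df.sort_values(by=['Name', 'time', 'Test'], na_position='first')

# Optional: restore the blanks and renumber the rows for display
restored = last.fillna({'Test': ' ', 'value': ' '}).reset_index(drop=True)

print(last.index.tolist())   # row order with NA last
print(first.index.tolist())  # row order with NA first
print(restored)
```

With na_position='last', row 2 ('K') moves ahead of row 1, which is the order the question asks for.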