英文:
Filtering rows based on multiple condition pandas
问题
我有这个表格
id_employee | stat | kind_sco | score |
---|---|---|---|
123 | h | 1 | 93 |
123 | h | 2 | 76 |
123 | h | 3 | 12 |
123 | h | 4 | 91 |
456 | m | 1 | 64 |
456 | m | 2 | 60 |
456 | m | 3 | 56 |
456 | m | 4 | 90 |
789 | l | 1 | 90 |
789 | l | 2 | 76 |
789 | l | 3 | 89 |
789 | l | 4 | 45 |
我想根据以下条件筛选行:
groupby(['id_employee', 'stat']) ->
如果 stat == h: 选择 'kind_sco' == 3 的行
如果 stat == m: 选择 'kind_sco' == 1 的行
否则:选择 'kind_sco' == 2 的行
我目前的解决方案是这样的
data[(data['stat']=='l') & (data['kind_sco']==2) |
(data['stat']=='m') & (data['kind_sco']==1) |
(data['stat']=='h') & (data['kind_sco']==3)]
所以期望的输出是
id_employee | stat | kind_sco | score |
---|---|---|---|
123 | h | 3 | 12 |
456 | m | 1 | 64 |
789 | l | 2 | 76 |
但是如果条件更复杂,这种方法可能不起作用
有没有更简单的解决方案?我考虑过使用 groupby 和 filter() 但不知道如何调整它。
非常感谢!
英文:
I have this table
id_employee | stat | kind_sco | score |
---|---|---|---|
123 | h | 1 | 93 |
123 | h | 2 | 76 |
123 | h | 3 | 12 |
123 | h | 4 | 91 |
456 | m | 1 | 64 |
456 | m | 2 | 60 |
456 | m | 3 | 56 |
456 | m | 4 | 90 |
789 | l | 1 | 90 |
789 | l | 2 | 76 |
789 | l | 3 | 89 |
789 | l | 4 | 45 |
And I want to filter the rows based on this condition
groupby(['id_employee', 'stat']) ->
if stat == h: take row with 'kind_sco' == 3
elif stat == m: take row with 'kind_sco' == 1
else: take row with 'kind_sco' == 2
My current solution is like this
data[(data['stat']=='l') & (data['kind_sco']==2) |
(data['stat']=='m') & (data['kind_sco']==1) |
(data['stat']=='h') & (data['kind_sco']==3)]
so the expected output is like
id_employee | stat | kind_sco | score |
---|---|---|---|
123 | h | 3 | 12 |
456 | m | 1 | 64 |
789 | l | 2 | 76 |
But it will not effective if the condition is more complicated
What is the simpler solution? I already think to use groupby and filter() but dont know how to tweak it.
Thank you in advance.
答案1
得分: 1
这里关键是使用 apply
函数,它会传递每个分组的数据框,以便您可以对其应用您的逻辑(这里是过滤),然后返回该数据。
import pandas as pd
data = [
{"id_employee": "123", "stat": "h", "kind_sco": "1", "score": "93"},
{"id_employee": "123", "stat": "h", "kind_sco": "2", "score": "76"},
{"id_employee": "123", "stat": "h", "kind_sco": "3", "score": "12"},
{"id_employee": "123", "stat": "h", "kind_sco": "4", "score": "91"},
{"id_employee": "456", "stat": "m", "kind_sco": "1", "score": "64"},
{"id_employee": "456", "stat": "m", "kind_sco": "2", "score": "60"},
{"id_employee": "456", "stat": "m", "kind_sco": "3", "score": "56"},
{"id_employee": "456", "stat": "m", "kind_sco": "4", "score": "90"},
{"id_employee": "789", "stat": "l", "kind_sco": "1", "score": "90"},
{"id_employee": "789", "stat": "l", "kind_sco": "2", "score": "76"},
{"id_employee": "789", "stat": "l", "kind_sco": "3", "score": "89"},
{"id_employee": "789", "stat": "l", "kind_sco": "4", "score": "45"},
]
df = pd.DataFrame(data)
df["kind_sco"] = df["kind_sco"].astype(int) # 转换为整数以进行比较
grouped = df.groupby(["id_employee", "stat"])
def filter_rows_in_each_group(d: pd.DataFrame):
if (d["stat"] == "h").all():
return d[d["kind_sco"] == 3]
elif (d["stat"] == "m").all():
return d[d["kind_sco"] == 1]
else:
return d[d["kind_sco"] == 2]
filtered_data = grouped.apply(filter_rows_in_each_group).reset_index(drop=True)
print(filtered_data)
输出与预期相符
id_employee stat kind_sco score
0 123 h 3 12
1 456 m 1 64
2 789 l 2 76
英文:
Here the key is to use apply
function, which passes each grouped data frame so that you can "apply" your logic to it (here filtering) and then return that data
import pandas as pd
data = [
{"id_employee": "123", "stat": "h", "kind_sco": "1", "score": "93"},
{"id_employee": "123", "stat": "h", "kind_sco": "2", "score": "76"},
{"id_employee": "123", "stat": "h", "kind_sco": "3", "score": "12"},
{"id_employee": "123", "stat": "h", "kind_sco": "4", "score": "91"},
{"id_employee": "456", "stat": "m", "kind_sco": "1", "score": "64"},
{"id_employee": "456", "stat": "m", "kind_sco": "2", "score": "60"},
{"id_employee": "456", "stat": "m", "kind_sco": "3", "score": "56"},
{"id_employee": "456", "stat": "m", "kind_sco": "4", "score": "90"},
{"id_employee": "789", "stat": "l", "kind_sco": "1", "score": "90"},
{"id_employee": "789", "stat": "l", "kind_sco": "2", "score": "76"},
{"id_employee": "789", "stat": "l", "kind_sco": "3", "score": "89"},
{"id_employee": "789", "stat": "l", "kind_sco": "4", "score": "45"},
]
df = pd.DataFrame(data)
df["kind_sco"] = df["kind_sco"].astype(int) # convert to int for comparision
grouped = df.groupby(["id_employee", "stat"])
def filter_rows_in_each_group(d: pd.DataFrame):
if (d["stat"] == "h").all():
return d[d["kind_sco"] == 3]
elif (d["stat"] == "m").all():
return d[d["kind_sco"] == 1]
else:
return d[d["kind_sco"] == 2]
filtered_data = grouped.apply(filter_rows_in_each_group).reset_index(drop=True)
print(filtered_data)
The output is as expected
id_employee stat kind_sco score
0 123 h 3 12
1 456 m 1 64
2 789 l 2 76
答案2
得分: 1
也许你可以使用mapper和apply
来创建一些内容?
mapper = {
"l": data["kind_sco"].eq(2),
"m": data["kind_sco"].eq(1),
"h": data["kind_sco"].eq(3)
# 在这里添加更多的分组/条件..
}
out = (
data.groupby(["id_employee", "stat"], group_keys=False)
.apply(lambda g: g.loc[mapper.get(g.name[1])])
)
输出:
print(out)
id_employee stat kind_sco score
2 123 h 3 12
4 456 m 1 64
9 789 l 2 76
英文:
Maybe you can make smth up with a mapper and apply
?
mapper = {
"l": data["kind_sco"].eq(2),
"m": data["kind_sco"].eq(1),
"h": data["kind_sco"].eq(3)
# add here more groups/conditions ..
}
out = (
data.groupby(["id_employee", "stat"], group_keys=False)
.apply(lambda g: g.loc[mapper.get(g.name[1])])
)
Output :
print(out)
id_employee stat kind_sco score
2 123 h 3 12
4 456 m 1 64
9 789 l 2 76
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论