基于多个条件筛选行的Pandas操作

huangapple go评论62阅读模式
英文:

Filtering rows based on multiple condition pandas

问题

我有这个表格

id_employee stat kind_sco score
123 h 1 93
123 h 2 76
123 h 3 12
123 h 4 91
456 m 1 64
456 m 2 60
456 m 3 56
456 m 4 90
789 l 1 90
789 l 2 76
789 l 3 89
789 l 4 45

我想根据以下条件筛选行:

groupby(['id_employee', 'stat']) ->
如果 stat == h: 选择 'kind_sco' == 3 的行
如果 stat == m: 选择 'kind_sco' == 1 的行
否则:选择 'kind_sco' == 2 的行

我目前的解决方案是这样的

data[(data['stat']=='l') & (data['kind_sco']==2) |
   (data['stat']=='m') & (data['kind_sco']==1) |
   (data['stat']=='h') & (data['kind_sco']==3)]

所以期望的输出是

id_employee stat kind_sco score
123 h 3 12
456 m 1 64
789 l 2 76

但是如果条件更复杂,这种方法可能不起作用

有没有更简单的解决方案?我考虑过使用 groupby 和 filter() 但不知道如何调整它。

非常感谢!

英文:

I have this table

id_employee stat kind_sco score
123 h 1 93
123 h 2 76
123 h 3 12
123 h 4 91
456 m 1 64
456 m 2 60
456 m 3 56
456 m 4 90
789 l 1 90
789 l 2 76
789 l 3 89
789 l 4 45

And I want to filter the rows based on this condition

groupby(['id_employee', 'stat']) -> 
if stat == h:  take row with 'kind_sco' == 3 
elif stat == m:  take row with 'kind_sco' == 1 
else: take row with 'kind_sco' == 2

My current solution is like this

data[(data['stat']=='l') & (data['kind_sco']==2) |
   (data['stat']=='m') & (data['kind_sco']==1) |
   (data['stat']=='h') & (data['kind_sco']==3)]

so the expected output is like

id_employee stat kind_sco score
123 h 3 12
456 m 1 64
789 l 2 76

But it will not effective if the condition is more complicated

What is the simpler solution? I already think to use groupby and filter() but dont know how to tweak it.

Thank you in advance.

答案1

得分: 1

这里关键是使用 apply 函数,它会传递每个分组的数据框,以便您可以对其应用您的逻辑(这里是过滤),然后返回该数据。

import pandas as pd

data = [
    {"id_employee": "123", "stat": "h", "kind_sco": "1", "score": "93"},
    {"id_employee": "123", "stat": "h", "kind_sco": "2", "score": "76"},
    {"id_employee": "123", "stat": "h", "kind_sco": "3", "score": "12"},
    {"id_employee": "123", "stat": "h", "kind_sco": "4", "score": "91"},
    {"id_employee": "456", "stat": "m", "kind_sco": "1", "score": "64"},
    {"id_employee": "456", "stat": "m", "kind_sco": "2", "score": "60"},
    {"id_employee": "456", "stat": "m", "kind_sco": "3", "score": "56"},
    {"id_employee": "456", "stat": "m", "kind_sco": "4", "score": "90"},
    {"id_employee": "789", "stat": "l", "kind_sco": "1", "score": "90"},
    {"id_employee": "789", "stat": "l", "kind_sco": "2", "score": "76"},
    {"id_employee": "789", "stat": "l", "kind_sco": "3", "score": "89"},
    {"id_employee": "789", "stat": "l", "kind_sco": "4", "score": "45"},
]

df = pd.DataFrame(data)
df["kind_sco"] = df["kind_sco"].astype(int)  # 转换为整数以进行比较

grouped = df.groupby(["id_employee", "stat"])

def filter_rows_in_each_group(d: pd.DataFrame):
    if (d["stat"] == "h").all():
        return d[d["kind_sco"] == 3]
    elif (d["stat"] == "m").all():
        return d[d["kind_sco"] == 1]
    else:
        return d[d["kind_sco"] == 2]

filtered_data = grouped.apply(filter_rows_in_each_group).reset_index(drop=True)
print(filtered_data)

输出与预期相符

  id_employee stat  kind_sco score
0         123    h         3    12
1         456    m         1    64
2         789    l         2    76
英文:

Here the key is to use apply function, which passes each grouped data frame so that you can "apply" your logic to it (here filtering) and then return that data

import pandas as pd

data = [
    {"id_employee": "123", "stat": "h", "kind_sco": "1", "score": "93"},
    {"id_employee": "123", "stat": "h", "kind_sco": "2", "score": "76"},
    {"id_employee": "123", "stat": "h", "kind_sco": "3", "score": "12"},
    {"id_employee": "123", "stat": "h", "kind_sco": "4", "score": "91"},
    {"id_employee": "456", "stat": "m", "kind_sco": "1", "score": "64"},
    {"id_employee": "456", "stat": "m", "kind_sco": "2", "score": "60"},
    {"id_employee": "456", "stat": "m", "kind_sco": "3", "score": "56"},
    {"id_employee": "456", "stat": "m", "kind_sco": "4", "score": "90"},
    {"id_employee": "789", "stat": "l", "kind_sco": "1", "score": "90"},
    {"id_employee": "789", "stat": "l", "kind_sco": "2", "score": "76"},
    {"id_employee": "789", "stat": "l", "kind_sco": "3", "score": "89"},
    {"id_employee": "789", "stat": "l", "kind_sco": "4", "score": "45"},
]


df = pd.DataFrame(data)
df["kind_sco"] = df["kind_sco"].astype(int)  # convert to int for comparision

grouped = df.groupby(["id_employee", "stat"])


def filter_rows_in_each_group(d: pd.DataFrame):
    if (d["stat"] == "h").all():
        return d[d["kind_sco"] == 3]
    elif (d["stat"] == "m").all():
        return d[d["kind_sco"] == 1]
    else:
        return d[d["kind_sco"] == 2]


filtered_data = grouped.apply(filter_rows_in_each_group).reset_index(drop=True)
print(filtered_data)

The output is as expected

  id_employee stat  kind_sco score
0         123    h         3    12
1         456    m         1    64
2         789    l         2    76

答案2

得分: 1

也许你可以使用mapperapply来创建一些内容?

mapper = {
    "l": data["kind_sco"].eq(2),
    "m": data["kind_sco"].eq(1),
    "h": data["kind_sco"].eq(3)
    # 在这里添加更多的分组/条件..
}

out = (
    data.groupby(["id_employee", "stat"], group_keys=False)
        .apply(lambda g: g.loc[mapper.get(g.name[1])])
)

输出:

print(out)

   id_employee stat  kind_sco  score
2          123    h         3     12
4          456    m         1     64
9          789    l         2     76
英文:

Maybe you can make smth up with a mapper and apply ?

mapper = {
"l": data["kind_sco"].eq(2),
"m": data["kind_sco"].eq(1),
"h": data["kind_sco"].eq(3)
# add here more groups/conditions ..
}
out = (
data.groupby(["id_employee", "stat"], group_keys=False)
.apply(lambda g: g.loc[mapper.get(g.name[1])])
)

Output :

print(out)
id_employee stat  kind_sco  score
2          123    h         3     12
4          456    m         1     64
9          789    l         2     76

huangapple
  • 本文由 发表于 2023年5月29日 15:24:57
  • 转载请务必保留本文链接:https://go.coder-hub.com/76355390.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定