May 29, 2023, 15:24:57

Filtering rows based on multiple conditions in pandas

Question


I have this table

id_employee	stat	kind_sco	score
123	h	1	93
123	h	2	76
123	h	3	12
123	h	4	91
456	m	1	64
456	m	2	60
456	m	3	56
456	m	4	90
789	l	1	90
789	l	2	76
789	l	3	89
789	l	4	45

I want to filter the rows based on these conditions:

groupby(['id_employee', 'stat']) ->
if stat == 'h': take the row with kind_sco == 3
elif stat == 'm': take the row with kind_sco == 1
else: take the row with kind_sco == 2

My current solution looks like this:

data[((data['stat'] == 'l') & (data['kind_sco'] == 2)) |
     ((data['stat'] == 'm') & (data['kind_sco'] == 1)) |
     ((data['stat'] == 'h') & (data['kind_sco'] == 3))]

The expected output is:

id_employee	stat	kind_sco	score
123	h	3	12
456	m	1	64
789	l	2	76

But this approach becomes unwieldy if the conditions get more complicated.

Is there a simpler solution? I have considered using groupby and filter() but don't know how to adapt them.

Thank you in advance.

Answer 1

Score: 1


The key here is the apply function: it passes each group's DataFrame to your function, so you can "apply" your logic to it (here, filtering) and return the result.

import pandas as pd

data = [
    {"id_employee": "123", "stat": "h", "kind_sco": "1", "score": "93"},
    {"id_employee": "123", "stat": "h", "kind_sco": "2", "score": "76"},
    {"id_employee": "123", "stat": "h", "kind_sco": "3", "score": "12"},
    {"id_employee": "123", "stat": "h", "kind_sco": "4", "score": "91"},
    {"id_employee": "456", "stat": "m", "kind_sco": "1", "score": "64"},
    {"id_employee": "456", "stat": "m", "kind_sco": "2", "score": "60"},
    {"id_employee": "456", "stat": "m", "kind_sco": "3", "score": "56"},
    {"id_employee": "456", "stat": "m", "kind_sco": "4", "score": "90"},
    {"id_employee": "789", "stat": "l", "kind_sco": "1", "score": "90"},
    {"id_employee": "789", "stat": "l", "kind_sco": "2", "score": "76"},
    {"id_employee": "789", "stat": "l", "kind_sco": "3", "score": "89"},
    {"id_employee": "789", "stat": "l", "kind_sco": "4", "score": "45"},
]


df = pd.DataFrame(data)
df["kind_sco"] = df["kind_sco"].astype(int)  # convert to int for comparison

grouped = df.groupby(["id_employee", "stat"])


def filter_rows_in_each_group(d: pd.DataFrame) -> pd.DataFrame:
    # Every row in a group shares one stat value, so .all() identifies the group.
    if (d["stat"] == "h").all():
        return d[d["kind_sco"] == 3]
    elif (d["stat"] == "m").all():
        return d[d["kind_sco"] == 1]
    else:
        return d[d["kind_sco"] == 2]


filtered_data = grouped.apply(filter_rows_in_each_group).reset_index(drop=True)
print(filtered_data)

The output is as expected:

  id_employee stat  kind_sco score
0         123    h         3    12
1         456    m         1    64
2         789    l         2    76
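
Because each rule here only depends on the row's own stat value, the same result can probably be reached without groupby at all. The following is a minimal sketch, not part of the original answer: it assumes the rules can be written as a plain stat-to-kind_sco lookup, and the names wanted_kind, default_kind and target are made up for illustration.

```python
import pandas as pd

# Same table as in the question, with numeric columns stored as integers.
df = pd.DataFrame({
    "id_employee": [123] * 4 + [456] * 4 + [789] * 4,
    "stat": ["h"] * 4 + ["m"] * 4 + ["l"] * 4,
    "kind_sco": [1, 2, 3, 4] * 3,
    "score": [93, 76, 12, 91, 64, 60, 56, 90, 90, 76, 89, 45],
})

# Hypothetical lookup: which kind_sco to keep for each stat value.
wanted_kind = {"h": 3, "m": 1}
default_kind = 2  # the "else" branch from the question

# Series.map yields NaN for stat values missing from the dict;
# fillna then supplies the fallback for those rows.
target = df["stat"].map(wanted_kind).fillna(default_kind)

filtered = df[df["kind_sco"] == target].reset_index(drop=True)
print(filtered)
```

Adding a rule then means adding one dictionary entry, which may be easier to maintain than a growing if/elif chain. This only fits rules of the form "keep one kind_sco per stat"; rules that need to look at the whole group still call for apply.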

Answer 2

Score: 1


Maybe you can put something together with a mapper dictionary and apply?

# data is the DataFrame from the question
mapper = {
    "l": data["kind_sco"].eq(2),
    "m": data["kind_sco"].eq(1),
    "h": data["kind_sco"].eq(3)
    # add more groups/conditions here ..
}

out = (
    data.groupby(["id_employee", "stat"], group_keys=False)
        .apply(lambda g: g.loc[mapper.get(g.name[1])])
)

Output:

print(out)

   id_employee stat  kind_sco  score
2          123    h         3     12
4          456    m         1     64
9          789    l         2     76
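
Note that mapper.get returns None for any stat value without an entry, so the mapper above does not cover the question's "else" branch for values other than "l". One possible variation, sketched here under my own assumptions rather than taken from the answer, stores a small function per stat plus a fallback and combines them into a single boolean mask. The names rules, default_rule and mask are hypothetical.

```python
import pandas as pd

# Same table as in the question.
data = pd.DataFrame({
    "id_employee": [123] * 4 + [456] * 4 + [789] * 4,
    "stat": ["h"] * 4 + ["m"] * 4 + ["l"] * 4,
    "kind_sco": [1, 2, 3, 4] * 3,
    "score": [93, 76, 12, 91, 64, 60, 56, 90, 90, 76, 89, 45],
})

# Hypothetical rule table: each stat maps to a function that
# returns a boolean Series marking the rows to keep.
rules = {
    "h": lambda d: d["kind_sco"].eq(3),
    "m": lambda d: d["kind_sco"].eq(1),
}


def default_rule(d: pd.DataFrame) -> pd.Series:
    # Fallback ("else" branch) for stat values not listed in rules.
    return d["kind_sco"].eq(2)


# Start with the fallback, restricted to rows whose stat has no rule.
mask = default_rule(data) & ~data["stat"].isin(list(rules))

# Add the rows matched by each stat-specific rule.
for stat, rule in rules.items():
    mask |= data["stat"].eq(stat) & rule(data)

out = data[mask]
print(out)
```

Since each rule is a function of the DataFrame, it can combine several columns (for example a kind_sco check together with a score threshold) without changing the loop. The original row labels 2, 4 and 9 are preserved, as in the answer's output.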
