2023年6月11日 22:28:01go评论57阅读模式

英文:

Two conditional clause count in pandas

问题

我想计算唯一的 api_spec_id 数量，条件如下：

所有 type_of_change 均为 breaking，预期输出为 2（id: 678, 123）
所有 type_of_change 均为 non-breaking，预期输出为 1（id: 345）
至少有一个 type_of_change 是 breaking，预期输出为 3（id: 213, 678, 123）
至少有一个 type_of_change 是 non-breaking，预期输出为 2（id: 213, 345）

你不确定如何实现吗？以下是我的建议：

# 导入必要的库
import pandas as pd

# 创建DataFrame
data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type_of_change': ['Breaking', 'Breaking', 'Non-Breaking', 'Non-Breaking', 'Non-Breaking', 'Non-Breaking', 'Breaking', 'Breaking', 'Breaking', 'Breaking', 'Breaking'],
    'label': [None, 'major', 'patch', None, 'patch', 'patch', None, 'minor', 'major', None, None]
}

df = pd.DataFrame(data)

# 使用groupby和条件筛选来计算结果
result = df.groupby('api_spec_id').filter(lambda x: all(x['type_of_change'] == 'Breaking')).nunique()['api_spec_id']
print("所有 'type_of_change' 均为 breaking 的唯一数量:", result)

result = df.groupby('api_spec_id').filter(lambda x: all(x['type_of_change'] == 'Non-Breaking')).nunique()['api_spec_id']
print("所有 'type_of_change' 均为 non-breaking 的唯一数量:", result)

result = df.groupby('api_spec_id').filter(lambda x: any(x['type_of_change'] == 'Breaking')).nunique()['api_spec_id']
print("至少有一个 'type_of_change' 是 breaking 的唯一数量:", result)

result = df.groupby('api_spec_id').filter(lambda x: any(x['type_of_change'] == 'Non-Breaking')).nunique()['api_spec_id']
print("至少有一个 'type_of_change' 是 non-breaking 的唯一数量:", result)

上述代码可以根据你的条件计算出唯一的 api_spec_id 数量。

英文:

I have a df which looks like this:

api_spec_id  type_of_change   label
213              Breaking      NaN
213              Breaking      major
213              Non-Breaking  patch
345              Non-Breaking  NaN
345              Non-Breaking  patch
345              Non-Breaking  patch
678              Breaking      NaN
678              Breaking      minor
678              Breaking      major
123              Breaking      NaN
123              Breaking      NaN

I want to calculate the unique number of api_spec_id where:

all type_of_change are breaking, where expected output would be 2 (ids: 678,123)

all type_of_change are non-breaking, where expected output would be 1 (id:345)

at least one type_of_change is breaking, expected output: 3 (ids:213,678,123)

at least one type_of_change is non-breaking, expected output: 2 (ids:213,345)

I am not sure how I can achieve this, any suggestions or ideas would be greatly appreciated.

答案1

得分: 3

from functools import partial

grp = df.groupby("api_spec_id")

def detect(g, how, change):
    if how == "all":
        return g["type_of_change"].eq(change).all()
    elif how == "any":
        return g["type_of_change"].eq(change).any()

def get_id(df):
    return df["api_spec_id"].unique().tolist()

v1 = grp.filter(partial(detect, how="all", change="Breaking")).pipe(get_id)
v2 = grp.filter(partial(detect, how="all", change="Non-Breaking")).pipe(get_id)
v3 = grp.filter(partial(detect, how="any", change="Breaking")).pipe(get_id)
v4 = grp.filter(partial(detect, how="any", change="Non-Breaking")).pipe(get_id)

print(v1) # [678, 123]
print(v2) # [345]
print(v3) # [213, 678, 123]
print(v4) # [213, 345]

英文:

A possible solution :

from functools import partial

grp = df.groupby(&quot;api_spec_id&quot;)

def detect(g, how, change):
    if how == &quot;all&quot;:
        return g[&quot;type_of_change&quot;].eq(change).all()
    elif how == &quot;any&quot;:
        return g[&quot;type_of_change&quot;].eq(change).any()

def get_id(df):
    return df[&quot;api_spec_id&quot;].unique().tolist()
    
v1 = grp.filter(partial(detect, how=&quot;all&quot;, change=&quot;Breaking&quot;)).pipe(get_id)
v2 = grp.filter(partial(detect, how=&quot;all&quot;, change=&quot;Non-Breaking&quot;)).pipe(get_id)
v3 = grp.filter(partial(detect, how=&quot;any&quot;, change=&quot;Breaking&quot;)).pipe(get_id)
v4 = grp.filter(partial(detect, how=&quot;any&quot;, change=&quot;Non-Breaking&quot;)).pipe(get_id)

Output :

print(v1) # [678, 123]
print(v2) # [345]
print(v3) # [213, 678, 123]
print(v4) # [213, 345]

答案2

得分: 2

我们可以通过按api_spec_id和标签进行分组，然后应用一个lambda函数来跳过每个组的第一个元素，然后创建一个新列，指示label == 'major'和type_of_change == 'Breaking'的位置。

要获取每个api_spec_id的计数，我们可以再次使用groupby，然后可以使用相同的过程迭代唯一的标签值。以下是代码，不要忘记将'Breaking'替换为'Non-Breaking'和其他标签类型。

编辑：根据您的评论，现在我们将按api_spec_id进行分组，然后检查每个组内的type_of_change的唯一值。

import pandas as pd

df = pd.DataFrame({
    "api_spec_id": [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    "type_of_change": ["Breaking", "Breaking", "Non-Breaking", "Non-Breaking", 
                       "Non-Breaking", "Non-Breaking", "Breaking", "Breaking", 
                       "Breaking", "Breaking", "Breaking"],
    "label": [None, "major", "patch", None, "patch", "patch", None, "minor", 
              "major", None, None]
})

unique_changes = df.groupby('api_spec_id')['type_of_change'].unique()

all_breaking = unique_changes.apply(lambda x: len(x)==1 and "Breaking" in x).sum()
all_non_breaking = unique_changes.apply(lambda x: len(x)==1 and "Non-Breaking" in x).sum()
at_least_one_breaking = unique_changes.apply(lambda x: "Breaking" in x).sum()
at_least_one_non_breaking = unique_changes.apply(lambda x: "Non-Breaking" in x).sum()

print('All breaking: ', all_breaking)
print('All non-breaking: ', all_non_breaking)
print('At least one breaking: ', at_least_one_breaking)
print('At least one non-breaking: ', at_least_one_non_breaking)

请注意，这段代码会执行您描述的操作，计算各种条件下的计数并将其打印出来。

英文:

We can do that by grouping by the api_spec_id and label, then applying a lambda function that skips the first element for each group, then we create a new column indicating where label == 'major' and type_of_change == 'Breaking'.

To get the counts of each api_spec_id we can use groupby again, then we can iterate over the unique label values with the same process. Here is the code below, dont forget to replace 'Breaking' with 'Non-Breaking' and other label types.

edit: based on your comments, now we will group by api_spec_id and then check the unique values of type_of_change within each group

import pandas as pd

df = pd.DataFrame({
    &quot;api_spec_id&quot;: [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    &quot;type_of_change&quot;: [&quot;Breaking&quot;, &quot;Breaking&quot;, &quot;Non-Breaking&quot;, &quot;Non-Breaking&quot;, 
                       &quot;Non-Breaking&quot;, &quot;Non-Breaking&quot;, &quot;Breaking&quot;, &quot;Breaking&quot;, 
                       &quot;Breaking&quot;, &quot;Breaking&quot;, &quot;Breaking&quot;],
    &quot;label&quot;: [None, &quot;major&quot;, &quot;patch&quot;, None, &quot;patch&quot;, &quot;patch&quot;, None, &quot;minor&quot;, 
              &quot;major&quot;, None, None]
})

unique_changes = df.groupby(&#39;api_spec_id&#39;)[&#39;type_of_change&#39;].unique()

all_breaking = unique_changes.apply(lambda x: len(x)==1 and &quot;Breaking&quot; in x).sum()
all_non_breaking = unique_changes.apply(lambda x: len(x)==1 and &quot;Non-Breaking&quot; in x).sum()
at_least_one_breaking = unique_changes.apply(lambda x: &quot;Breaking&quot; in x).sum()
at_least_one_non_breaking = unique_changes.apply(lambda x: &quot;Non-Breaking&quot; in x).sum()

print(&#39;All breaking: &#39;, all_breaking)
print(&#39;All non-breaking: &#39;, all_non_breaking)
print(&#39;At least one breaking: &#39;, at_least_one_breaking)
print(&#39;At least one non-breaking: &#39;, at_least_one_non_breaking)

答案3

得分: 0

这是一个选项：

l = ['None', 'All Breaking, Has Breaking', 'All Non Breaking, Has Non Breaking', 'Has Breaking, Has Non Breaking']
d = df.set_index('api_spec_id')['type_of_change'].str.get_dummies().groupby(level=0).sum().gt(0)
d = (d.mul([1,2])
 .sum(axis=1)
 .map(dict(enumerate(l)))
 .str.split(', ')
 .explode()
 .reset_index(name='Type')
 .groupby('Type')['api_spec_id'].agg(list)
 .to_dict())

输出：

{'All Breaking': [123, 678],
 'All Non Breaking': [345],
 'Has Breaking': [123, 213, 678],
 'Has Non Breaking': [213, 345]}

英文:

Here is an option:

l = [&#39;None&#39;,&#39;All Breaking, Has Breaking&#39;,&#39;All Non Breaking, Has Non Breaking&#39;,&#39;Has Breaking, Has Non Breaking&#39;]
d = df.set_index(&#39;api_spec_id&#39;)[&#39;type_of_change&#39;].str.get_dummies().groupby(level=0).sum().gt(0)
d = (d.mul([1,2])
 .sum(axis=1)
 .map(dict(enumerate(l)))
 .str.split(&#39;, &#39;)
 .explode()
 .reset_index(name = &#39;Type&#39;)
 .groupby(&#39;Type&#39;)[&#39;api_spec_id&#39;].agg(list)
 .to_dict())

Output:

{&#39;All Breaking&#39;: [123, 678],
 &#39;All Non Breaking&#39;: [345],
 &#39;Has Breaking&#39;: [123, 213, 678],
 &#39;Has Non Breaking&#39;: [213, 345]}

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

在pandas中计算两个条件子句的数量。

问题

答案1

答案2

答案3

Vectorize多个pandas列的加法

这段代码是如何成功地向集合中添加新元素的？

pypy pandas correlation slower than python

Comparing 2 excel files to extract rows based on a reference number in one file and copy them to a new file

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

发表评论