在pandas中计算两个条件子句的数量。

huangapple go评论57阅读模式
英文:

Two conditional clause count in pandas

问题

我想计算唯一的 api_spec_id 数量,条件如下:

  1. 所有 type_of_change 均为 breaking,预期输出为 2(id: 678, 123)

  2. 所有 type_of_change 均为 non-breaking,预期输出为 1(id: 345)

  3. 至少有一个 type_of_change 是 breaking,预期输出为 3(id: 213, 678, 123)

  4. 至少有一个 type_of_change 是 non-breaking,预期输出为 2(id: 213, 345)

你不确定如何实现吗?以下是我的建议:

# 导入必要的库
import pandas as pd

# 创建DataFrame
data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type_of_change': ['Breaking', 'Breaking', 'Non-Breaking', 'Non-Breaking', 'Non-Breaking', 'Non-Breaking', 'Breaking', 'Breaking', 'Breaking', 'Breaking', 'Breaking'],
    'label': [None, 'major', 'patch', None, 'patch', 'patch', None, 'minor', 'major', None, None]
}

df = pd.DataFrame(data)

# 使用groupby和条件筛选来计算结果
result = df.groupby('api_spec_id').filter(lambda x: all(x['type_of_change'] == 'Breaking')).nunique()['api_spec_id']
print("所有 'type_of_change' 均为 breaking 的唯一数量:", result)

result = df.groupby('api_spec_id').filter(lambda x: all(x['type_of_change'] == 'Non-Breaking')).nunique()['api_spec_id']
print("所有 'type_of_change' 均为 non-breaking 的唯一数量:", result)

result = df.groupby('api_spec_id').filter(lambda x: any(x['type_of_change'] == 'Breaking')).nunique()['api_spec_id']
print("至少有一个 'type_of_change' 是 breaking 的唯一数量:", result)

result = df.groupby('api_spec_id').filter(lambda x: any(x['type_of_change'] == 'Non-Breaking')).nunique()['api_spec_id']
print("至少有一个 'type_of_change' 是 non-breaking 的唯一数量:", result)

上述代码可以根据你的条件计算出唯一的 api_spec_id 数量。

英文:

I have a df which looks like this:

api_spec_id  type_of_change   label
213              Breaking      NaN
213              Breaking      major
213              Non-Breaking  patch
345              Non-Breaking  NaN
345              Non-Breaking  patch
345              Non-Breaking  patch
678              Breaking      NaN
678              Breaking      minor
678              Breaking      major
123              Breaking      NaN
123              Breaking      NaN

I want to calculate the unique number of api_spec_id where:

all type_of_change are breaking, where expected output would be 2 (ids: 678,123)

all type_of_change are non-breaking, where expected output would be 1 (id:345)

at least one type_of_change is breaking, expected output: 3 (ids:213,678,123)

at least one type_of_change is non-breaking, expected output: 2 (ids:213,345)

I am not sure how I can achieve this, any suggestions or ideas would be greatly appreciated.

答案1

得分: 3

from functools import partial

grp = df.groupby("api_spec_id")

def detect(g, how, change):
    if how == "all":
        return g["type_of_change"].eq(change).all()
    elif how == "any":
        return g["type_of_change"].eq(change).any()

def get_id(df):
    return df["api_spec_id"].unique().tolist()

v1 = grp.filter(partial(detect, how="all", change="Breaking")).pipe(get_id)
v2 = grp.filter(partial(detect, how="all", change="Non-Breaking")).pipe(get_id)
v3 = grp.filter(partial(detect, how="any", change="Breaking")).pipe(get_id)
v4 = grp.filter(partial(detect, how="any", change="Non-Breaking")).pipe(get_id)

print(v1) # [678, 123]
print(v2) # [345]
print(v3) # [213, 678, 123]
print(v4) # [213, 345]
英文:

A possible solution :

from functools import partial

grp = df.groupby("api_spec_id")

def detect(g, how, change):
    if how == "all":
        return g["type_of_change"].eq(change).all()
    elif how == "any":
        return g["type_of_change"].eq(change).any()

def get_id(df):
    return df["api_spec_id"].unique().tolist()
    
v1 = grp.filter(partial(detect, how="all", change="Breaking")).pipe(get_id)
v2 = grp.filter(partial(detect, how="all", change="Non-Breaking")).pipe(get_id)
v3 = grp.filter(partial(detect, how="any", change="Breaking")).pipe(get_id)
v4 = grp.filter(partial(detect, how="any", change="Non-Breaking")).pipe(get_id)

Output :

print(v1) # [678, 123]
print(v2) # [345]
print(v3) # [213, 678, 123]
print(v4) # [213, 345]

答案2

得分: 2

我们可以通过按api_spec_id和标签进行分组,然后应用一个lambda函数来跳过每个组的第一个元素,然后创建一个新列,指示label == 'major'type_of_change == 'Breaking'的位置。

要获取每个api_spec_id的计数,我们可以再次使用groupby,然后可以使用相同的过程迭代唯一的标签值。以下是代码,不要忘记将'Breaking'替换为'Non-Breaking'和其他标签类型。

编辑:根据您的评论,现在我们将按api_spec_id进行分组,然后检查每个组内的type_of_change的唯一值。

import pandas as pd

df = pd.DataFrame({
    "api_spec_id": [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    "type_of_change": ["Breaking", "Breaking", "Non-Breaking", "Non-Breaking", 
                       "Non-Breaking", "Non-Breaking", "Breaking", "Breaking", 
                       "Breaking", "Breaking", "Breaking"],
    "label": [None, "major", "patch", None, "patch", "patch", None, "minor", 
              "major", None, None]
})

unique_changes = df.groupby('api_spec_id')['type_of_change'].unique()

all_breaking = unique_changes.apply(lambda x: len(x)==1 and "Breaking" in x).sum()
all_non_breaking = unique_changes.apply(lambda x: len(x)==1 and "Non-Breaking" in x).sum()
at_least_one_breaking = unique_changes.apply(lambda x: "Breaking" in x).sum()
at_least_one_non_breaking = unique_changes.apply(lambda x: "Non-Breaking" in x).sum()

print('All breaking: ', all_breaking)
print('All non-breaking: ', all_non_breaking)
print('At least one breaking: ', at_least_one_breaking)
print('At least one non-breaking: ', at_least_one_non_breaking)

请注意,这段代码会执行您描述的操作,计算各种条件下的计数并将其打印出来。

英文:

We can do that by grouping by the api_spec_id and label, then applying a lambda function that skips the first element for each group, then we create a new column indicating where label == 'major' and type_of_change == 'Breaking'.

To get the counts of each api_spec_id we can use groupby again, then we can iterate over the unique label values with the same process. Here is the code below, dont forget to replace 'Breaking' with 'Non-Breaking' and other label types.

edit: based on your comments, now we will group by api_spec_id and then check the unique values of type_of_change within each group

import pandas as pd

df = pd.DataFrame({
    "api_spec_id": [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    "type_of_change": ["Breaking", "Breaking", "Non-Breaking", "Non-Breaking", 
                       "Non-Breaking", "Non-Breaking", "Breaking", "Breaking", 
                       "Breaking", "Breaking", "Breaking"],
    "label": [None, "major", "patch", None, "patch", "patch", None, "minor", 
              "major", None, None]
})

unique_changes = df.groupby('api_spec_id')['type_of_change'].unique()

all_breaking = unique_changes.apply(lambda x: len(x)==1 and "Breaking" in x).sum()
all_non_breaking = unique_changes.apply(lambda x: len(x)==1 and "Non-Breaking" in x).sum()
at_least_one_breaking = unique_changes.apply(lambda x: "Breaking" in x).sum()
at_least_one_non_breaking = unique_changes.apply(lambda x: "Non-Breaking" in x).sum()

print('All breaking: ', all_breaking)
print('All non-breaking: ', all_non_breaking)
print('At least one breaking: ', at_least_one_breaking)
print('At least one non-breaking: ', at_least_one_non_breaking)

答案3

得分: 0

这是一个选项:

l = ['None', 'All Breaking, Has Breaking', 'All Non Breaking, Has Non Breaking', 'Has Breaking, Has Non Breaking']
d = df.set_index('api_spec_id')['type_of_change'].str.get_dummies().groupby(level=0).sum().gt(0)
d = (d.mul([1,2])
 .sum(axis=1)
 .map(dict(enumerate(l)))
 .str.split(', ')
 .explode()
 .reset_index(name='Type')
 .groupby('Type')['api_spec_id'].agg(list)
 .to_dict())

输出:

{'All Breaking': [123, 678],
 'All Non Breaking': [345],
 'Has Breaking': [123, 213, 678],
 'Has Non Breaking': [213, 345]}
英文:

Here is an option:

l = ['None','All Breaking, Has Breaking','All Non Breaking, Has Non Breaking','Has Breaking, Has Non Breaking']
d = df.set_index('api_spec_id')['type_of_change'].str.get_dummies().groupby(level=0).sum().gt(0)
d = (d.mul([1,2])
 .sum(axis=1)
 .map(dict(enumerate(l)))
 .str.split(', ')
 .explode()
 .reset_index(name = 'Type')
 .groupby('Type')['api_spec_id'].agg(list)
 .to_dict())

Output:

{'All Breaking': [123, 678],
 'All Non Breaking': [345],
 'Has Breaking': [123, 213, 678],
 'Has Non Breaking': [213, 345]}

huangapple
  • 本文由 发表于 2023年6月11日 22:28:01
  • 转载请务必保留本文链接:https://go.coder-hub.com/76450952.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定