英文:
Two conditional clause count in pandas
问题
我想计算唯一的 api_spec_id
数量,条件如下:
-
所有
type_of_change
均为 breaking,预期输出为 2(id: 678, 123) -
所有
type_of_change
均为 non-breaking,预期输出为 1(id: 345) -
至少有一个
type_of_change
是 breaking,预期输出为 3(id: 213, 678, 123) -
至少有一个
type_of_change
是 non-breaking,预期输出为 2(id: 213, 345)
你不确定如何实现吗?以下是我的建议:
# 导入必要的库
import pandas as pd
# 创建DataFrame
data = {
'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
'type_of_change': ['Breaking', 'Breaking', 'Non-Breaking', 'Non-Breaking', 'Non-Breaking', 'Non-Breaking', 'Breaking', 'Breaking', 'Breaking', 'Breaking', 'Breaking'],
'label': [None, 'major', 'patch', None, 'patch', 'patch', None, 'minor', 'major', None, None]
}
df = pd.DataFrame(data)
# 使用groupby和条件筛选来计算结果
result = df.groupby('api_spec_id').filter(lambda x: all(x['type_of_change'] == 'Breaking')).nunique()['api_spec_id']
print("所有 'type_of_change' 均为 breaking 的唯一数量:", result)
result = df.groupby('api_spec_id').filter(lambda x: all(x['type_of_change'] == 'Non-Breaking')).nunique()['api_spec_id']
print("所有 'type_of_change' 均为 non-breaking 的唯一数量:", result)
result = df.groupby('api_spec_id').filter(lambda x: any(x['type_of_change'] == 'Breaking')).nunique()['api_spec_id']
print("至少有一个 'type_of_change' 是 breaking 的唯一数量:", result)
result = df.groupby('api_spec_id').filter(lambda x: any(x['type_of_change'] == 'Non-Breaking')).nunique()['api_spec_id']
print("至少有一个 'type_of_change' 是 non-breaking 的唯一数量:", result)
上述代码可以根据你的条件计算出唯一的 api_spec_id
数量。
英文:
I have a df which looks like this:
api_spec_id type_of_change label
213 Breaking NaN
213 Breaking major
213 Non-Breaking patch
345 Non-Breaking NaN
345 Non-Breaking patch
345 Non-Breaking patch
678 Breaking NaN
678 Breaking minor
678 Breaking major
123 Breaking NaN
123 Breaking NaN
I want to calculate the unique number of api_spec_id
where:
all type_of_change
are breaking, where expected output would be 2 (ids: 678,123)
all type_of_change
are non-breaking, where expected output would be 1 (id:345)
at least one type_of_change
is breaking, expected output: 3 (ids:213,678,123)
at least one type_of_change
is non-breaking, expected output: 2 (ids:213,345)
I am not sure how I can achieve this, any suggestions or ideas would be greatly appreciated.
答案1
得分: 3
from functools import partial
grp = df.groupby("api_spec_id")
def detect(g, how, change):
if how == "all":
return g["type_of_change"].eq(change).all()
elif how == "any":
return g["type_of_change"].eq(change).any()
def get_id(df):
return df["api_spec_id"].unique().tolist()
v1 = grp.filter(partial(detect, how="all", change="Breaking")).pipe(get_id)
v2 = grp.filter(partial(detect, how="all", change="Non-Breaking")).pipe(get_id)
v3 = grp.filter(partial(detect, how="any", change="Breaking")).pipe(get_id)
v4 = grp.filter(partial(detect, how="any", change="Non-Breaking")).pipe(get_id)
print(v1) # [678, 123]
print(v2) # [345]
print(v3) # [213, 678, 123]
print(v4) # [213, 345]
英文:
A possible solution :
from functools import partial
grp = df.groupby("api_spec_id")
def detect(g, how, change):
if how == "all":
return g["type_of_change"].eq(change).all()
elif how == "any":
return g["type_of_change"].eq(change).any()
def get_id(df):
return df["api_spec_id"].unique().tolist()
v1 = grp.filter(partial(detect, how="all", change="Breaking")).pipe(get_id)
v2 = grp.filter(partial(detect, how="all", change="Non-Breaking")).pipe(get_id)
v3 = grp.filter(partial(detect, how="any", change="Breaking")).pipe(get_id)
v4 = grp.filter(partial(detect, how="any", change="Non-Breaking")).pipe(get_id)
Output :
print(v1) # [678, 123]
print(v2) # [345]
print(v3) # [213, 678, 123]
print(v4) # [213, 345]
答案2
得分: 2
我们可以通过按api_spec_id
和标签进行分组,然后应用一个lambda函数来跳过每个组的第一个元素,然后创建一个新列,指示label == 'major'
和type_of_change == 'Breaking'
的位置。
要获取每个api_spec_id
的计数,我们可以再次使用groupby
,然后可以使用相同的过程迭代唯一的标签值。以下是代码,不要忘记将'Breaking'
替换为'Non-Breaking'
和其他标签类型。
编辑:根据您的评论,现在我们将按api_spec_id
进行分组,然后检查每个组内的type_of_change
的唯一值。
import pandas as pd
df = pd.DataFrame({
"api_spec_id": [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
"type_of_change": ["Breaking", "Breaking", "Non-Breaking", "Non-Breaking",
"Non-Breaking", "Non-Breaking", "Breaking", "Breaking",
"Breaking", "Breaking", "Breaking"],
"label": [None, "major", "patch", None, "patch", "patch", None, "minor",
"major", None, None]
})
unique_changes = df.groupby('api_spec_id')['type_of_change'].unique()
all_breaking = unique_changes.apply(lambda x: len(x)==1 and "Breaking" in x).sum()
all_non_breaking = unique_changes.apply(lambda x: len(x)==1 and "Non-Breaking" in x).sum()
at_least_one_breaking = unique_changes.apply(lambda x: "Breaking" in x).sum()
at_least_one_non_breaking = unique_changes.apply(lambda x: "Non-Breaking" in x).sum()
print('All breaking: ', all_breaking)
print('All non-breaking: ', all_non_breaking)
print('At least one breaking: ', at_least_one_breaking)
print('At least one non-breaking: ', at_least_one_non_breaking)
请注意,这段代码会执行您描述的操作,计算各种条件下的计数并将其打印出来。
英文:
We can do that by grouping by the api_spec_id
and label, then applying a lambda function that skips the first element for each group, then we create a new column indicating where label == 'major'
and type_of_change == 'Breaking'
.
To get the counts of each api_spec_id
we can use groupby
again, then we can iterate over the unique label values with the same process. Here is the code below, dont forget to replace 'Breaking'
with 'Non-Breaking'
and other label types.
edit: based on your comments, now we will group by api_spec_id
and then check the unique values of type_of_change
within each group
import pandas as pd
df = pd.DataFrame({
"api_spec_id": [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
"type_of_change": ["Breaking", "Breaking", "Non-Breaking", "Non-Breaking",
"Non-Breaking", "Non-Breaking", "Breaking", "Breaking",
"Breaking", "Breaking", "Breaking"],
"label": [None, "major", "patch", None, "patch", "patch", None, "minor",
"major", None, None]
})
unique_changes = df.groupby('api_spec_id')['type_of_change'].unique()
all_breaking = unique_changes.apply(lambda x: len(x)==1 and "Breaking" in x).sum()
all_non_breaking = unique_changes.apply(lambda x: len(x)==1 and "Non-Breaking" in x).sum()
at_least_one_breaking = unique_changes.apply(lambda x: "Breaking" in x).sum()
at_least_one_non_breaking = unique_changes.apply(lambda x: "Non-Breaking" in x).sum()
print('All breaking: ', all_breaking)
print('All non-breaking: ', all_non_breaking)
print('At least one breaking: ', at_least_one_breaking)
print('At least one non-breaking: ', at_least_one_non_breaking)
答案3
得分: 0
这是一个选项:
l = ['None', 'All Breaking, Has Breaking', 'All Non Breaking, Has Non Breaking', 'Has Breaking, Has Non Breaking']
d = df.set_index('api_spec_id')['type_of_change'].str.get_dummies().groupby(level=0).sum().gt(0)
d = (d.mul([1,2])
.sum(axis=1)
.map(dict(enumerate(l)))
.str.split(', ')
.explode()
.reset_index(name='Type')
.groupby('Type')['api_spec_id'].agg(list)
.to_dict())
输出:
{'All Breaking': [123, 678],
'All Non Breaking': [345],
'Has Breaking': [123, 213, 678],
'Has Non Breaking': [213, 345]}
英文:
Here is an option:
l = ['None','All Breaking, Has Breaking','All Non Breaking, Has Non Breaking','Has Breaking, Has Non Breaking']
d = df.set_index('api_spec_id')['type_of_change'].str.get_dummies().groupby(level=0).sum().gt(0)
d = (d.mul([1,2])
.sum(axis=1)
.map(dict(enumerate(l)))
.str.split(', ')
.explode()
.reset_index(name = 'Type')
.groupby('Type')['api_spec_id'].agg(list)
.to_dict())
Output:
{'All Breaking': [123, 678],
'All Non Breaking': [345],
'Has Breaking': [123, 213, 678],
'Has Non Breaking': [213, 345]}
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论