Count of id with change generates wrong values
# Question
I have a df which looks like this:
```python
import pandas as pd

data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR'],
}
df = pd.DataFrame(data)
```
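I am trying to count 4 cases: the number of `api_spec_id`s where *every* row has `type == 'BR'`, the number where *at least one* row has `type == 'BR'`, and the same two counts for `type == 'NBR'`.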
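Under these definitions, each count reduces to a per-id `any()` ("at least one row") or `all()` ("every row") over a row-level boolean flag. The following is a minimal sketch of that idea, an alternative approach rather than code from the question or the answers; it rebuilds the sample `df` so it runs on its own.

```python
import pandas as pd

# Sample data from the question
data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR'],
}
df = pd.DataFrame(data)

# One boolean per row: is this row of the given type?
is_br = df['type'].eq('BR')
is_nbr = df['type'].eq('NBR')

# Group the row-level flags by id
by_id_br = is_br.groupby(df['api_spec_id'])
by_id_nbr = is_nbr.groupby(df['api_spec_id'])

# any() -> "at least one row of this type", all() -> "every row is of this type"
at_least_one_breaking_change = int(by_id_br.any().sum())
all_commits_including_breaking = int(by_id_br.all().sum())
at_least_one_non_breaking_change = int(by_id_nbr.any().sum())
all_commits_including_non_breaking = int(by_id_nbr.all().sum())

print(at_least_one_breaking_change, all_commits_including_breaking,
      at_least_one_non_breaking_change, all_commits_including_non_breaking)
```

For the sample data this prints `3 2 2 1`: only ids 123 and 678 have `BR` on every row. Using `all()` for the "every row" cases, rather than "no row of the other type", also stays correct if other `type` values ever appear.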
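This is the code I am working with, but it seems wrong because it generates the same output for the last two:
```python
import pandas as pd

# Number of distinct ids with at least one BR row
at_least_one_breaking_change = df[df['type'] == 'BR']['api_spec_id'].nunique()
# Number of ids whose rows contain no NBR type
all_commits_including_breaking = df.groupby('api_spec_id').apply(lambda x: 'NBR' not in x['type'].unique()) \
    .sum()
# Number of distinct ids with at least one NBR row
at_least_one_non_breaking_change = df[df['type'] == 'NBR']['api_spec_id'].nunique()
# Number of ids whose rows contain no BR type
all_commits_including_non_breaking = df.groupby('api_spec_id').apply(lambda x: 'BR' not in x['type'].unique()) \
    .sum()
```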
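The expected output for the sample df I sent will be:
```
at_least_one_breaking_change = 3
all_commits_including_breaking = 3
at_least_one_non_breaking_change = 2
all_commits_including_non_breaking = 1
```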
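To check the `groupby`-based expressions in the question's code against these expected values, the per-id booleans can be inspected before `.sum()` collapses them. This sketch uses the same membership tests, but applies them to the selected `type` column; in recent pandas versions (2.2 and later) this also avoids the deprecation warning that `DataFrameGroupBy.apply` emits when the function receives the grouping column. The column names `no_NBR_rows` and `no_BR_rows` are illustrative only.

```python
import pandas as pd

# Sample data from the question
data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR'],
}
df = pd.DataFrame(data)

# Select the 'type' column so the lambda only sees the values it tests
grouped = df.groupby('api_spec_id')['type']

# One row per id, mirroring the two lambdas from the question
summary = pd.DataFrame({
    'no_NBR_rows': grouped.apply(lambda s: 'NBR' not in s.unique()),
    'no_BR_rows': grouped.apply(lambda s: 'BR' not in s.unique()),
})
print(summary)
print(summary.sum())
```

For the sample data, `no_NBR_rows` is true for ids 123 and 678 and `no_BR_rows` only for id 345, so the sums are 2 and 1.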
I am a bit stuck on this, and any suggestions or ideas will be greatly appreciated.
# Answer 1
**Score**: 1
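I think you can use `pd.crosstab`:
```python
import pandas as pd

# df is the DataFrame from the question.
# Count rows per (api_spec_id, type) pair, then turn the counts into presence flags
m = pd.crosstab(df['api_spec_id'], df['type']).astype(bool)

at_least_one_breaking_change = sum(m['BR'])                       # ids with any BR row
all_commits_including_breaking = sum(m['BR'] & ~m['NBR'])          # BR rows and no NBR rows
at_least_one_non_breaking_change = sum(m['NBR'])                   # ids with any NBR row
all_commits_including_non_breaking = sum(m['NBR'] & ~m['BR'])      # NBR rows and no BR rows
```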
Output:
```
>>> at_least_one_breaking_change
3
>>> all_commits_including_breaking
2
>>> at_least_one_non_breaking_change
2
>>> all_commits_including_non_breaking
1
>>> m
type            BR    NBR
api_spec_id
123           True  False
213           True   True
345          False   True
678           True  False
```
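`pd.crosstab` counts the rows for each (`api_spec_id`, `type`) pair, and `.astype(bool)` turns each count into a "has at least one row of this type" flag, so `m['BR'] & ~m['NBR']` selects ids with BR rows and no NBR rows. One caveat with this approach: if a dataset had no `NBR` rows at all, the crosstab would have no `NBR` column and `m['NBR']` would raise a `KeyError`. A hedged variant, assuming `BR` and `NBR` are the only types of interest, reindexes the columns first; the small `df` below is made-up data to exercise that case.

```python
import pandas as pd

# Hypothetical data in which no id has an NBR row
df = pd.DataFrame({'api_spec_id': [123, 123, 678], 'type': ['BR', 'BR', 'BR']})

# Count rows per (id, type) pair
counts = pd.crosstab(df['api_spec_id'], df['type'])
# Guarantee both columns exist (missing ones become 0), then convert counts to presence flags
m = counts.reindex(columns=['BR', 'NBR'], fill_value=0).astype(bool)

at_least_one_breaking_change = sum(m['BR'])
all_commits_including_breaking = sum(m['BR'] & ~m['NBR'])
at_least_one_non_breaking_change = sum(m['NBR'])
all_commits_including_non_breaking = sum(m['NBR'] & ~m['BR'])
```

Here every id is BR-only, so the four counts come out as 2, 2, 0 and 0 instead of the lookup on `m['NBR']` failing.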
# Answer 2
**Score**: 0
I have seen and run your code; its output is:

[Image: output of your code]

The conditions in this code are a bit wrong. Take a look at the updates:
```python
import pandas as pd

at_least_one_breaking_change = df[df['type'] == 'BR']['api_spec_id'].nunique()
all_commits_including_breaking = df.groupby('api_spec_id').apply(lambda x: 'NBR' in x['type'].unique()) \
    .sum()
at_least_one_non_breaking_change = df[df['type'] == 'NBR']['api_spec_id'].nunique()
all_commits_including_non_breaking = df.groupby('api_spec_id').apply(lambda x: 'Breaking' in x['type'].unique()) \
    .sum()
```
Also, there is no `"Breaking"` type.