June 12, 2023, 01:16:03

Count of id with change generates wrong values

Question

I have a DataFrame that looks like this:

```python
import pandas as pd

data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR']
}
df = pd.DataFrame(data)
```

I am trying to count four cases: one where the type is BR for every row of an api_spec_id, and a second where the type is BR for at least one row of an api_spec_id. The other two are the same counts for NBR.

This is the code I am working with, but it seems wrong, as it generates the same output for the last two:

```python
import pandas as pd

at_least_one_breaking_change = df[df['type'] == 'BR']['api_spec_id'].nunique()
all_commits_including_breaking = df.groupby('api_spec_id').apply(lambda x: 'NBR' not in x['type'].unique()) \
                                .sum()
at_least_one_non_breaking_change = df[df['type'] == 'NBR']['api_spec_id'].nunique()
all_commits_including_non_breaking = df.groupby('api_spec_id').apply(lambda x: 'BR' not in x['type'].unique()) \
                                    .sum()
```

The expected output for the sample df is:

```
at_least_one_breaking_change = 3
all_commits_including_breaking = 3
at_least_one_non_breaking_change = 2
all_commits_including_non_breaking = 1
```

I am a bit stuck on this, and any suggestions or ideas would be greatly appreciated.

Answer 1

Score: 1

I think you can use `pd.crosstab`:

```python
# Rows are api_spec_id values, columns are type values.
# After astype(bool), a cell is True when that id has at least one row of that type.
m = pd.crosstab(df['api_spec_id'], df['type']).astype(bool)

at_least_one_breaking_change = sum(m['BR'])
all_commits_including_breaking = sum(m['BR'] & ~m['NBR'])

at_least_one_non_breaking_change = sum(m['NBR'])
all_commits_including_non_breaking = sum(m['NBR'] & ~m['BR'])
```

Output:

```python
>>> at_least_one_breaking_change
3

>>> all_commits_including_breaking
2

>>> at_least_one_non_breaking_change
2

>>> all_commits_including_non_breaking
1

>>> m
type            BR    NBR
api_spec_id
123           True  False
213           True   True
345          False   True
678           True  False
```

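
As a cross-check on the "at least one row" versus "every row" distinction, the same four counts can be sketched with a boolean mask and `groupby`. This is only an illustration, not part of the original answer. It assumes the sample data from the question and that `type` takes only the two values `'BR'` and `'NBR'` (so "not all BR" means "at least one NBR"); the helper names `is_br` and `per_id` are made up for this sketch.

```python
import pandas as pd

# Sample data copied from the question.
data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR']
}
df = pd.DataFrame(data)

# True for each row whose type is BR.
is_br = df['type'].eq('BR')

# Per api_spec_id: does any row have BR, and do all rows have BR?
per_id = is_br.groupby(df['api_spec_id']).agg(['any', 'all'])

at_least_one_breaking_change = int(per_id['any'].sum())
all_commits_including_breaking = int(per_id['all'].sum())

# Assumption: only BR and NBR occur, so the NBR counts are the complements.
at_least_one_non_breaking_change = int((~per_id['all']).sum())
all_commits_including_non_breaking = int((~per_id['any']).sum())

print(at_least_one_breaking_change,
      all_commits_including_breaking,
      at_least_one_non_breaking_change,
      all_commits_including_non_breaking)
```

On the sample data this gives 3, 2, 2 and 1, matching the `pd.crosstab` output above. If a third `type` value could appear, the complement trick would no longer hold and the NBR counts would need their own mask.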

Answer 2

Score: 0

I have looked at and run your code; its output is:

[Image: output of your code]

The conditions in this code are slightly wrong.

Look at the updates:

```python
import pandas as pd

at_least_one_breaking_change = df[df['type'] == 'BR']['api_spec_id'].nunique()
all_commits_including_breaking = df.groupby('api_spec_id').apply(lambda x: 'NBR' in x['type'].unique()) \
                                .sum()
at_least_one_non_breaking_change = df[df['type'] == 'NBR']['api_spec_id'].nunique()
all_commits_including_non_breaking = df.groupby('api_spec_id').apply(lambda x: 'Breaking' in x['type'].unique()) \
                                    .sum()
```

Also, there is no type called "Breaking".
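
One way to check conditions like these is to look at the set of `type` values each `api_spec_id` actually has. The sketch below is an illustration only, not part of the original answer. It assumes the sample data from the question, and the names `types_per_id`, `only_br`, `only_nbr` and `mixed` are made up here.

```python
import pandas as pd

# Sample data copied from the question.
data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR']
}
df = pd.DataFrame(data)

# Collect the distinct type values seen for each api_spec_id.
types_per_id = df.groupby('api_spec_id')['type'].apply(set)

# Ids whose rows are all BR, all NBR, or a mix of both.
only_br = int(types_per_id.map(lambda s: s == {'BR'}).sum())
only_nbr = int(types_per_id.map(lambda s: s == {'NBR'}).sum())
mixed = int(types_per_id.map(lambda s: s == {'BR', 'NBR'}).sum())

print(types_per_id)
print(only_br, only_nbr, mixed)
```

On the sample data, two ids (123 and 678) contain only BR, one (345) contains only NBR, and one (213) contains both. A test such as `'NBR' in s` counts ids with at least one NBR row, whereas `'NBR' not in s` counts ids with no NBR rows at all, so the two are not interchangeable.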
