Count of id with change generates wrong values

Question

I have a DataFrame that looks like this:

```python
import pandas as pd

data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR']
}

df = pd.DataFrame(data)
```

I am trying to count four cases. Two of them concern `BR`: the number of `api_spec_id` values for which every row has `type == 'BR'`, and the number for which at least one row has `type == 'BR'`. The other two are the same counts for `NBR`.

This is the code I am working with, but it seems wrong, as it generates the same output for the last two:

```python
import pandas as pd

at_least_one_breaking_change = df[df['type'] == 'BR']['api_spec_id'].nunique()

all_commits_including_breaking = df.groupby('api_spec_id').apply(lambda x: 'NBR' not in x['type'].unique()) \
                                .sum()

at_least_one_non_breaking_change = df[df['type'] == 'NBR']['api_spec_id'].nunique()

all_commits_including_non_breaking = df.groupby('api_spec_id').apply(lambda x: 'BR' not in x['type'].unique()) \
                                    .sum()
```

The expected output for the sample DataFrame above is:

```
at_least_one_breaking_change = 3
all_commits_including_breaking = 3
at_least_one_non_breaking_change = 2
all_commits_including_non_breaking = 1
```

I am a bit stuck on this, and any suggestions or ideas would be greatly appreciated.

Answer 1

Score: 1

I think you can use `pd.crosstab`:

```python
# Presence table: one row per api_spec_id, True where that type occurs at least once
m = pd.crosstab(df['api_spec_id'], df['type']).astype(bool)

at_least_one_breaking_change = sum(m['BR'])
all_commits_including_breaking = sum(m['BR'] & ~m['NBR'])

at_least_one_non_breaking_change = sum(m['NBR'])
all_commits_including_non_breaking = sum(m['NBR'] & ~m['BR'])
```

Output:

```python
>>> at_least_one_breaking_change
3

>>> all_commits_including_breaking
2

>>> at_least_one_non_breaking_change
2

>>> all_commits_including_non_breaking
1

>>> m
type            BR    NBR
api_spec_id
123           True  False
213           True   True
345          False   True
678           True  False
```
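
For comparison, the same four counts can also be sketched with a boolean mask and `groupby(...).agg(['any', 'all'])`. This is only an illustrative alternative, not part of the answer above. It assumes `type` only ever takes the values `'BR'` and `'NBR'`, and `is_br` and `per_id` are names made up for this sketch.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "api_spec_id": [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    "type": ["BR", "BR", "NBR", "NBR", "NBR", "NBR", "BR", "BR", "BR", "BR", "BR"],
})

# True for every row whose type is 'BR'
is_br = df["type"].eq("BR")

# Per api_spec_id: does any row have 'BR', and do all rows have 'BR'?
per_id = is_br.groupby(df["api_spec_id"]).agg(["any", "all"])

at_least_one_breaking_change = int(per_id["any"].sum())
all_commits_including_breaking = int(per_id["all"].sum())

# Assumption: only two type values exist, so "not all BR" means
# "at least one NBR" and "no BR at all" means "only NBR".
at_least_one_non_breaking_change = int((~per_id["all"]).sum())
all_commits_including_non_breaking = int((~per_id["any"]).sum())

print(at_least_one_breaking_change, all_commits_including_breaking,
      at_least_one_non_breaking_change, all_commits_including_non_breaking)
```

If a third `type` value could appear, the two `NBR` counts would need their own mask (for example `df["type"].eq("NBR")`) instead of negating the `BR` one.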



# Answer 2
**Score**: 0

I have looked at and run your code; its output is:

[Image: output of your code]

The conditions in this code are slightly wrong.

Take a look at the updated version:

```python
import pandas as pd

at_least_one_breaking_change = df[df['type'] == 'BR']['api_spec_id'].nunique()

all_commits_including_breaking = df.groupby('api_spec_id').apply(lambda x: 'NBR' in x['type'].unique()) \
                                .sum()

at_least_one_non_breaking_change = df[df['type'] == 'NBR']['api_spec_id'].nunique()

all_commits_including_non_breaking = df.groupby('api_spec_id').apply(lambda x: 'Breaking' in x['type'].unique()) \
                                    .sum()
```

Also, there is no type named "Breaking".
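
The membership conditions discussed here can also be written against the set of types seen for each `api_spec_id`, which makes "at least one" and "only" easy to tell apart. The sketch below is an illustration rather than the answerer's code: `types_per_id` is a hypothetical name, and "all commits" is read as "the id has only that type".

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "api_spec_id": [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    "type": ["BR", "BR", "NBR", "NBR", "NBR", "NBR", "BR", "BR", "BR", "BR", "BR"],
})

# One Python set of observed types per api_spec_id
types_per_id = df.groupby("api_spec_id")["type"].apply(set)

# "At least one": the type is a member of the set
at_least_one_breaking_change = sum("BR" in s for s in types_per_id)
at_least_one_non_breaking_change = sum("NBR" in s for s in types_per_id)

# "All": the set contains that type and nothing else
all_commits_including_breaking = sum(s == {"BR"} for s in types_per_id)
all_commits_including_non_breaking = sum(s == {"NBR"} for s in types_per_id)

print(at_least_one_breaking_change, all_commits_including_breaking,
      at_least_one_non_breaking_change, all_commits_including_non_breaking)
```

On the sample data this yields 3, 2, 2 and 1, since only ids 123 and 678 contain nothing but `BR` rows.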

huangapple
  • Posted on June 12, 2023, 01:16:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76451633.html