Count of id with change generates wrong values

Question

I have a DataFrame that looks like this:

```python
import pandas as pd

data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR']
}

df = pd.DataFrame(data)
```

I am trying to count 4 cases: one where, for all rows of an api_spec_id, the type is BR, and a second where, for at least one row of an api_spec_id, the type is BR (and the same two cases for NBR).

This is the code I am working with, but it seems wrong, as it generates the same output for the last two:

```python
import pandas as pd

at_least_one_breaking_change = df[df['type'] == 'BR']['api_spec_id'].nunique()
all_commits_including_breaking = df.groupby('api_spec_id').apply(lambda x: 'NBR' not in x['type'].unique()) \
    .sum()
at_least_one_non_breaking_change = df[df['type'] == 'NBR']['api_spec_id'].nunique()
all_commits_including_non_breaking = df.groupby('api_spec_id').apply(lambda x: 'BR' not in x['type'].unique()) \
    .sum()
```

The expected output for the sample df above is:

```python
at_least_one_breaking_change = 3
all_commits_including_breaking = 3
at_least_one_non_breaking_change = 2
all_commits_including_non_breaking = 1
```

I am a bit stuck on this and any suggestions or ideas will be greatly appreciated.

Answer 1

Score: 1

I think you can use `pd.crosstab`:

```python
# One row per api_spec_id, one boolean column per type:
# True if that id has at least one row of that type.
m = pd.crosstab(df['api_spec_id'], df['type']).astype(bool)

at_least_one_breaking_change = sum(m['BR'])
all_commits_including_breaking = sum(m['BR'] & ~m['NBR'])

at_least_one_non_breaking_change = sum(m['NBR'])
all_commits_including_non_breaking = sum(m['NBR'] & ~m['BR'])
```

Output:

```python
>>> at_least_one_breaking_change
3

>>> all_commits_including_breaking
2

>>> at_least_one_non_breaking_change
2

>>> all_commits_including_non_breaking
1

>>> m
type            BR    NBR
api_spec_id
123           True  False
213           True   True
345          False   True
678           True  False
```
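For comparison, here is a minimal sketch of the same four counts using `groupby` with `any`/`all` instead of `pd.crosstab`. It is not part of the answer above. It assumes `type` only ever holds the two values `BR` and `NBR` (true for the sample data, not confirmed for the real data), and the helper names `is_br` and `per_id` are made up for illustration.

```python
import pandas as pd

data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR'],
}
df = pd.DataFrame(data)

# Row-level flag: True where the row is a breaking change.
is_br = df['type'].eq('BR')

# Per api_spec_id: 'any' = at least one BR row, 'all' = every row is BR.
per_id = is_br.groupby(df['api_spec_id']).agg(['any', 'all'])

at_least_one_breaking_change = int(per_id['any'].sum())
all_commits_including_breaking = int(per_id['all'].sum())

# Assumption: every non-BR row is NBR, so "not all BR" means "at least one NBR"
# and "no BR at all" means "every row is NBR".
at_least_one_non_breaking_change = int((~per_id['all']).sum())
all_commits_including_non_breaking = int((~per_id['any']).sum())

print(at_least_one_breaking_change, all_commits_including_breaking,
      at_least_one_non_breaking_change, all_commits_including_non_breaking)
# 3 2 2 1
```

On the sample data this gives the same values as the crosstab version. If a third `type` value could appear, the two NBR counts would need their own `df['type'].eq('NBR')` flag rather than the negations used here.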
Answer 2

Score: 0

I have seen and run your code; its output is:

[Image: output of your code]

The conditions in this code are a bit wrong.

Look at the updates:

```python
import pandas as pd

at_least_one_breaking_change = df[df['type'] == 'BR']['api_spec_id'].nunique()
all_commits_including_breaking = df.groupby('api_spec_id').apply(lambda x: 'NBR' in x['type'].unique()) \
    .sum()
at_least_one_non_breaking_change = df[df['type'] == 'NBR']['api_spec_id'].nunique()
all_commits_including_non_breaking = df.groupby('api_spec_id').apply(lambda x: 'Breaking' in x['type'].unique()) \
    .sum()
```

Also, there is no type of "Breaking".
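To make it easier to see what each condition counts, here is a small sketch that collects the set of types per `api_spec_id` and evaluates the membership tests on the question's sample data. It is only an illustration under that sample data; the names `types_per_id`, `ids_containing_nbr`, `ids_containing_breaking`, `only_br` and `only_nbr` are hypothetical and do not come from the thread.

```python
import pandas as pd

data = {
    'api_spec_id': [213, 213, 213, 345, 345, 345, 678, 678, 678, 123, 123],
    'type': ['BR', 'BR', 'NBR', 'NBR', 'NBR', 'NBR', 'BR', 'BR', 'BR', 'BR', 'BR'],
}
df = pd.DataFrame(data)

# Set of distinct types seen for each api_spec_id.
types_per_id = df.groupby('api_spec_id')['type'].apply(set)

# What the two membership conditions in the code above count on this sample:
# ids with at least one NBR row, and ids with at least one 'Breaking' row.
ids_containing_nbr = int(types_per_id.map(lambda s: 'NBR' in s).sum())
ids_containing_breaking = int(types_per_id.map(lambda s: 'Breaking' in s).sum())

# "Every row has type X" is equivalent to "the set of types is exactly {X}".
only_br = int(types_per_id.map(lambda s: s == {'BR'}).sum())
only_nbr = int(types_per_id.map(lambda s: s == {'NBR'}).sum())

print(ids_containing_nbr, ids_containing_breaking, only_br, only_nbr)
# 2 0 2 1
```

Because the sample has no row with type `'Breaking'`, that test is never true and its count is 0. Comparing the whole set against `{'BR'}` or `{'NBR'}` is one way to express the "all rows" cases without relying on a negated membership test.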

huangapple
  • Posted on June 12, 2023, 01:16:03
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76451633.html