How to count the number of consecutive rows where only 2 columns have 0 as a value

# Question


I have a DataFrame that looks like this:

```
   A  B  C
0  0  4  1
1  0  0  2
2  0  0  1
3  2  0  3
4  1  1  1
```

I need to count the number of consecutive rows where both the A and B columns are 0. If the count is less than 10 or more than 20, I need to delete all of those rows. In the example above the count is 2, so I'm expecting this output:

```
   A  B  C
0  0  4  1
3  2  0  3
4  1  1  1
```

I tried this:

```python
m1 = (df['A'].eq(0) & df['B'].eq(0))
m2 = df.groupby(m1.ne(m1.shift()).cumsum()).transform('size').le(9)
out = df[~(m1&m2)]
return out
```

But it does nothing.
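The counting step described above can be illustrated on the sample frame. This is a minimal sketch that reuses the run-id idiom from the attempt (`ne`/`shift`/`cumsum`); the variable names are illustrative and not from the post. It shows that the only all-zero run in the sample has length 2.

```python
import pandas as pd

# Sample frame from the question
df = pd.DataFrame({"A": [0, 0, 0, 2, 1],
                   "B": [4, 0, 0, 0, 1],
                   "C": [1, 2, 1, 3, 1]})

# True where both A and B are 0
both_zero = df[["A", "B"]].eq(0).all(axis=1)

# A new run starts wherever the mask changes value; the cumulative
# sum of those change points gives each run its own id
run_id = both_zero.ne(both_zero.shift()).cumsum()

# Length of the run each row belongs to
run_length = both_zero.groupby(run_id).transform("size")

# Runs are homogeneous, so summing the mask per run gives the run's
# length for all-zero runs and 0 for the other runs
per_run = both_zero.groupby(run_id).sum()
zero_run_lengths = per_run[per_run > 0].tolist()

print(run_id.tolist())      # [1, 2, 2, 3, 3]
print(run_length.tolist())  # [1, 2, 2, 2, 2]
print(zero_run_lengths)     # [2]
```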
# Answer 1

**Score**: 2

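Use [boolean indexing](https://pandas.pydata.org/docs/user_guide/indexing.html#boolean-indexing) with [`groupby.transform`](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html) to set up the threshold condition on consecutive rows: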

```python
import pandas as pd

df = pd.DataFrame({'A': [0, 0, 0, 2, 1],
                   'B': [4, 0, 0, 0, 1],
                   'C': [1, 2, 1, 3, 1]})

# boundaries below/above which
# to drop the rows (exclusive)
LOW, HIGH = 1, 2
# rows for which both A and B are 0
m = df[['A', 'B']].eq(0).all(axis=1)
# count the consecutive rows in each run
count = m.groupby((m != m.shift()).cumsum()).transform('size')
# keep only the rows with non-zero values
# or with > LOW / < HIGH consecutive zeros
out = df.loc[(~m | count.between(LOW, HIGH, inclusive='neither'))]
```

Output:

```
   A  B  C
0  0  4  1
3  2  0  3
4  1  1  1
```

huangapple
  • Posted on 21 May 2023 at 18:27:29
  • If you repost this article, please keep this link: https://go.coder-hub.com/76299420.html