How to count the number of consecutive rows where only 2 columns have 0 as a value


# Question


I have a DataFrame that looks like this:

```
   A  B  C
0  0  4  1
1  0  0  2
2  0  0  1
3  2  0  3
4  1  1  1
```

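I need to count the number of consecutive rows where both columns A and B are 0.
If the count is less than 10 or greater than 20, I need to delete all of those rows.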

In the example above the count is 2, so I expect this output:

```
   A  B  C
0  0  4  1
3  2  0  3
4  1  1  1
```


I tried this:
```python
m1 = (df['A'].eq(0) & df['B'].eq(0))
m2 = df.groupby(m1.ne(m1.shift()).cumsum()).transform('size').le(9)
out = df[~(m1&m2)]
return out
```

But it does nothing.
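For reference, the run-labelling idiom in this attempt, `m1.ne(m1.shift()).cumsum()`, gives each stretch of identical mask values its own integer label, and `transform('size')` broadcasts each run's length back onto its rows. A minimal sketch on the sample frame; rebuilding the frame from the printed table is an assumption, and the sketch groups the mask `m1` itself rather than `df`, so the result is a single Series:

```python
import pandas as pd

# Sample frame rebuilt from the printed table in the question
df = pd.DataFrame({'A': [0, 0, 0, 2, 1],
                   'B': [4, 0, 0, 0, 1],
                   'C': [1, 2, 1, 3, 1]})

# True where both A and B are 0
m1 = df['A'].eq(0) & df['B'].eq(0)

# A new label starts whenever the mask differs from the previous row
run_id = m1.ne(m1.shift()).cumsum()

# Length of the run each row belongs to, aligned to the original index
run_len = m1.groupby(run_id).transform('size')

print(pd.DataFrame({'m1': m1, 'run_id': run_id, 'run_len': run_len}))
```

Rows 1 and 2 share run label 2 with a run length of 2, which is the all-zero run from the question. Rows 3 and 4 also form a run of length 2, but `m1` is False there, so the filter has to combine the run length with `m1` (as `m1&m2` does) to leave them untouched.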




# Answer 1
**Score**: 2


Use [boolean indexing](https://pandas.pydata.org/docs/user_guide/indexing.html#boolean-indexing) with [`groupby.transform`](https://pandas.pydata.org/docs/reference/api/pandas.core.groupby.DataFrameGroupBy.transform.html) to set up the threshold condition on consecutive rows:

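```python
# run-length bounds (exclusive): runs of zeros whose length
# is not strictly between LOW and HIGH are dropped
LOW, HIGH = 1, 2

# rows for which both A and B are 0
m = df[['A', 'B']].eq(0).all(axis=1)

# length of each run of consecutive identical values of m
count = m.groupby((m != m.shift()).cumsum()).transform('size')

# keep rows that are not all zero, or that belong to a run
# of zeros longer than LOW and shorter than HIGH
out = df.loc[(~m | count.between(LOW, HIGH, inclusive='neither'))]
```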

Output:

```
   A  B  C
0  0  4  1
3  2  0  3
4  1  1  1
```
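`Series.between(left, right, inclusive='neither')` keeps only values strictly between the two bounds (the string form of `inclusive` needs pandas 1.3 or later). With `LOW, HIGH = 1, 2` no integer run length qualifies, so every all-zero run is dropped, which is why the run of length 2 disappears above. For the question's rule (drop runs shorter than 10 or longer than 20), the exclusive bounds would be `LOW, HIGH = 9, 21`, or equivalently `between(10, 20)` with the default `inclusive='both'`. A quick check of these forms on a few made-up run lengths:

```python
import pandas as pd

# Hypothetical run lengths around the question's 10..20 window
lengths = pd.Series([1, 2, 9, 10, 15, 20, 21])

# The answer's demo bounds: 1 < n < 2 matches no integer length
demo = lengths.between(1, 2, inclusive='neither')

# Exclusive bounds matching the question: 9 < n < 21, i.e. 10..20
exclusive = lengths.between(9, 21, inclusive='neither')

# The same window written with inclusive bounds (the default)
inclusive = lengths.between(10, 20)

print(pd.DataFrame({'n': lengths, 'demo': demo,
                    'exclusive': exclusive, 'inclusive': inclusive}))
```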

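Putting the pieces together with the question's thresholds, here is a minimal sketch as a reusable function. The name `drop_zero_runs` and its `cols`/`min_len`/`max_len` parameters are hypothetical; the body follows the approach above but uses inclusive bounds, so all-zero runs of 10 to 20 rows are kept and shorter or longer runs are removed.

```python
import pandas as pd


def drop_zero_runs(df, cols=('A', 'B'), min_len=10, max_len=20):
    """Hypothetical helper: drop runs of consecutive rows where every
    column in `cols` is 0, unless the run length is in [min_len, max_len]."""
    # True where every column in `cols` is 0
    m = df[list(cols)].eq(0).all(axis=1)
    # Label each stretch of identical mask values, then broadcast its length
    run_len = m.groupby((m != m.shift()).cumsum()).transform('size')
    # Keep non-zero rows, and zero rows whose run length is inside the window
    return df.loc[~m | run_len.between(min_len, max_len)]


# Sample frame from the question: the zero run (rows 1-2) has length 2
df = pd.DataFrame({'A': [0, 0, 0, 2, 1],
                   'B': [4, 0, 0, 0, 1],
                   'C': [1, 2, 1, 3, 1]})
out = drop_zero_runs(df)
print(out)
```

On the sample frame this drops rows 1 and 2 and keeps rows 0, 3 and 4; a run of, say, 12 all-zero rows would be kept, while a run of 25 would be removed.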


huangapple
  • This article was posted on May 21, 2023 18:27:29
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76299420.html