Delete a row or consecutive rows (from 1 to 5) in which there is 0 as a value of one of the two columns (or both)


Question

I have a DataFrame that looks like this:

A    B
1    2
0    0
0    0
0    1
2    0
3    1

I have to change it based on the following conditions:

- Delete a row or a run of consecutive rows (1 to 5 rows long) in which column A or column B (or both) contains a 0

- Delete runs of consecutive rows (more than 20 rows long) in which column A or column B contains a 0

I'm expecting this as an output:

A    B
1    2
3    1

Answer 1

Score: 0

You can use boolean indexing with groupby.transform to measure the length of each run of consecutive rows that contain a zero:

# does the row contain a 0 in any column?
m = df.eq(0).any(axis=1)

# label each run of identical values in m, then check
# whether the row belongs to a run of at most 5 rows
m2 = df.groupby(m.ne(m.shift()).cumsum()).transform('size').le(5)

# drop the rows that match both conditions
out = df[~(m & m2)]

Output:

   A  B
0  1  2
5  3  1

Reproducible input:

import pandas as pd

df = pd.DataFrame({'A': [1,0,0,0,2,3],
                   'B': [2,0,0,1,0,1]})
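
The answer above only handles the first condition (runs of 1 to 5 rows). As a rough sketch, the same run-length idea can be wrapped in a helper that also covers the second condition from the question. The function name drop_zero_runs and its parameters cols, short_max and long_min are hypothetical names chosen for this illustration, not part of pandas or of the original answer, and the sketch groups the boolean mask itself rather than the whole DataFrame.

```python
import pandas as pd


def drop_zero_runs(df, cols=("A", "B"), short_max=5, long_min=None):
    """Drop runs of consecutive rows that contain a 0 in any of `cols`.

    Hypothetical helper: a run is dropped when its length is <= short_max,
    or, if long_min is given, when its length is > long_min.
    """
    # True where at least one of the chosen columns is 0
    has_zero = df[list(cols)].eq(0).any(axis=1)

    # A new run starts every time has_zero changes value
    run_id = has_zero.ne(has_zero.shift()).cumsum()

    # Length of the run that each row belongs to
    run_len = has_zero.groupby(run_id).transform("size")

    # Short runs of zero rows are always dropped
    drop = has_zero & run_len.le(short_max)

    # Optionally drop very long runs of zero rows as well
    if long_min is not None:
        drop |= has_zero & run_len.gt(long_min)

    return df[~drop]


df = pd.DataFrame({"A": [1, 0, 0, 0, 2, 3],
                   "B": [2, 0, 0, 1, 0, 1]})

# Rows 1 to 4 form a single run of 4 zero rows, so only index 0 and 5 remain
out = drop_zero_runs(df, short_max=5, long_min=20)
print(out)
```

With these settings, runs of 6 to 20 zero rows are kept, which is how the two conditions in the question read when taken literally. Leaving long_min as None reproduces the behaviour of the answer's snippet.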

huangapple
  • Published on May 20, 2023 at 22:37:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76295753.html