Delete a row or consecutive rows (from 1 to 5) in which there is 0 as a value of one of the two columns (or both)
Question
I have a DataFrame that looks like this:
A B
1 2
0 0
0 0
0 1
2 0
3 1
I have to filter it based on the following conditions:
- Delete a row or a run of consecutive rows (1 to 5 rows long) in which there is a 0 in column A, column B, or both
- Delete a run of more than 20 consecutive rows in which there is a 0 in column A or column B
I'm expecting this as an output:
A B
1 2
3 1
Answer 1
Score: 0
You can use boolean indexing with groupby.transform to measure the length of each run of consecutive rows that contain a zero:
# does the row contain a 0 in any column?
m = df.eq(0).any(axis=1)
# does the row belong to a run of at most 5
# consecutive rows with the same value of m?
m2 = df.groupby(m.ne(m.shift()).cumsum()).transform('size').le(5)
# drop the rows that match both conditions, keep the rest
out = df[~(m & m2)]
Output:
   A  B
0  1  2
5  3  1
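To see why the run-counting step works, it can help to look at the intermediate values. The following is only an illustrative sketch on the same sample data; the names run_id, run_len and steps are introduced here for readability and are not part of the answer's code, and the run length is computed on the mask m itself rather than on df.

```python
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 0, 0, 2, 3],
                   'B': [2, 0, 0, 1, 0, 1]})

# True where the row contains a 0 in A or B
m = df.eq(0).any(axis=1)

# a new run starts wherever m differs from the previous row;
# the cumulative sum of those change points gives each run its own label
run_id = m.ne(m.shift()).cumsum()

# length of the run that each row belongs to
run_len = m.groupby(run_id).transform('size')

# side-by-side view of the three intermediate Series (hypothetical helper frame)
steps = pd.DataFrame({'has_zero': m, 'run_id': run_id, 'run_len': run_len})
print(steps)
```

Rows 1 to 4 share one run label and a run length of 4, which is why all four are dropped by the "at most 5" test.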
Reproducible input:
import pandas as pd

df = pd.DataFrame({'A': [1, 0, 0, 0, 2, 3],
                   'B': [2, 0, 0, 1, 0, 1]})
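Note that the code above only applies the first condition (runs of 1 to 5 rows). If the second condition from the question (runs of more than 20 rows) is also needed, one possible extension is sketched below. It assumes that runs of 6 to 20 zero-containing rows should be kept, which the question does not state explicitly. The function name drop_zero_runs and the parameters short_max and long_min are hypothetical.

```python
import pandas as pd

def drop_zero_runs(df, short_max=5, long_min=20):
    """Drop runs of zero-containing rows that are short (<= short_max rows)
    or long (> long_min rows). Runs of in-between length are kept
    (an assumption, not something the question spells out)."""
    # True where the row contains a 0 in any column
    m = df.eq(0).any(axis=1)
    # label each run of identical consecutive values of m
    run_id = m.ne(m.shift()).cumsum()
    # length of the run each row belongs to
    run_len = m.groupby(run_id).transform('size')
    # rows to drop: contain a 0 and sit in a short or a long run
    drop = m & (run_len.le(short_max) | run_len.gt(long_min))
    return df[~drop]

# made-up data: zero runs of length 4, 8 and 21 separated by clean rows
a = [1] + [0] * 4 + [2] + [0] * 8 + [3] + [0] * 21 + [4]
demo = pd.DataFrame({'A': a, 'B': [1] * len(a)})
result = drop_zero_runs(demo)
print(result.index.tolist())
```

With this made-up data the runs of 4 and 21 rows are removed and the run of 8 rows is kept. On the sample DataFrame from the question, the function returns the same two rows as the code above.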