Drop rows where a certain column has more than n zeros in a row
Question
I am processing data from a datalogger which collects data at millisecond intervals, regardless of whether anything interesting is happening. As a result, there are long stretches of data where certain columns are zero. I would like to drop these rows, but leave some number of them as padding, which will help in further processing.
Example input:
a b c
1 2 3
0 3 2
0 1 0
0 3 7
0 9 6
4 0 1
0 2 5
0 6 3
0 1 6
Example output, filtering on column a: every run of more than 2 consecutive zero rows is replaced with exactly 2 zero rows (which rows of the run are dropped can be arbitrary):
a b c
1 2 3
0 3 2
0 1 0
4 0 1
0 2 5
0 6 3
My current code drops every row where the column is zero, leaving no padding at all:
df = df[df['ColName'] != 0]
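To make the problem concrete, here is a minimal sketch that builds the example input and applies the current filter. It assumes 'ColName' is column a, which is an illustrative substitution. Only the two nonzero rows survive, so none of the zero rows are kept as padding.

```python
import pandas as pd

# Example input from the question
df = pd.DataFrame({
    "a": [1, 0, 0, 0, 0, 4, 0, 0, 0],
    "b": [2, 3, 1, 3, 9, 0, 2, 6, 1],
    "c": [3, 2, 0, 7, 6, 1, 5, 3, 6],
})

# Current approach (assuming 'ColName' is 'a'): removes every zero row in 'a'
filtered = df[df["a"] != 0]
print(filtered)
```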
Answer 1
Score: 1
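You can use a rolling sum over a window of n + 1 rows to exclude the rows you don't want to keep. For a non-negative column, a window sum of 0 means the current row and the n rows before it are all zero, so the current row is one zero too many and can be dropped. For example, to keep at most two consecutive 0s (a window of 3):
keep = df['a'].rolling(3).sum() != 0
And then:
output = df.loc[keep, :]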
Returning:
a b c
0 1 2 3
1 0 3 2
2 0 1 0
5 4 0 1
6 0 2 5
7 0 6 3
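To see why the filter keeps the first two zeros of each run, the following sketch prints the intermediate rolling sums for column a of the example. The first two windows are incomplete, so the sum is NaN, and NaN != 0 evaluates to True, which means those rows are always kept. A row is dropped only where its window of 3 sums to exactly 0.

```python
import pandas as pd

# Column 'a' from the question's example input
a = pd.Series([1, 0, 0, 0, 0, 4, 0, 0, 0])

window_sum = a.rolling(3).sum()  # NaN for the first two rows (incomplete window)
keep = window_sum != 0           # NaN != 0 is True, so those rows are kept

print(pd.DataFrame({"a": a, "window_sum": window_sum, "keep": keep}))
```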
Or, in a one-liner:
output = df[df['a'].rolling(3).sum() != 0]
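Set the window size in rolling to n + 1 to allow at most n consecutive zeros. This relies on the column being non-negative: otherwise values such as -1 and 1 can cancel out to a zero sum, so use df['a'].abs().rolling(3).sum() != 0 instead.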
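The same idea can be wrapped in a small reusable helper. The sketch below is one possible variant, not part of the original answer. The function name drop_long_zero_runs is hypothetical and is not a pandas API. It counts zeros in each window instead of summing raw values, so negative values cannot cancel out. It also passes min_periods=1 so the first n rows are kept, which matches the NaN behaviour of the one-liner.

```python
import pandas as pd


def drop_long_zero_runs(df: pd.DataFrame, col: str, n: int) -> pd.DataFrame:
    """Keep at most n consecutive zero rows in `col` (hypothetical helper)."""
    # 1 where the column is zero, 0 otherwise; counting zeros avoids
    # negative values cancelling out the way a raw sum could
    is_zero = df[col].eq(0).astype(int)
    # Zeros in the window formed by this row and the n rows before it;
    # min_periods=1 gives the first n rows a count instead of NaN
    zeros_in_window = is_zero.rolling(n + 1, min_periods=1).sum()
    # Drop a row only when it and all n preceding rows are zero
    return df[zeros_in_window < n + 1]


df = pd.DataFrame({
    "a": [1, 0, 0, 0, 0, 4, 0, 0, 0],
    "b": [2, 3, 1, 3, 9, 0, 2, 6, 1],
    "c": [3, 2, 0, 7, 6, 1, 5, 3, 6],
})
print(drop_long_zero_runs(df, "a", 2))
```

With n = 2 this keeps rows 0, 1, 2, 5, 6 and 7, the same result as the answer's output. With n = 0 it reduces to the question's original filter.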