How to remove trailing rows that contain zero from a pandas DataFrame
Question
I have a pandas dataframe with a single column, which ends with some values being zero, like so:
index value
0 4.0
1 34.0
2 -2.0
3 15.0
... ...
96 0.0
97 45
98 0.0
99 0.0
100 0.0
I would like to strip away the trailing rows that contain the zero value, producing the following dataframe:
index value
0 4.0
1 34.0
2 -2.0
3 15.0
... ...
96 0.0
97 45
How can I do this with pandas's built-in functions?
I know that I can repeatedly check the last value of the dataframe and drop it while it is zero, but I would rather use a built-in pandas function, because that should be much faster.
while df.iloc[-1, 0] == 0:
    df.drop(df.tail(1).index, inplace=True)
EDIT: to be clear, the dataframe may or may not contain other zeros. However, I only want to strip trailing zeros, while the other zeros should stay untouched. I have edited the example accordingly.
Answer 1
Score: 2
Assuming that the zero values are all stacked at the end of the DataFrame:
# position of the last non-zero value (raises IndexError if every value is zero)
last_nonzero_index = df['value'].to_numpy().nonzero()[0][-1]
# keep every row up to and including that position
new_df = df.iloc[:last_nonzero_index + 1]
Otherwise, if the zeros are scattered throughout the DataFrame and you want to drop all of them (this also removes non-trailing zeros):
# positions of the non-zero values
nonzero_index = df['value'].to_numpy().nonzero()[0]
# create a new DataFrame with only the non-zero rows
new_df = df.iloc[nonzero_index]
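The positional approach in this answer can be wrapped so that an all-zero column does not raise. This is only a sketch: the helper name `strip_trailing_zeros` and the sample data are made up for illustration, and `np.flatnonzero` is used in place of `.nonzero()[0]`.

```python
import numpy as np
import pandas as pd


def strip_trailing_zeros(df: pd.DataFrame, column: str = "value") -> pd.DataFrame:
    """Drop trailing rows whose `column` is zero; interior zeros are kept."""
    # Positions (not labels) of every non-zero entry in the column.
    nonzero_positions = np.flatnonzero(df[column].to_numpy())
    if nonzero_positions.size == 0:
        # Every value is zero, so nothing survives the trim.
        return df.iloc[0:0]
    # Slice by position up to and including the last non-zero entry.
    return df.iloc[: nonzero_positions[-1] + 1]


# Hypothetical sample: one interior zero, two trailing zeros.
sample = pd.DataFrame({"value": [4.0, 0.0, -2.0, 45.0, 0.0, 0.0]})
trimmed = strip_trailing_zeros(sample)
all_zero = strip_trailing_zeros(pd.DataFrame({"value": [0.0, 0.0]}))
print(trimmed)
```

Because the slice is positional (`iloc`), this should behave the same whatever the index labels are.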
Answer 2
Score: 2
Use boolean indexing with a reversed cummax:
out = df[df.loc[::-1, 'value'].ne(0).cummax()]
Output:
value
index
0 4.0
1 34.0
2 -2.0
3 15.0
97 45.0
Intermediate:
value mask
index
0 4.0 True
1 34.0 True
2 -2.0 True
3 15.0 True
97 45.0 True
98 0.0 False
99 0.0 False
100 0.0 False
Alternatively, if you are sure that there is at least one non-zero value:
out = df.loc[:df.loc[::-1, 'value'].ne(0).idxmax()]
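To see why the "at least one non-zero value" caveat matters, here is a small sketch on made-up data. It reverses the mask back into the original order before indexing, which is a minor variation on the one-liner above and not part of the original answer.

```python
import pandas as pd

# Hypothetical sample with one interior zero and non-default index labels.
df = pd.DataFrame({"value": [4.0, 0.0, 45.0, 0.0, 0.0]}, index=[10, 11, 12, 13, 14])

nonzero = df["value"].ne(0)
# Walk the flags from the bottom up: cummax turns True at the last non-zero
# row and stays True for every row above it. Reversing again restores order.
keep = nonzero[::-1].cummax()[::-1]
out = df[keep]
print(out)

# idxmax on an all-False mask returns the first label it sees, so the
# label-slice variant keeps every row of an all-zero frame.
zeros = pd.DataFrame({"value": [0.0, 0.0, 0.0]})
last_label = zeros.loc[::-1, "value"].ne(0).idxmax()
sliced = zeros.loc[:last_label]
print(len(sliced))
```

On the all-zero frame the cummax mask would be False everywhere and return an empty result, whereas the `idxmax` slice returns the frame unchanged.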
Answer 3
Score: 1
You can do it with broadcasting. Note that this drops every row whose values are all zero, not only the trailing ones:
df = df[(df != 0.0).any(axis=1)]
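If the frame has several columns and only the trailing all-zero rows should go, the row mask from this answer can be combined with a reversed `cummax`. The following is a sketch under that assumption; the column names `a` and `b` and the data are hypothetical.

```python
import pandas as pd

# Hypothetical two-column frame: row 1 is an interior all-zero row.
df = pd.DataFrame({
    "a": [1.0, 0.0, 3.0, 0.0, 0.0],
    "b": [0.0, 0.0, 2.0, 0.0, 0.0],
})

# True for every row that has at least one non-zero value.
has_nonzero = (df != 0).any(axis=1)

# The mask on its own removes all-zero rows wherever they occur.
dropped_everywhere = df[has_nonzero]

# Reversed cummax keeps everything up to the last row with a non-zero value,
# so only the trailing all-zero rows are removed.
keep = has_nonzero[::-1].cummax()[::-1]
trailing_only = df[keep]

print(dropped_everywhere.index.tolist())
print(trailing_only.index.tolist())
```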
Answer 4
Score: 1
You can compare the value column with 0 and take a reverse cumulative sum of the boolean result. The trailing zeros keep a cumulative sum of 0, so they are filtered out.
out = df[df.loc[::-1, 'value'].ne(0).cumsum()[::-1].ne(0)]
print(out)
value
0 4.0
1 34.0
2 -2.0
3 15.0
4 0.0
97 45.0
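As a sanity check, the reverse-cumsum mask can be compared against the loop from the question on a small frame. This is only a sketch: the function names and sample data are invented here, and the loop works on a sliced copy instead of dropping rows in place.

```python
import pandas as pd


def strip_with_loop(df: pd.DataFrame) -> pd.DataFrame:
    """Baseline: peel off the last row while its first column is zero."""
    out = df
    while len(out) and out.iloc[-1, 0] == 0:
        out = out.iloc[:-1]
    return out


def strip_with_cumsum(df: pd.DataFrame) -> pd.DataFrame:
    """Vectorized: count the non-zero values at or below each row."""
    remaining_nonzero = df["value"].ne(0)[::-1].cumsum()[::-1]
    # A count of zero means only zeros remain from this row to the end.
    return df[remaining_nonzero.ne(0)]


# Hypothetical sample: one interior zero, three trailing zeros.
sample = pd.DataFrame({"value": [4.0, 34.0, 0.0, 45.0, 0.0, 0.0, 0.0]})
loop_result = strip_with_loop(sample)
cumsum_result = strip_with_cumsum(sample)
print(cumsum_result)
```

Both versions should keep the interior zero and stop at the last non-zero row.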