May 24, 2023 18:28:44

How to remove trailing rows that contain zeros from a pandas DataFrame

Question

I have a pandas DataFrame with a single column that ends with a run of zero values, like so:

    index  value
    0        4.0
    1       34.0
    2       -2.0
    3       15.0
    ...      ...
    96       0.0
    97      45.0
    98       0.0
    99       0.0
    100      0.0

I would like to strip the trailing rows that contain zero, producing the following DataFrame:

    index  value
    0        4.0
    1       34.0
    2       -2.0
    3       15.0
    ...      ...
    96       0.0
    97      45.0

How can I do this with pandas' own functions?

I know that I can repeatedly check the last value of the DataFrame and drop it if it is zero, but I would rather use pandas' built-in functions because that should be much faster.

    while df.iloc[-1, 0] == 0:
        df.drop(df.tail(1).index, inplace=True)

EDIT: To be clear, the DataFrame may or may not contain other zeros. I only want to strip the trailing zeros; all other zeros should stay untouched. I have edited the example accordingly.

Answer 1

Score: 2

Assuming that the zero values to remove are all stacked at the end of the DataFrame:

    # find the position of the last non-zero value
    last_nonzero_index = df['value'].to_numpy().nonzero()[0][-1]

    # create a new DataFrame that keeps every row up to and including that position
    new_df = df.iloc[:last_nonzero_index + 1]
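
The snippet above indexes `[-1]` into the array of non-zero positions, which raises an `IndexError` when the column holds nothing but zeros. A minimal sketch of the same NumPy idea with that case guarded follows; the helper name `strip_trailing_zeros` and the sample data are assumptions for illustration, not part of the original answer.

```python
import numpy as np
import pandas as pd


def strip_trailing_zeros(df: pd.DataFrame, column: str = "value") -> pd.DataFrame:
    """Hypothetical helper: drop the trailing rows whose `column` is zero."""
    # Positions (not labels) of all non-zero entries in the column.
    nonzero_positions = np.flatnonzero(df[column].to_numpy())
    if nonzero_positions.size == 0:
        # Every value is zero, so the whole frame counts as "trailing".
        return df.iloc[:0]
    # Keep everything up to and including the last non-zero position.
    return df.iloc[: nonzero_positions[-1] + 1]


# Made-up sample data with one interior zero and two trailing zeros.
df = pd.DataFrame({"value": [4.0, 0.0, -2.0, 45.0, 0.0, 0.0]})
trimmed = strip_trailing_zeros(df)
all_zero = strip_trailing_zeros(pd.DataFrame({"value": [0.0, 0.0]}))
```

Because the slice is positional (`iloc`), this works regardless of what the index labels look like, and the interior zero is left in place.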

Otherwise, if the zeros are scattered throughout the DataFrame and you want to drop all of them:

    # find the positions of the non-zero values
    nonzero_index = df['value'].to_numpy().nonzero()[0]

    # create a new DataFrame with only the non-zero rows
    new_df = df.iloc[nonzero_index]

Answer 2

Score: 2

Use boolean indexing with a reversed cummax:

    out = df[df.loc[::-1, 'value'].ne(0).cummax()]

Output:

           value
    index       
    0        4.0
    1       34.0
    2       -2.0
    3       15.0
    97      45.0

Intermediate:

           value   mask
    index              
    0        4.0   True
    1       34.0   True
    2       -2.0   True
    3       15.0   True
    97      45.0   True
    98       0.0  False
    99       0.0  False
    100      0.0  False
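
To make the mask in the intermediate table easier to follow, here is a small step-by-step sketch of the reversed cummax. The sample data and the variable names `nonzero` and `keep` are assumptions for illustration; the mask is flipped back to the original order before indexing so that its index matches the DataFrame's exactly.

```python
import pandas as pd

# Made-up sample data with one interior zero and two trailing zeros.
df = pd.DataFrame({"value": [4.0, 0.0, 15.0, 45.0, 0.0, 0.0]})

# True wherever the value is not zero.
nonzero = df["value"].ne(0)

# Walk from the bottom up: cummax stays False until the first non-zero
# value is met and is True from then on. Reverse again to restore the order.
keep = nonzero[::-1].cummax()[::-1]

# Boolean indexing keeps only the rows where the mask is True.
out = df[keep]
```

The interior zero at position 1 survives because a non-zero value appears somewhere below it, so the cumulative maximum is already `True` by the time the walk reaches it.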

Alternatively, if you are sure that there is at least one non-zero value:

    out = df.loc[:df.loc[::-1, 'value'].ne(0).idxmax()]
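
The "at least one non-zero value" condition matters because `idxmax` on an all-`False` mask simply returns the first label it encounters, so an all-zero frame would come back untrimmed. A possible way to guard that case is sketched below; the helper name `strip_trailing_zeros_idxmax` is hypothetical, and the sketch assumes the index labels are unique so that the label slice is unambiguous.

```python
import pandas as pd


def strip_trailing_zeros_idxmax(df: pd.DataFrame, column: str = "value") -> pd.DataFrame:
    """Hypothetical helper: idxmax-based trim with an all-zero guard."""
    nonzero = df[column].ne(0)
    if not nonzero.any():
        # No non-zero value at all: return an empty frame with the same columns.
        return df.iloc[:0]
    # In the reversed mask, the first True is the last non-zero row.
    last_label = nonzero[::-1].idxmax()
    # .loc slices by label and includes the end label.
    return df.loc[:last_label]


# Made-up sample data.
df = pd.DataFrame({"value": [4.0, 0.0, 45.0, 0.0, 0.0]})
trimmed = strip_trailing_zeros_idxmax(df)

zeros = pd.DataFrame({"value": [0.0, 0.0, 0.0]})
guarded = strip_trailing_zeros_idxmax(zeros)
# The unguarded one-liner from the answer, applied to the all-zero frame.
unguarded = zeros.loc[:zeros.loc[::-1, "value"].ne(0).idxmax()]
```

Here `guarded` is empty, while `unguarded` still contains all three zero rows.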

Answer 3

Score: 1

You can do it with a broadcast comparison. Note that this drops every row whose values are all zero, not only the trailing ones:

    df = df[(df != 0.0).any(axis=1)]
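
A short comparison may help show how this filter behaves on data that also has zeros in the middle. The sample data and variable names below are assumptions for illustration; the second expression is the reversed-cummax mask, included only as a point of contrast.

```python
import pandas as pd

# Made-up sample data: one interior zero (label 1) and two trailing zeros.
df = pd.DataFrame({"value": [4.0, 0.0, 45.0, 0.0, 0.0]})

# Row-wise filter: keeps a row if any of its columns is non-zero.
filtered = df[(df != 0.0).any(axis=1)]

# Trailing-only trim: keeps zeros that have a non-zero value below them.
trailing_only = df[df["value"].ne(0)[::-1].cummax()[::-1]]
```

With this data, `filtered` loses the interior zero at label 1 as well, whereas `trailing_only` keeps it, which is the behaviour the question's edit asks for.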

Answer 4

Score: 1

You can compare the value column with 0 and take a reverse cumulative sum of the boolean result. The trailing zeros remain 0 after the cumsum, so they can be filtered out.

    out = df[df.loc[::-1, 'value'].ne(0).cumsum()[::-1].ne(0)]

    print(out)

        value
    0     4.0
    1    34.0
    2    -2.0
    3    15.0
    4     0.0
    97   45.0
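
The reverse cumulative sum can be read as "how many non-zero values are there from this row to the end". A minimal sketch with the intermediate counts exposed follows; the sample data and the names `nonzero` and `remaining` are assumptions for illustration.

```python
import pandas as pd

# Made-up sample data with one interior zero and two trailing zeros.
df = pd.DataFrame({"value": [4.0, 0.0, 45.0, 0.0, 0.0]})

# True wherever the value is not zero.
nonzero = df["value"].ne(0)

# Sum the booleans from the bottom up, then restore the original order.
# Each entry counts the non-zero values from that row through the last row.
remaining = nonzero[::-1].cumsum()[::-1]

# Rows with a count of 0 have only zeros from there on, so they are dropped.
out = df[remaining.ne(0)]
```

For this data `remaining` is `[2, 1, 1, 0, 0]`, so only the last two rows are removed and the interior zero is kept.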
