May 28, 2023, 05:22:59

How to select rows until an element is encountered in a column?

Question

Let's suppose we have the following dataframe:

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])
```

which gives us the following dataframe:

```
df
   0
A  1
B  2
C  3
D  3
```

I have been looking for some time for a quick way to extract the rows up to and including the first occurrence of a given value, for instance 3.

I found a solution (shown under "Solution found" below), but I wonder whether there are other, more conventional approaches.

Thanks in advance for your contributions.

Solution found

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])

# True for every row up to and including the first 3
mask = df[0].eq(3).cumsum().cumsum().le(1)
r = df[mask]

print(r)
```
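To see why the second `cumsum` is there, it may help to look at the intermediate Series. The following is a minimal sketch on the same `df`; the names `hits`, `count`, `running`, `df2`, `single`, and `double` are only illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])

hits = df[0].eq(3)         # True where the value is 3: F, F, T, T
count = hits.cumsum()      # number of 3s seen so far: 0, 0, 1, 2
running = count.cumsum()   # running total of those counts: 0, 0, 1, 3

# Only rows up to and including the first 3 have a running total <= 1.
r = df[running.le(1)]
print(r)

# A frame where a non-3 value follows the first 3 shows the difference
# between one cumsum and two.
df2 = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 3, 2, 3])
single = df2[df2[0].eq(3).cumsum().le(1)]           # also keeps row C
double = df2[df2[0].eq(3).cumsum().cumsum().le(1)]  # stops at row B
print(single)
print(double)
```

With a single `cumsum`, the count stays at 1 until the second 3 appears, so rows after the first match would still pass `le(1)`. The second `cumsum` makes the total exceed 1 on the very next row.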

Answer 1

Score: 1

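You could use a boolean mask with `shift` and `cummax`:

```python
# flag the rows whose previous row holds the value 3
m = df[0].shift().eq(3)

# keep only the rows before the first flagged one
out = df[~m.cummax()]
```

As a one-liner:

```python
out = df[~df[0].shift().eq(3).cummax()]
```

Output: rows `A`, `B`, and `C`, that is, everything up to and including the first 3.

Alternative, using `ne` and `cummin`:

```python
m = df[0].shift().ne(3)

out = df[m.cummin()]
```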
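As a quick sanity check, the two variants can be run side by side on the question's `df`. This is only a sketch; `out_cummax`, `out_cummin`, and `no_match` are illustrative names, and the value 99 is an arbitrary stand-in for "a value that never occurs".

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])

# Variant 1: drop every row from the one after the first 3 onwards.
out_cummax = df[~df[0].shift().eq(3).cummax()]

# Variant 2: keep rows while the previous value is still not 3.
out_cummin = df[df[0].shift().ne(3).cummin()]

# Both variants should select the same rows.
print(out_cummax.equals(out_cummin))

# If the value never occurs, nothing is flagged and every row is kept.
no_match = df[~df[0].shift().eq(99).cummax()]
print(len(no_match))
```

Because the mask is built from the shifted column, the row holding the first 3 itself is kept, which matches the behaviour of the solution in the question.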

Answer 2

Score: 1

You could use `.loc` directly to find the first index label where the number 3 appears:

```python
# Get the list of index labels where 3 appears
index_with_num = df.loc[df[0] == 3].index.tolist()

# If 3 appeared, slice the dataframe up to the first match
# (label slices with .loc include the end label)
if index_with_num:
    df_new = df.loc[:index_with_num[0], :].copy()
```

Or you could loop through the dataframe and stop once you find the first occurrence of 3:

```python
# Loop through the dataframe until the first 3
for index, row in df.iterrows():
    if row[0] == 3:
        break

# Assign the filtered dataframe to a new copy
df_new = df.loc[:index, :].copy()
```
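Both snippets above slice by label, which assumes the index labels are unique, and the first one leaves `df_new` undefined when 3 never appears. A positional variant is one possible way around both points. The helper below is a hypothetical sketch (`rows_until_first` is not a pandas function), and returning the whole frame when there is no match is an assumed convention, not something stated in the answer.

```python
import pandas as pd


def rows_until_first(df, column, value):
    """Return rows up to and including the first row where df[column] == value.

    Hypothetical helper, not part of pandas. If the value never occurs,
    a copy of the whole frame is returned (an assumed convention).
    """
    hits = df[column].eq(value).to_numpy()
    if not hits.any():
        return df.copy()
    stop = int(hits.argmax())  # position of the first True
    return df.iloc[:stop + 1].copy()


df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])
print(rows_until_first(df, 0, 3))

# Slicing by position still works when index labels repeat.
dup = pd.DataFrame(index=['A', 'A', 'B', 'B'], data=[1, 2, 3, 3])
print(rows_until_first(dup, 0, 3))
```

Since `iloc` slices by position, the result does not depend on what the index labels look like.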
