How to select rows until an element is encountered in a column?

Question

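Suppose we have the following DataFrame:

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])
```

which looks like this:

```
df
   0
A  1
B  2
C  3
D  3
```

For some time I have been looking for a quick way to extract the rows up to and including the first occurrence of a given value (3, for instance).

I found a solution (shown below), but I wonder whether there are other, more conventional approaches.

Thanks in advance for your contributions.

Solution found

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])

# double cumsum: 0 before the first 3, 1 on it, and at least 2 after it
mask = df[0].eq(3).cumsum().cumsum().le(1)
r = df[mask]

print(r)
```

```
   0
A  1
B  2
C  3
```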
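
To see why the second `cumsum` is needed, it can help to look at the intermediate Series. The sketch below uses a hypothetical column in which the 3s are not adjacent (this data is an assumption, not part of the question) and compares a single `cumsum` with the double one.

```python
import pandas as pd

# Hypothetical data (not from the question): the 3s are not adjacent.
s = pd.Series([1, 3, 2, 3], index=['A', 'B', 'C', 'D'])

hits = s.eq(3)          # True wherever the value is 3
seen = hits.cumsum()    # number of 3s seen so far: 0, 1, 1, 2
score = seen.cumsum()   # running sum of that count: 0, 1, 2, 4

single = seen.le(1)     # would also keep row C, where the count is still 1
double = score.le(1)    # keeps rows only through the first 3

print(s[double])
```

With a single `cumsum`, every row between the first and the second 3 would still pass the `le(1)` test; the second `cumsum` pushes those rows above 1.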

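Answer 1

Score: 1

You could use a boolean mask with `shift` and `cummax`:

```python
# flag the rows whose previous row is 3
m = df[0].shift().eq(3)

# keep only the rows before the first flagged one
out = df[~m.cummax()]
```

As a one-liner:

```python
out = df[~df[0].shift().eq(3).cummax()]
```

Output:

```
   0
A  1
B  2
C  3
```

Alternative:

```python
m = df[0].shift().ne(3)

out = df[m.cummin()]
```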
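
If the column and the target value vary, the same mask can be wrapped in a small helper. This is only a sketch: the function name `rows_through_first` and its parameters are hypothetical, not something the answer defines.

```python
import pandas as pd

def rows_through_first(df, column, value):
    """Return the rows up to and including the first row where df[column] == value."""
    # True on every row whose *previous* row holds the value
    after_match = df[column].shift().eq(value)
    # cummax propagates the first True to all later rows
    return df[~after_match.cummax()]

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])

print(rows_through_first(df, 0, 3))  # rows A, B and C
print(rows_through_first(df, 0, 9))  # 9 never occurs, so every row is kept
```

When the value is absent the mask is never set, so the whole DataFrame comes back unchanged.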

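Answer 2

Score: 1

You could use `.loc` directly to find the first index label where the number 3 appears:

```python
# Collect the index labels of the rows where 3 appears
index_with_num = df.loc[df[0] == 3].index.tolist()

# If 3 appears, keep the rows up to and including its first occurrence;
# otherwise keep the whole DataFrame
if index_with_num:
    df_new = df.loc[:index_with_num[0], :].copy()
else:
    df_new = df.copy()
```

Or you could loop through the DataFrame and stop at the first occurrence of 3:

```python
# Loop through the rows and stop at the first 3
for index, row in df.iterrows():
    if row[0] == 3:
        break

# Label-based slicing is inclusive, so the row holding the first 3 is kept.
# If 3 never appears, `index` is the last label and every row is kept.
df_new = df.loc[:index, :].copy()
```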
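
Both variants above slice by label, so their result depends on the index labels. As a possible alternative that is not taken from either answer, the first match can be located by position with NumPy and the frame sliced with `iloc`. The helper name `head_through_first` and the sample frame with repeated labels are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

def head_through_first(df, column, value):
    """Positional variant: slice with iloc up to and including the first match."""
    hits = df[column].to_numpy() == value
    if not hits.any():
        return df.copy()           # value never occurs: keep everything
    first = int(np.argmax(hits))   # position of the first True
    return df.iloc[:first + 1].copy()

# Hypothetical frame with repeated index labels
df = pd.DataFrame(index=['A', 'A', 'B', 'B'], data=[1, 3, 3, 2])

print(head_through_first(df, 0, 3))
```

Because the slice is positional, it does not rely on the index labels being unique.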

huangapple
  • Posted on May 28, 2023 at 05:22:59
  • Please keep this link when reposting: https://go.coder-hub.com/76349101.html