How to select rows until an element is encountered in a column?
Question
Suppose we have the following DataFrame:
```python
import pandas as pd
df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])
```
which gives:
```
   0
A  1
B  2
C  3
D  3
```
I have been looking for a quick way to extract the rows up to and including the first occurrence of a given value in the column (3, for instance).
I found a solution (shown below), but I wonder whether there are other, more conventional approaches.
Thanks in advance for your contributions.
Solution found
```python
import pandas as pd
df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])
# True up to and including the first 3, False afterwards
mask = df[0].eq(3).cumsum().cumsum().le(1)
r = df[mask]
print(r)
```
```
   0
A  1
B  2
C  3
```
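For anyone wondering why the mask needs two `cumsum` calls, here is a small sketch that prints each intermediate step. The names `is_three`, `seen` and `running` are only illustrative and do not come from the original post.

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])

is_three = df[0].eq(3)      # False, False, True, True
seen = is_three.cumsum()    # 0, 0, 1, 2 -> how many 3s have been seen so far
running = seen.cumsum()     # 0, 0, 1, 3 -> exceeds 1 on every row after the first 3
mask = running.le(1)        # True, True, True, False

# Put the steps side by side to see how the mask is built
steps = pd.DataFrame({'is_three': is_three, 'seen': seen,
                      'running': running, 'mask': mask})
print(steps)
print(df[mask])
```

A single `cumsum` would not be enough: it stays at 1 until the second 3 appears, so any rows between the first and second 3 would also be kept.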
Answer 1
Score: 1
You could use a boolean mask with `shift` and `cummax`:
```python
# flag the rows whose previous row equals 3
m = df[0].shift().eq(3)
# keep only the rows before the first flagged one
out = df[~m.cummax()]
```
As a one-liner:
```python
out = df[~df[0].shift().eq(3).cummax()]
```
Output:
```
   0
A  1
B  2
C  3
```
Alternative, using `ne` and `cummin`:
```python
m = df[0].shift().ne(3)
out = df[m.cummin()]
```
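To check how the `shift`/`cummax` mask behaves beyond the example data, here is a short sketch with two extra frames. The frames `no_match` and `gap` are made up for illustration and are not part of the original question.

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])

prev_is_three = df[0].shift().eq(3)    # False, False, False, True
after_first = prev_is_three.cummax()   # stays True once it has been True
out = df[~after_first]                 # rows A, B, C

# Illustrative edge case: the value never appears, so every row is kept.
no_match = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 4, 5])
out_no_match = no_match[~no_match[0].shift().eq(3).cummax()]

# Illustrative edge case: another value sits between two 3s.
gap = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 3, 2, 3])
out_gap = gap[~gap[0].shift().eq(3).cummax()]   # rows A, B

print(out, out_no_match, out_gap, sep='\n\n')
```

Because `cummax` never goes back to False, rows after the first 3 are dropped even if later values differ from 3.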
Answer 2
Score: 1
You could use `.loc` directly to find the first index label where the number 3 appears:
```python
# Get the list of index labels where 3 appears
index_with_num = df.loc[df[0] == 3].index.tolist()
# If 3 appears, slice the DataFrame up to and including its first occurrence
if index_with_num:
    df_new = df.loc[:index_with_num[0], :].copy()
```
Or you could loop through the DataFrame and stop as soon as the first 3 is found:
```python
# Loop through the DataFrame
for index, row in df.iterrows():
    if row[0] == 3:
        break
# Assign the filtered DataFrame to a new copy
# (if no 3 is found, index is the last label, so all rows are kept)
df_new = df.loc[:index, :].copy()
```
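Both snippets above slice by label, which can pick up extra rows when index labels repeat, and the first one leaves `df_new` undefined when 3 never appears. A position-based variant avoids both issues. This is only a minimal sketch: `rows_until_first` is a hypothetical helper name, and returning every row when there is no match is an assumption about the desired behaviour.

```python
import pandas as pd


def rows_until_first(df, column, value):
    """Return the rows up to and including the first row where df[column] == value.

    Hypothetical helper: if the value never occurs, all rows are returned.
    """
    hits = df[column].eq(value).to_numpy()
    if not hits.any():
        return df.copy()
    first_pos = int(hits.argmax())   # position of the first True
    return df.iloc[: first_pos + 1].copy()


df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])
print(rows_until_first(df, 0, 3))

# Illustrative frame with a repeated label: positions stay unambiguous.
dup = pd.DataFrame(index=['A', 'B', 'B'], data=[1, 3, 5])
print(rows_until_first(dup, 0, 3))
```

Slicing with `iloc` works on row positions, so it does not depend on the index being unique or sorted.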