How to select rows until an element is encountered in a column?

Question

Suppose we have the following DataFrame:

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])
```

Displaying `df` gives:

```
   0
A  1
B  2
C  3
D  3
```

I spent some time looking for a quick way to extract the rows up to and including the first occurrence of a given value (3, for instance).

I found a solution (shown below), but I wonder whether there are other, more conventional approaches.

Thanks in advance for your contributions.

Solution found

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])

# eq(3) flags the matches; the first cumsum counts them (0, 0, 1, 2) and the
# second cumsum stays <= 1 only up to and including the first match
mask = df[0].eq(3).cumsum().cumsum().le(1)
r = df[mask]
print(r)
```

```
   0
A  1
B  2
C  3
```
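To see why the double `cumsum` works, it can help to look at the intermediate values. The sketch below only splits the one-line mask into named steps; the variable names (`hits`, `count`, `running`, `steps`) are illustrative and not part of the original post.

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])

hits = df[0].eq(3)        # False, False, True, True
count = hits.cumsum()     # 0, 0, 1, 2  (number of matches seen so far)
running = count.cumsum()  # 0, 0, 1, 3  (exceeds 1 right after the first match)
mask = running.le(1)      # True, True, True, False

# Put the steps side by side for inspection
steps = pd.DataFrame({'hits': hits, 'count': count, 'running': running, 'mask': mask})
print(steps)
```

The mask stays `True` through the first match because the running total is exactly 1 there, and turns `False` on every later row.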

Answer 1

Score: 1


You could use a boolean mask with `shift` and `cummax`:

```python
# find values for which the previous row is 3
m = df[0].shift().eq(3)
# keep only the rows before the first one
out = df[~m.cummax()]
```

As a one-liner:

```python
out = df[~df[0].shift().eq(3).cummax()]
```

Output:

```
   0
A  1
B  2
C  3
```

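Alternative, using the inverted mask with `cummin`:

```python
# True while the previous row is not 3; cummin keeps it False after the first False
m = df[0].shift().ne(3)
out = df[m.cummin()]
```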
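If the same selection is needed in several places, the mask can be wrapped in a small helper. This is only a sketch: `rows_until_first` and its parameters are hypothetical names, not part of pandas or of the answer above. As a variation, it compares first and then shifts with `fill_value=False`, so the mask stays boolean throughout.

```python
import pandas as pd


def rows_until_first(df, column, value):
    """Return the rows of df up to and including the first row where
    df[column] == value. If the value never occurs, all rows are returned."""
    hit = df[column].eq(value)
    # True on every row that comes after the first match
    after_first = hit.shift(fill_value=False).cummax()
    return df[~after_first]


df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])
print(rows_until_first(df, 0, 3))
```

When the value is absent, no row is flagged, so the helper returns the whole DataFrame rather than an empty one.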

Answer 2

Score: 1


You could use `.loc` directly to find the first index label where the number 3 appears:

```python
# Get the list of index labels where 3 appears
index_with_num = df.loc[df[0] == 3].index.tolist()
# If 3 appears, keep the rows up to and including its first occurrence
if index_with_num:
    df_new = df.loc[:index_with_num[0], :].copy()
```

Or, you could loop through the DataFrame and stop at the first occurrence of 3:

```python
# Loop through the DataFrame
for index, row in df.iterrows():
    if row[0] == 3:
        break
# .loc slicing is label-based and inclusive, so the matching row is kept;
# assign the filtered DataFrame to a new copy
df_new = df.loc[:index, :].copy()
```
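Both variants above slice by label with `.loc`, which assumes the index labels are unique, and the first variant leaves `df_new` undefined when 3 never appears. A positional variant avoids both points. The sketch below is an assumption on my part rather than something from the answer; it uses NumPy's `argmax` on the boolean mask together with `iloc`.

```python
import pandas as pd

df = pd.DataFrame(index=['A', 'B', 'C', 'D'], data=[1, 2, 3, 3])

# Boolean NumPy array marking the rows equal to 3
hits = df[0].eq(3).to_numpy()

if hits.any():
    # argmax returns the position of the first True
    first_pos = int(hits.argmax())
    # iloc slicing excludes the end position, hence the + 1
    df_new = df.iloc[:first_pos + 1].copy()
else:
    # No match: keep everything
    df_new = df.copy()

print(df_new)
```

The `hits.any()` guard matters because `argmax` returns 0 for an all-`False` array, which would otherwise keep only the first row.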

huangapple
  • Published on May 28, 2023, 05:22:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76349101.html