Extracting strings that contain a given substring from an array of strings (Python)

Question

A question about Python (3.9.5) and Pandas:

Suppose I have an array of strings `x` and I want to extract all the elements that contain a certain substring, e.g. `feb05`. Is there a Pythonic way to do it in one line, possibly using a Pandas function?

Here is an example of what I mean:

    x = ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]
    must_contain = "feb05"
    desired_output = ["2023_feb05", "2024_feb05"]

I can run a loop,

    import numpy as np
    import pandas as pd

    desired_output = []
    indices_bool = np.zeros(len(x))
    for idx, test in enumerate(x):
        if must_contain in test:
            desired_output.append(test)
            indices_bool[idx] = 1

but I am looking for a more Pythonic way to do it.

In my application, `x` is a column in a Pandas dataframe, so answers that use Pandas functions are also welcome. The goal is to keep all the rows whose field `x` contains `must_contain` (e.g. `x = df["names"]`).

Answer 1

Score: 1

Since you are using pandas, you can use `str.contains` to get a boolean mask:

    import pandas as pd

    df = pd.DataFrame({'x': ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]})
    must_contain = "feb05"

    df.x.str.contains(must_contain)
    # 0    False
    # 1    False
    # 2    False
    # 3     True
    # 4     True
    # Name: x, dtype: bool

Filter by the condition:

    df[df.x.str.contains(must_contain)]
    #             x
    # 3  2023_feb05
    # 4  2024_feb05
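
One caveat that may matter in practice: `str.contains` interprets its pattern as a regular expression by default, and missing entries produce missing values in the mask. The sketch below is only an illustration under assumptions: the column name `names` is borrowed from the question's `df["names"]` example, and the `None` row is invented to show the missing-value case. It passes `regex=False` for a literal match and `na=False` so that missing entries count as non-matches.

```python
import pandas as pd

# Hypothetical data: the None row is added only to illustrate missing values.
df = pd.DataFrame({"names": ["2023_jan05", "2023_feb05", None, "2024_feb05"]})
must_contain = "feb05"

# regex=False matches the substring literally;
# na=False treats missing entries as "does not contain".
mask = df["names"].str.contains(must_contain, regex=False, na=False)

# Keep only the rows where the mask is True.
filtered = df[mask]
matches = filtered["names"].tolist()
```

With `na=False` the mask contains only booleans, so it can be used directly for indexing.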

Answer 2

Score: 1

Without pandas:

    list(filter(lambda y: must_contain in y, x))
    # ['2023_feb05', '2024_feb05']
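
A list comprehension is a common alternative to `filter` with a `lambda`, and many readers find it easier to scan. The sketch below reuses the question's `x` and `must_contain`; the second comprehension is an added illustration that builds per-element indicators similar to the question's `indices_bool` (as Python booleans rather than a NumPy array).

```python
x = ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]
must_contain = "feb05"

# Keep every string that contains the substring.
desired_output = [s for s in x if must_contain in s]

# One boolean per element, mirroring indices_bool from the question's loop.
indices_bool = [must_contain in s for s in x]
```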

With pandas:

    series = pd.Series(["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"])
    must_contain = "feb05"
    series[series.str.contains(must_contain)].to_list()
    # ['2023_feb05', '2024_feb05']
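
The question's loop also fills a 0/1 array, `indices_bool`. As a rough sketch (not part of the original answer), the same information can be derived from the boolean mask; the name `positions` is introduced here purely for illustration.

```python
import numpy as np
import pandas as pd

series = pd.Series(["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"])
must_contain = "feb05"

# Boolean mask: True where the element contains the substring (literal match).
mask = series.str.contains(must_contain, regex=False)

# 0/1 array comparable to the question's indices_bool.
indices_bool = mask.to_numpy().astype(int)

# Integer positions of the matching elements.
positions = np.flatnonzero(mask.to_numpy())
```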

huangapple
  • Published on February 6, 2023 at 03:11:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75354820.html