Extracting strings that contain a substring from an array of strings (Python)


Question


A question in Python (3.9.5) and Pandas:

Suppose I have an array of strings `x` and I want to extract all the elements that contain a certain substring, e.g. `feb05`. Is there a Pythonic way to do this in one line, possibly using Pandas functions?

An example of what I mean:

x = ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]
must_contain = "feb05"
desired_output = ["2023_feb05", "2024_feb05"]

I can run a loop,

import numpy as np
import pandas as pd

desired_output = []
indices_bool = np.zeros(len(x))  # one 0/1 flag per element of x
for idx, test in enumerate(x):
    if must_contain in test:
        desired_output.append(test)  # keep the matching string
        indices_bool[idx] = 1        # and mark its position

but I am looking for a more Pythonic way to do it.

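In my application, `x` is a column in a Pandas DataFrame, so answers using Pandas functions are also welcome. The goal is to filter all the rows whose `x` field contains `must_contain` (e.g. `x = df["names"]`).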

Answer 1

Score: 1

Since you are using pandas, you can use str.contains to get the boolean condition:

import pandas as pd
df = pd.DataFrame({'x': ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]})
must_contain = "feb05"

df.x.str.contains(must_contain)
#0    False
#1    False
#2    False
#3     True
#4     True
#Name: x, dtype: bool

Filter by the condition:

df[df.x.str.contains(must_contain)]
#            x
#3  2023_feb05
#4  2024_feb05
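One caveat on str.contains: by default it treats the pattern as a regular expression (`regex=True`), and depending on the column dtype, missing entries can come back as missing values instead of `False`, which may raise an error when the result is used as a boolean index. `feb05` contains no regex metacharacters, so the code above works as intended. For an arbitrary `must_contain`, a sketch like the following passes `regex=False` for a literal substring test and `na=False` to keep the mask strictly boolean. The series `s` and the search string `pattern` are made up for illustration (one `None` entry, and a `.` in the pattern).

```python
import pandas as pd

# Made-up data: one missing entry, and names that differ only at the "." position
s = pd.Series(["2023_feb05", "2023.feb05", None, "2023xfeb05"])
pattern = "3.feb"  # "." is a regex metacharacter

# Default regex=True: "." matches any character, so all three strings match
regex_mask = s.str.contains(pattern, na=False)

# regex=False: plain substring test; na=False maps the missing entry to False
literal_mask = s.str.contains(pattern, regex=False, na=False)

print(s[literal_mask].tolist())  # ['2023.feb05']
```

With the default regex interpretation, `"2023_feb05"` and `"2023xfeb05"` would also be selected, which is rarely what a plain "contains this text" filter intends.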

Answer 2

Score: 1

Without pandas:

list(filter(lambda y: must_contain in y, x))
# ['2023_feb05', '2024_feb05']
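Without pandas, a list comprehension is the usual one-line alternative to `filter` with a `lambda`, and the same idea can rebuild the question's `indices_bool` array. The sketch below assumes the question's `x` and `must_contain`; it stores the flags as booleans rather than the 0/1 floats the loop's `np.zeros` produces, which is a choice made here for illustration. It also shows a vectorized NumPy variant based on `np.char.find`, which returns `-1` for elements that do not contain the substring.

```python
import numpy as np

x = ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]
must_contain = "feb05"

# The matching strings, in their original order
desired_output = [s for s in x if must_contain in s]

# Per-element flags like the question's indices_bool, stored as booleans
indices_bool = np.array([must_contain in s for s in x])

# Vectorized NumPy alternative: find() gives -1 where the substring is absent
indices_bool_np = np.char.find(np.array(x), must_contain) >= 0

print(desired_output)          # ['2023_feb05', '2024_feb05']
print(indices_bool.tolist())   # [False, False, False, True, True]
```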


With pandas:

import pandas as pd

series = pd.Series(["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"])
must_contain = "feb05"
series[series.str.contains(must_contain)].to_list()
# ['2023_feb05', '2024_feb05']
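The `.to_list()` call above returns only the matching strings. When `x` is a column of a larger DataFrame, as with the question's `df["names"]`, the same mask keeps entire rows. A minimal sketch, assuming a hypothetical frame with a `names` column plus a made-up `value` column to show that the other fields come along; `.loc` with a column label then extracts just the matching names as a plain list.

```python
import pandas as pd

# Hypothetical frame: "names" as in the question, "value" is illustrative only
df = pd.DataFrame({
    "names": ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"],
    "value": [10, 20, 30, 40, 50],
})
must_contain = "feb05"

# Literal substring test on the column
mask = df["names"].str.contains(must_contain, regex=False)

rows = df[mask]                                   # every column of the matching rows
matching_names = df.loc[mask, "names"].tolist()   # just the matching strings

print(rows)
print(matching_names)  # ['2023_feb05', '2024_feb05']
```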





huangapple
  • Published on February 6, 2023 at 03:11:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75354820.html