Extracting the strings that contain a given substring from an array of strings (Python)
Question
A question in Python (3.9.5) and Pandas:
Suppose I have an array of strings `x` and I want to extract all the elements that contain a certain substring, e.g. `feb05`. Is there a Pythonic way to do it in one line, possibly using Pandas functions?
Example of what I mean:
x = ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]
must_contain = "feb05"
desired_output = ["2023_feb05", "2024_feb05"]
I can run a loop,
import numpy as np
import pandas as pd
desired_output = []
indices_bool = np.zeros(len(x))
for idx, test in enumerate(x):
    if must_contain in test:
        desired_output.append(test)
        indices_bool[idx] = 1
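but I am looking for a more Pythonic way to do it.
In my application, `x` is a column in a Pandas dataframe, so answers that use Pandas functions are also welcome. The goal is to filter all the rows whose field `x` contains `must_contain` (e.g. `x = df["names"]`).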
Answer 1
Score: 1
Since you are using pandas, you can use `str.contains` to get the boolean condition:
import pandas as pd
df = pd.DataFrame({'x': ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]})
must_contain = "feb05"
df.x.str.contains(must_contain)
#0 False
#1 False
#2 False
#3 True
#4 True
#Name: x, dtype: bool
Filter by the condition:
df[df.x.str.contains(must_contain)]
# x
#3 2023_feb05
#4 2024_feb05
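One caveat worth sketching: `str.contains` interprets its pattern as a regular expression by default, and missing values propagate into the mask. The example below is a minimal sketch under assumed data (the `names` column, the `None` entry, and the `.feb05` pattern are illustrative and not from the question). It passes `regex=False` for a literal substring match and `na=False` so that missing entries count as non-matches.

```python
import pandas as pd

# Hypothetical data: one missing value, and a pattern with a regex metacharacter (".").
df = pd.DataFrame({"names": ["2023_jan05", "2023.feb05", "2023_feb05", None]})
must_contain = ".feb05"

# regex=False matches the substring literally; na=False treats missing values as False.
mask = df["names"].str.contains(must_contain, regex=False, na=False)
filtered = df[mask]

print(filtered["names"].tolist())
```

Without `regex=False`, the `.` would match any character, so `2023_feb05` would also be selected.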
Answer 2
Score: 1
Without pandas:
list(filter(lambda y: must_contain in y, x))
# ['2023_feb05', '2024_feb05']
With pandas:
series = pd.Series(["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"])
must_contain = "feb05"
series[series.str.contains(must_contain)].to_list()
# ['2023_feb05', '2024_feb05']
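As an alternative to `filter` with a lambda, a list comprehension is often considered the more idiomatic plain-Python form. The sketch below reuses the question's `x` and `must_contain`. It also builds the per-element flags that the question's loop stored in `indices_bool`, here as a plain list of booleans rather than a NumPy array (a substitution made for brevity).

```python
x = ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]
must_contain = "feb05"

# Keep only the strings that contain the substring.
desired_output = [s for s in x if must_contain in s]

# One boolean per element, playing the role of the loop's indices_bool.
indices_bool = [must_contain in s for s in x]

print(desired_output)  # ['2023_feb05', '2024_feb05']
print(indices_bool)    # [False, False, False, True, True]
```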