February 6, 2023, 03:11:44

Extracting from an array of strings, strings that contain a substring in them (Python)

Question

A question in Python (3.9.5) and Pandas:

Suppose I have an array of strings `x` and I want to extract all the elements that contain a certain substring, e.g. `feb05`. Is there a Pythonic way to do it in one line, possibly using a Pandas function?

Example for what I mean:

x = ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]
must_contain = "feb05"
desired_output = ["2023_feb05", "2024_feb05"]

I can run a loop,

import numpy as np
import pandas as pd

desired_output = []
indices_bool = np.zeros(len(x))  # 1 where the element contains the substring
for idx, test in enumerate(x):
    if must_contain in test:
        desired_output.append(test)
        indices_bool[idx] = 1

but I am looking for a more Pythonic way to do it.

In my application `x` is a column in a Pandas dataframe, so answers that use Pandas functions are also welcome. The goal is to filter all the rows that have `must_contain` in the field `x` (e.g. `x = df["names"]`).

Answer 1

Score: 1

Since you are using pandas, you can use `str.contains` to get a Boolean mask:

import pandas as pd
df = pd.DataFrame({'x': ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]})
must_contain = "feb05"
df.x.str.contains(must_contain)
#0    False
#1    False
#2    False
#3     True
#4     True
#Name: x, dtype: bool

Filter by the condition:

df[df.x.str.contains(must_contain)]
#            x
#3  2023_feb05
#4  2024_feb05
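
One caveat worth keeping in mind: `Series.str.contains` treats its pattern as a regular expression by default, and depending on the pandas version and column dtype it may propagate missing values into the result, which then cannot be used as a filter mask. The following is a minimal sketch, assuming the substring should be matched literally and that the column may contain missing entries. The `names` column and the sample rows are illustrative and not taken from the question.

```python
import pandas as pd

# Illustrative data with one missing entry (an assumption, not from the question).
df = pd.DataFrame({"names": ["2023_feb05", None, "2024_feb05", "2023_feb04"]})
must_contain = "feb05"

# regex=False matches the substring literally;
# na=False counts missing values as non-matches.
mask = df["names"].str.contains(must_contain, regex=False, na=False)

filtered = df[mask]             # rows whose "names" value contains the substring
indices_bool = mask.to_numpy()  # Boolean NumPy array, one entry per row

print(filtered["names"].tolist())
```

Here `indices_bool` plays the role of the indicator array from the question's loop, holding Booleans rather than 0/1 floats.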

Answer 2

Score: 1

Without pandas:

list(filter(lambda y: must_contain in y, x))

["2023_feb05", "2024_feb05"]
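
A list comprehension is an equivalent that many would consider more idiomatic than `filter` with a `lambda`. The sketch below reuses `x` and `must_contain` from the question. The second comprehension builds a plain list of Booleans in place of the question's NumPy indicator array, which is an assumption about what the asker needs.

```python
x = ["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"]
must_contain = "feb05"

# Keep only the strings that contain the substring.
desired_output = [s for s in x if must_contain in s]

# One Boolean per element, marking which entries matched.
indices_bool = [must_contain in s for s in x]

print(desired_output)
print(indices_bool)
```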


With pandas:

import pandas as pd
series = pd.Series(["2023_jan05", "2023_jan_27", "2023_feb04", "2023_feb05", "2024_feb05"])
must_contain = "feb05"
series[series.str.contains(must_contain)].to_list()

["2023_feb05", "2024_feb05"]


