Question


I have two dataframes, A and B, and I want to flag the rows of B whose pattern occurs in A.

A: very large, with tens of thousands of rows.

date      | items 
20100605  | apple is red 
20110606  | orange is orange 
20120607  | apple is green

B: shorter with a few hundred rows.

id   |  color
123  |  is Red
234  |  not orange
235  |  is green

The result should flag every row in B whose pattern is found in A, possibly by adding a column to B, like:

B:
id   |  color       | found
123  |  is Red      | true
234  |  not orange  | false
235  |  is green    | true

I was thinking of something like dfB['found'] = dfB['color'].isin(dfA['items']), but I don't see any way to ignore case. Also, this approach would change existing true values to false, and I don't want to change rows that are already set to true. I also believe it's inefficient to loop over large dataframes more than once; running through A once and marking B would be better, but I'm not sure how to achieve that using isin(). Are there other ways, especially ones that ignore the case of the pattern?

Answer 1

Score: 1

You can use something like this (here df is dataframe A and df2 is dataframe B):

df2['check'] = df2['color'].apply(lambda x: any(x.casefold() in i.casefold() for i in df['items']))
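For reference, here is a minimal, self-contained sketch of this first approach on the sample data from the question. The helper name appears_in_items and the pre-folded list items_folded are illustrative and not part of the original answer; folding the items once simply avoids repeating that work for every row of df2.

```python
import pandas as pd

# Sample frames from the question (df plays the role of A, df2 of B).
df = pd.DataFrame({
    "date": [20100605, 20110606, 20120607],
    "items": ["apple is red", "orange is orange", "apple is green"],
})
df2 = pd.DataFrame({
    "id": [123, 234, 235],
    "color": ["is Red", "not orange", "is green"],
})

# Casefold the large frame once instead of once per row of df2.
items_folded = [s.casefold() for s in df["items"]]

def appears_in_items(pattern: str) -> bool:
    """Return True if pattern is a case-insensitive substring of any item."""
    needle = pattern.casefold()
    return any(needle in item for item in items_folded)

df2["check"] = df2["color"].apply(appears_in_items)
print(df2)
```

On this sample, "is Red" and "is green" are flagged, while "not orange" is not, because no item contains that exact phrase.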

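Or you can use str.contains, which checks the other direction: whether each color contains the second and third words of any item.

# Build one regex alternation from the second and third words of each item
words = df['items'].str.split(" ")
df2['check'] = df2['color'].str.contains('|'.join(words.str[1] + ' ' + words.str[2]), case=False)

Note that the joined string is treated as a regular expression, so items containing regex metacharacters would need to be escaped with re.escape first.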
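The question also asks that rows already set to true stay true. One possible way to do that, sketched here under the assumption that dfB already carries a boolean found column from an earlier run (the values below are made up for illustration), is to OR the new matches into the existing column instead of overwriting it. The helper found_in is a hypothetical name; it uses re.escape so each pattern is matched literally.

```python
import re
import pandas as pd

dfA = pd.DataFrame({
    "items": ["apple is red", "orange is orange", "apple is green"],
})
dfB = pd.DataFrame({
    "id": [123, 234, 235],
    "color": ["is Red", "not orange", "is green"],
    "found": [False, True, False],  # hypothetical flags from an earlier run
})

def found_in(items: pd.Series, pattern: str) -> bool:
    """Case-insensitive, literal search for pattern in any element of items."""
    return bool(items.str.contains(re.escape(pattern), case=False, regex=True).any())

# One vectorised scan of A per pattern in B (B is the small frame).
new_hits = dfB["color"].apply(lambda p: found_in(dfA["items"], p))

# OR with the existing column so rows that were already True are never reset.
dfB["found"] = dfB["found"] | new_hits
print(dfB)
```

This still scans A once per pattern in B rather than once overall, which seems acceptable when B has only a few hundred rows.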
