2023-05-10 20:43:53

Extract part of string in Pandas column to a new column

Question

I have a simple Pandas dataframe in Python consisting of a few columns like in the example below:

import pandas as pd

df_test = pd.DataFrame(data=None, columns=['file', 'comment'])
df_test.file = ['file_1', 'file_1_v2', 'file_2', 'file_2_v2', 'file_3', 'file_3_v2']
df_test.comment = ['none: 5', 'Replacing: file_1', 'none', 'Replacing: file_2', 'none', 'Replacing: file_3']

What I want to do is to create a new column that combines strings from the other ones in the following manner:

The new column should contain the second part of the string in the 'comment' column if that string starts with 'Replacing: '.

If the 'comment' column does not start with this string, it should instead fill it with the value of 'file' in that position.

The end result for this example should be a column with the strings

['file_1', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3']

This would be easy if only the entries to be used contained a colon, but as the example shows, other entries in 'comment' contain one too. As a result, something like

df_test['comment'].str.extract(r'\s(.*)$', expand=False).fillna(df_test['file'])

will not work: the pattern captures whatever follows the first whitespace, so an entry such as 'none: 5' is matched as well, which should not be the case. Any help is appreciated, thanks!
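
To make the failure concrete, here is a small sketch that runs the attempted pattern on the example data. The frame is rebuilt from a dict for brevity, and the variable name `attempt` is introduced only for illustration.

```python
import pandas as pd

# Example data from the question
df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_2_v2', 'file_3', 'file_3_v2'],
    'comment': ['none: 5', 'Replacing: file_1', 'none',
                'Replacing: file_2', 'none', 'Replacing: file_3'],
})

# \s(.*)$ captures everything after the first whitespace character,
# so 'none: 5' also matches and yields '5' instead of falling back to 'file'.
attempt = df_test['comment'].str.extract(r'\s(.*)$', expand=False).fillna(df_test['file'])

print(attempt.tolist())
# ['5', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3']
```

Only the first row is wrong here, but any comment containing whitespace would be affected in the same way.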

Answer 1

Score: 2

Use Replacing: (.*) as the regex to force a match on the literal "Replacing: " prefix; rows that do not match become NaN:

df_test['comment'].str.extract(r'Replacing: (.*)', expand=False).fillna(df_test['file'])

Output:

0    file_1
1    file_1
2    file_2
3    file_2
4    file_3
5    file_3
Name: comment, dtype: object
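
For completeness, a self-contained sketch that builds the example frame, applies the same idea, and stores the result in a new column. Two things here are assumptions rather than part of the answer: the column name `target` is hypothetical, and the leading `^` is added so that the pattern only matches when the comment actually starts with "Replacing: " (`str.extract` otherwise searches anywhere in the string).

```python
import pandas as pd

# Example data from the question
df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_2_v2', 'file_3', 'file_3_v2'],
    'comment': ['none: 5', 'Replacing: file_1', 'none',
                'Replacing: file_2', 'none', 'Replacing: file_3'],
})

# '^' anchors the match to the start of the string (an assumption added here);
# rows that do not match yield NaN.
extracted = df_test['comment'].str.extract(r'^Replacing: (.*)', expand=False)

# Fall back to the 'file' value wherever nothing was extracted.
# 'target' is a hypothetical column name.
df_test['target'] = extracted.fillna(df_test['file'])

print(df_test['target'].tolist())
# ['file_1', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3']
```

`fillna` aligns on the index, so the fallback value always comes from the same row as the comment it replaces.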
