Extract part of string in Pandas column to a new column

Question

I have a simple Pandas dataframe in Python consisting of a few columns like in the example below:

import pandas as pd

df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_2_v2', 'file_3', 'file_3_v2'],
    'comment': ['none: 5', 'Replacing: file_1', 'none', 'Replacing: file_2', 'none', 'Replacing: file_3'],
})

What I want to do is to create a new column that combines strings from the other ones in the following manner:

The new column should contain the second part of the string in the 'comment' column if that string starts with 'Replacing: '.

If the 'comment' column does not start with this string, it should instead fill it with the value of 'file' in that position.

The end result for this example should be a column with the strings:

['file_1', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3']

This would be easy if the only entries in 'comment' containing a colon were the ones that should be used. As the example shows, though, other entries contain one too, so something like

df_test['comment'].str.extract(r'\s(.*)$', expand=False).fillna(df_test['file'])

does not work: the pattern captures everything after the first whitespace in any comment, so 'none: 5' yields '5' instead of falling back to the 'file' value. Any help is appreciated, thanks!
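For reference, here is a small sketch that runs the attempt above on the example data, so the problem row is visible. The variable name attempt is only illustrative; everything else comes from the question.

```python
import pandas as pd

# Example data from the question.
df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_2_v2', 'file_3', 'file_3_v2'],
    'comment': ['none: 5', 'Replacing: file_1', 'none',
                'Replacing: file_2', 'none', 'Replacing: file_3'],
})

# The pattern matches after the first whitespace character in ANY comment,
# not only in comments that begin with 'Replacing: '.
attempt = (
    df_test['comment']
    .str.extract(r'\s(.*)$', expand=False)
    .fillna(df_test['file'])
)

print(attempt.tolist())
# ['5', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3']
```

The first row comes back as '5' rather than 'file_1', because 'none: 5' also contains whitespace.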

Answer 1

Score: 2

Use Replacing: (.*) as the regex so that the literal "Replacing: " has to match; rows that do not match become NaN, which fillna then replaces with the value from 'file':

df_test['comment'].str.extract(r'Replacing: (.*)', expand=False).fillna(df_test['file'])

Output:

0    file_1
1    file_1
2    file_2
3    file_2
4    file_3
5    file_3
Name: comment, dtype: object
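The question asks for comments that start with 'Replacing: ', while the pattern above would also match that text in the middle of a comment. Two possible variations are sketched below, assuming the example data from the question: one anchors the regex with ^, the other avoids regex by combining str.startswith with Series.where. The column name 'target' and the variable names are hypothetical and not part of the original answer.

```python
import pandas as pd

# Example data from the question.
df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_2_v2', 'file_3', 'file_3_v2'],
    'comment': ['none: 5', 'Replacing: file_1', 'none',
                'Replacing: file_2', 'none', 'Replacing: file_3'],
})

prefix = 'Replacing: '  # literal prefix from the question

# Variant 1: anchor with ^ so the prefix must sit at the start of the string.
anchored = (
    df_test['comment']
    .str.extract(r'^Replacing: (.*)', expand=False)
    .fillna(df_test['file'])
)

# Variant 2: no regex. Build a boolean mask, strip the prefix by slicing,
# and fall back to 'file' wherever the mask is False.
is_replacement = df_test['comment'].str.startswith(prefix)
df_test['target'] = (
    df_test['comment'].str[len(prefix):].where(is_replacement, df_test['file'])
)

print(anchored.tolist())
print(df_test['target'].tolist())
# Both print: ['file_1', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3']
```

On this data both variants give the same result as the answer above; they differ only when 'Replacing: ' appears somewhere other than the start of a comment.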

huangapple
  • Posted on May 10, 2023 at 20:43:53
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76218589.html