Extract part of string in Pandas column to a new column
Question
I have a simple Pandas dataframe in Python consisting of a few columns like in the example below:
import pandas as pd
df_test = pd.DataFrame({
    'file': ['file_1', 'file_1_v2', 'file_2', 'file_2_v2', 'file_3', 'file_3_v2'],
    'comment': ['none: 5', 'Replacing: file_1', 'none', 'Replacing: file_2', 'none', 'Replacing: file_3'],
})
I want to create a new column that combines strings from the other two columns as follows:
The new column should contain the second part of the string in the 'comment' column if that string starts with 'Replacing: '.
If the string in 'comment' does not start with 'Replacing: ', the new column should instead take the value of 'file' in that row.
The end result for this example should be a column with the strings
['file_1', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3']
This would be easy if the only entries in 'comment' containing a colon were the ones that should be used, but, as in the example, some other entries contain one too, so something like
df_test['comment'].str.extract(r'\s(.*)$', expand=False).fillna(df_test['file'])
will not work, because it captures whatever follows the first whitespace in every entry (for example, '5' from 'none: 5'), not only in entries that start with 'Replacing: '. Any help is appreciated, thanks!
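A minimal sketch of why that attempt falls short, assuming a shortened version of the example data (the names `comment` and `attempt` are only for illustration):

```python
import pandas as pd

# Shortened, illustrative version of the 'comment' column from the question.
comment = pd.Series(["none: 5", "Replacing: file_1", "none"])

# \s(.*)$ captures everything after the first whitespace character,
# regardless of what text comes before it.
attempt = comment.str.extract(r"\s(.*)$", expand=False)

print(attempt.tolist())  # 'none: 5' yields '5', which is not wanted
```

The first row yields '5' because the pattern does not check for the 'Replacing: ' prefix at all.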
Answer 1
Score: 2
Use Replacing: (.*)
as the regex so that only strings containing "Replacing: " match. Rows that do not match become NaN, which fillna then replaces with the value from 'file':
df_test['comment'].str.extract(r'Replacing: (.*)', expand=False).fillna(df_test['file'])
Output:
0    file_1
1    file_1
2    file_2
3    file_2
4    file_3
5    file_3
Name: comment, dtype: object
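The question asks for comments that start with 'Replacing: ', while the pattern above matches the prefix anywhere in the string. A minimal sketch, assuming the stricter behaviour is wanted, anchors the pattern with ^ and stores the result in a new column (the column name source_file is hypothetical, not from the original post):

```python
import pandas as pd

df_test = pd.DataFrame({
    "file": ["file_1", "file_1_v2", "file_2", "file_2_v2", "file_3", "file_3_v2"],
    "comment": ["none: 5", "Replacing: file_1", "none",
                "Replacing: file_2", "none", "Replacing: file_3"],
})

# Anchor with ^ so only comments that *start* with the prefix match.
extracted = df_test["comment"].str.extract(r"^Replacing: (.*)", expand=False)

# Rows without a match are NaN; fill them with the value from 'file'.
# 'source_file' is a hypothetical column name chosen for this sketch.
df_test["source_file"] = extracted.fillna(df_test["file"])

print(df_test["source_file"].tolist())
# ['file_1', 'file_1', 'file_2', 'file_2', 'file_3', 'file_3']
```

With the example data both patterns give the same result; the anchor only matters when a comment mentions 'Replacing: ' somewhere after its start.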