Keep original string values after pandas.Series.str.extract() if the regex doesn't match
Question
I am trying to extract email addresses from strings, and I want to make sure that a value that is already formatted the way I expect is kept as is instead of being changed to NaN.
Example input
<class 'pandas.core.series.Series'>
1    <doe.b.john@gmail.com>
2    <doe.c.jane@gmail.com>
3    person.anonymous@hotmail.com
4    dent.arthur@space.com
I am using
# curr_emails is <class 'pandas.core.series.Series'>
curr_emails = curr_emails.str.extract(r"<([^<>]+)>").squeeze()  # regex extracts the text between < and >
I receive back
1    doe.b.john@gmail.com
2    doe.c.jane@gmail.com
3    NaN
4    NaN
But I would instead like
1    doe.b.john@gmail.com
2    doe.c.jane@gmail.com
3    person.anonymous@hotmail.com
4    dent.arthur@space.com
A similar question is posted here, but I could not make it work with my current approach.
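The behaviour described above can be reproduced with a small sketch. It assumes the sample values from the example input and uses expand=False, which makes str.extract return a Series directly for a single capture group (for this data that is equivalent to calling .squeeze() on the one-column DataFrame).

```python
import pandas as pd

# Sample data mirroring the example input (index 1-4 as in the question).
curr_emails = pd.Series(
    [
        "<doe.b.john@gmail.com>",
        "<doe.c.jane@gmail.com>",
        "person.anonymous@hotmail.com",
        "dent.arthur@space.com",
    ],
    index=[1, 2, 3, 4],
)

# With a single capture group and expand=False, the result is a Series.
extracted = curr_emails.str.extract(r"<([^<>]+)>", expand=False)

# Rows that contain no <...> pair do not match, so they come back as NaN.
print(extracted)
print(extracted.isna().tolist())  # [False, False, True, True]
```

The NaN values are not an error: str.extract reports "no match" as a missing value, so the original text has to be restored in a separate step.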
Answer 1
Score: 1
If a value has no <...> pattern, you can fill the resulting NaN with the original value. Or, if the < and > only appear at the beginning and end of the email, you can simply strip them.
curr_emails = (curr_emails.str.extract(r"<([^<>]+)>").squeeze()
               .fillna(curr_emails))
# or
curr_emails = curr_emails.str.strip(r'<>')
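The two options in this answer are not interchangeable for every input. The sketch below is an illustration under assumptions, not part of the original answer: the third row ("John Doe <...>") is a hypothetical value added to show the difference, and expand=False is used in place of .squeeze() so that a Series is returned even when the input has a single row.

```python
import pandas as pd

curr_emails = pd.Series(
    [
        "<doe.b.john@gmail.com>",
        "person.anonymous@hotmail.com",
        "John Doe <doe.b.john@gmail.com>",  # hypothetical row, not in the question
    ]
)

# Option 1: extract the text between < and >, then fall back to the
# original value wherever the regex did not match (NaN).
extracted = curr_emails.str.extract(r"<([^<>]+)>", expand=False)
filled = extracted.fillna(curr_emails)

# Option 2: strip the characters '<' and '>' from both ends of each string.
stripped = curr_emails.str.strip("<>")

print(filled.tolist())
print(stripped.tolist())
```

For the first two rows both options give the bare address. For the hypothetical third row, the extract-and-fillna version returns only the address, while strip removes just the trailing ">" and leaves the display name and the inner "<" in place.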
Answer 2
Score: 0
Try using str.replace instead of str.extract to replace a < at the start of the string or a > at the end of the string with '' (the empty string).
curr_emails.str.replace('^<|>$', '', regex=True)
0            doe.b.john@gmail.com
1            doe.c.jane@gmail.com
2    person.anonymous@hotmail.com
3           dent.arthur@space.com
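As a self-contained sketch of this approach, the snippet below rebuilds the sample Series from the question (the default 0-based index is an assumption that matches the output shown in this answer) and assigns the result back, since str.replace returns a new Series rather than modifying the original. Passing regex=True explicitly matters because recent pandas versions treat the pattern as a literal string by default.

```python
import pandas as pd

curr_emails = pd.Series(
    [
        "<doe.b.john@gmail.com>",
        "<doe.c.jane@gmail.com>",
        "person.anonymous@hotmail.com",
        "dent.arthur@space.com",
    ]
)

# ^< matches a '<' only at the very start of the string,
# >$ matches a '>' only at the very end; both are replaced with ''.
curr_emails = curr_emails.str.replace(r"^<|>$", "", regex=True)

print(curr_emails.tolist())
```

Values without surrounding angle brackets contain nothing for the pattern to match, so they pass through unchanged and no NaN is introduced.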