Removing Prefix from column of names in python
Question
I have this dataset:
```
ID   Name
101  DR. ADAM SMITH
102  BEN DAVIS
103  MRS. ASHELY JOHNSON
104  DR. CATHY JONES
105  JOHN DOE SMITH
```
Desired output:
```
ID   Name
101  ADAM SMITH
102  BEN DAVIS
103  ASHELY JOHNSON
104  CATHY JONES
105  JOHN DOE SMITH
```
I need to get rid of the prefixes. I tried `df['Name'] = df['Name'].replace(to_replace = 'DR. ', value = '')` and repeated the same code for every prefix, but when I run it nothing happens. Any reason for this?

Thank you in advance.
Answer 1
Score: 2
Use a regular expression to match the first word if it ends with `.`:
```python
df['Name'] = df['Name'].str.replace(r'^[A-Z]+\.\s+', '', regex=True)
```
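Here is a minimal sketch that checks this pattern against the sample data from the question, assuming pandas is installed. The pattern removes any leading all-caps word that ends in a period, so it is not limited to a fixed list of titles.

```python
import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105],
    'Name': ['DR. ADAM SMITH', 'BEN DAVIS', 'MRS. ASHELY JOHNSON',
             'DR. CATHY JONES', 'JOHN DOE SMITH'],
})

# ^        : match only at the start of the string
# [A-Z]+\. : one uppercase word ending in a period (DR., MRS., ...)
# \s+      : the whitespace that follows it
df['Name'] = df['Name'].str.replace(r'^[A-Z]+\.\s+', '', regex=True)

print(df['Name'].tolist())
# ['ADAM SMITH', 'BEN DAVIS', 'ASHELY JOHNSON', 'CATHY JONES', 'JOHN DOE SMITH']
```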
Answer 2
Score: 1
```python
import pandas as pd

# Sample data
data = {
    'ID': [101, 102, 103, 104, 105],
    'Name': ['DR. ADAM SMITH', 'BEN DAVIS', 'MRS. ASHELY JOHNSON', 'DR. CATHY JONES', 'JOHN DOE SMITH']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Function to remove prefixes from names
def remove_prefix(name):
    prefixes = ['DR.', 'MRS.', 'MR.', 'MS.']  # Add more prefixes if needed
    for prefix in prefixes:
        if name.startswith(prefix):
            # Drop the prefix and any whitespace that follows it
            return name[len(prefix):].lstrip()
    return name

# Apply the function to the 'Name' column
df['Name'] = df['Name'].apply(remove_prefix)

# Print the modified DataFrame
print(df)
```
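If the `prefixes` list used by `remove_prefix` grows, you can express the same idea as a single vectorized regex built from that list. This is a sketch, not part of the original answer:

- `re.escape` makes each period in the prefixes a literal period.
- The `^` anchor means only a prefix at the start of the name is removed.

```python
import re
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105],
    'Name': ['DR. ADAM SMITH', 'BEN DAVIS', 'MRS. ASHELY JOHNSON',
             'DR. CATHY JONES', 'JOHN DOE SMITH'],
})

prefixes = ['DR.', 'MRS.', 'MR.', 'MS.']  # same list as in remove_prefix

# Builds ^(?:DR\.|MRS\.|MR\.|MS\.)\s* ; re.escape turns each '.' into '\.'
pattern = r'^(?:' + '|'.join(map(re.escape, prefixes)) + r')\s*'

# One vectorized call instead of a Python-level loop per row
df['Name'] = df['Name'].str.replace(pattern, '', regex=True)
print(df['Name'].tolist())
```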
Answer 3
Score: 0
You can use a regular expression to replace part of the string. For example:
```python
df['Name'] = df['Name'].str.replace(r'^(?:DR|MRS?)\.\s*', '', regex=True)
print(df)
```
Prints:
```
    ID            Name
0  101      ADAM SMITH
1  102       BEN DAVIS
2  103  ASHELY JOHNSON
3  104     CATHY JONES
4  105  JOHN DOE SMITH
```
Note: `.replace('DR. ', '')` without `regex=True` only replaces cells whose *entire* value is `DR. `. It does not replace that text when it is only part of a string.
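The sketch below illustrates the difference the note describes, assuming a recent pandas version. `Series.replace` with a plain string changes only cells that are exactly equal to it. With `regex=True`, the pattern is substituted inside each string instead.

```python
import pandas as pd

s = pd.Series(['DR. ADAM SMITH', 'DR. ', 'BEN DAVIS'])

# Plain Series.replace: only the cell whose whole value is 'DR. ' changes
whole = s.replace('DR. ', '')
print(whole.tolist())    # ['DR. ADAM SMITH', '', 'BEN DAVIS']

# regex=True: the pattern is substituted inside every string value
partial = s.replace(r'^DR\.\s*', '', regex=True)
print(partial.tolist())  # ['ADAM SMITH', '', 'BEN DAVIS']
```

This is why the code in the question appears to do nothing: no cell in `Name` is exactly `'DR. '`.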
Answer 4
Score: 0
**Use a Regular Expression:**
```python
import re

name = "DR. ADAM SMITH"
print(re.sub(r".*\.\s", "", name))  # ADAM SMITH
```
This expression matches everything up to the last period followed by whitespace, so it should match most prefixes ("DR.", "MRS.", "MR.", etc.). You can integrate it into your code like this:
1. Add the line `import re` at the top of your code.
2. Use the line `df['Name'] = df['Name'].apply(lambda s: re.sub(r".*\.\s", "", s))` instead of `df['Name'] = df['Name'].replace(to_replace = 'DR. ', value = '')`. `re.sub` works on a single string, so it has to be applied to each value in the column.

For more on regular expressions see: https://www.w3schools.com/python/python_regex.asp
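One caveat with `.*\.\s` is that `.*` is greedy. The pattern therefore removes everything up to the *last* period followed by whitespace, not just a leading title. The sketch below uses a hypothetical name with a middle initial to show this. It compares the greedy pattern with one anchored to a single leading word.

```python
import re

name = "DR. JOHN Q. PUBLIC"  # hypothetical name with a middle initial

# Greedy: .* runs on to the last ". " in the string
greedy = re.sub(r".*\.\s", "", name)
print(greedy)    # PUBLIC

# Anchored: only one leading word ending in "." is removed
anchored = re.sub(r"^\w+\.\s+", "", name)
print(anchored)  # JOHN Q. PUBLIC
```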
Answer 5
Score: 0
You were nearly there. You needed to add `.str`:
```python
# .str.replace works on the text inside each value; regex=False treats 'DR. ' literally
df['Name'] = df['Name'].str.replace('DR. ', '', regex=False)
```
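`.str.replace('DR. ', '')` replaces the text anywhere in the string, and you still need one call per prefix. The sketch below writes that loop with `Series.str.removeprefix`, which only strips text from the start of each value. Two assumptions apply:

- `removeprefix` requires pandas 1.4 or later.
- The prefix list is based on the sample data.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [101, 102, 103, 104, 105],
    'Name': ['DR. ADAM SMITH', 'BEN DAVIS', 'MRS. ASHELY JOHNSON',
             'DR. CATHY JONES', 'JOHN DOE SMITH'],
})

# Assumed list of titles, each including its trailing space
for prefix in ['DR. ', 'MRS. ', 'MR. ', 'MS. ']:
    # removeprefix is a literal match (no regex) applied only at the start
    df['Name'] = df['Name'].str.removeprefix(prefix)

print(df['Name'].tolist())
```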
Answer 6
Score: 0
The reason nothing happens when you use the `replace()` function is that it treats the input as a literal string and looks for an exact match of the whole cell value.
In your case, no value in the 'Name' column is exactly `'DR. '`; the prefix is only part of each name, so no exact match is found.
To overcome this, you can use regular expressions (regex) from the `re` module to remove the prefixes from the 'Name' column:
```python
import re
import pandas as pd

data = {
    'ID': [101, 102, 103, 104, 105],
    'Name': ['DR. ADAM SMITH', 'BEN DAVIS', 'MRS. ASHELY JOHNSON', 'DR. CATHY JONES', 'JOHN DOE SMITH']
}
df = pd.DataFrame(data)

# Remove a "DR." or "MRS." title and the whitespace after it from each name
df['Name'] = df['Name'].apply(lambda x: re.sub(r'\b(?:DR\.|MRS\.)\s*', '', x))
print(df)
```