June 2, 2023 04:11:16

Removing a prefix from a column of names in Python

Question

I have this dataset

ID      Name     
101    DR. ADAM SMITH
102    BEN DAVIS
103    MRS. ASHELY JOHNSON
104    DR. CATHY JONES 
105    JOHN DOE SMITH

Desired Output

ID        Name 
101     ADAM SMITH
102     BEN DAVIS
103     ASHELY JOHNSON
104     CATHY JONES
105     JOHN DOE SMITH

I need to get rid of the prefixes. I tried df['Name'] = df['Name'].replace(to_replace='DR. ', value='') and repeated the same code for each prefix, but nothing happens when I run it. Any reason for this?

Thank you in advance.

Answer 1

Score: 2

Use a regular expression to match the first word if it ends with a period.

df['Name'] = df['Name'].str.replace(r'^[A-Z]+\.\s+', '', regex=True)
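A small check of what this pattern does and does not touch, as a sketch on made-up names. The mixed-case "Dr." row and the `flags=re.IGNORECASE` variant are assumptions added for illustration and are not part of the original answer:

```python
import re

import pandas as pd

# Illustrative names; the last one uses a mixed-case title
names = pd.Series(["DR. ADAM SMITH", "BEN DAVIS", "Dr. Cathy Jones"])

pattern = r"^[A-Z]+\.\s+"  # leading run of capitals, a period, then whitespace

# Case-sensitive: only the all-caps title is removed
strict = names.str.replace(pattern, "", regex=True)

# Case-insensitive: the mixed-case title is removed as well
relaxed = names.str.replace(pattern, "", regex=True, flags=re.IGNORECASE)

print(strict.tolist())
print(relaxed.tolist())
```

Because the pattern accepts any run of capitals followed by a period, it would also strip a leading initial such as "J. "; listing the titles explicitly avoids that.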

Answer 2

Score: 1

import pandas as pd

# Sample data
data = {
    'ID': [101, 102, 103, 104, 105],
    'Name': ['DR. ADAM SMITH', 'BEN DAVIS', 'MRS. ASHELY JOHNSON', 'DR. CATHY JONES', 'JOHN DOE SMITH']
}

# Create a DataFrame
df = pd.DataFrame(data)

# Function to remove prefixes from names
def remove_prefix(name):
    prefixes = ['DR.', 'MRS.', 'MR.', 'MS.']  # Add more prefixes if needed
    for prefix in prefixes:
        if name.startswith(prefix):
            # Drop the prefix and any whitespace that follows it
            return name[len(prefix):].lstrip()
    return name

# Apply the function to the 'Name' column
df['Name'] = df['Name'].apply(remove_prefix)

# Print the modified DataFrame
print(df)
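The same prefix list can also drive a single vectorized replacement instead of a per-row function. The following is a sketch and not part of the original answer; the names `PREFIXES` and `prefix_pattern` are hypothetical, and it assumes a prefix should only be removed at the start of the name:

```python
import re

import pandas as pd

PREFIXES = ["DR.", "MRS.", "MR.", "MS."]  # hypothetical list; extend as needed

# Escape each prefix so the period is literal, then anchor at the start
prefix_pattern = r"^(?:" + "|".join(re.escape(p) for p in PREFIXES) + r")\s*"

df = pd.DataFrame({
    "ID": [101, 102, 103],
    "Name": ["DR. ADAM SMITH", "BEN DAVIS", "MRS. ASHELY JOHNSON"],
})
df["Name"] = df["Name"].str.replace(prefix_pattern, "", regex=True)
print(df)
```

Building the pattern from the list keeps the titles in one place, and `re.escape` avoids the period being read as "any character".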

Answer 3

Score: 0

You can use a regular expression to replace part of the string. For example:

df['Name'] = df['Name'].str.replace(r'^(?:DR|MRS?)\.\s*', '', regex=True)
print(df)

Prints:

    ID            Name
0  101      ADAM SMITH
1  102       BEN DAVIS
2  103  ASHELY JOHNSON
3  104     CATHY JONES
4  105  JOHN DOE SMITH

Note: df['Name'].replace('DR. ', '') tries to replace whole values that are exactly 'DR. ' with an empty string; it does not replace part of a longer string.
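The difference between `Series.replace` and `Series.str.replace` can be seen on a small Series. This is a sketch with made-up values; the standalone 'DR. ' entry is included only to show the one case where the plain `replace` call does match:

```python
import pandas as pd

s = pd.Series(["DR. ADAM SMITH", "DR. ", "BEN DAVIS"])

# Series.replace compares whole values: only the cell equal to 'DR. ' changes
whole_value = s.replace(to_replace="DR. ", value="")

# Series.str.replace works inside each string
substring = s.str.replace("DR. ", "", regex=False)

# Series.replace can also do substring replacement when regex=True
with_regex = s.replace(to_replace=r"^DR\.\s*", value="", regex=True)

print(whole_value.tolist())
print(substring.tolist())
print(with_regex.tolist())
```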

Answer 4

Score: 0

**Use a Regular Expression:**

    import re

    name = "DR. ADAM SMITH"
    print(re.sub(r".*\.\s", "", name)) # ADAM SMITH

This expression matches anything that ends with a period and a space, and should match most prefixes ("DR.", "MRS.", "MR.", etc). You can integrate it into your code like this:

1. Add the line `import re` at the top of your code.

2. Use the line `df['Name'] = df['Name'].apply(lambda s: re.sub(r".*\.\s", "", s))` instead of `df['Name'] = df['Name'].replace(to_replace = 'DR. ', value = '')`. `re.sub` operates on a single string, so it has to be applied to each value rather than passed the whole column.

For more on regular expressions see: https://www.w3schools.com/python/python_regex.asp
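One caveat with a pattern as broad as `.*\.\s`: the `.*` is greedy, so it removes everything up to the last period-plus-space in the string. The sketch below uses a hypothetical name containing "ST." to illustrate this, alongside a stricter pattern (an assumption, not from the original answer) that only accepts a few listed titles at the start:

```python
import re

name = "DR. ADAM ST. JOHN"  # hypothetical name with a second period

# Greedy: matches through the last ". " in the string
greedy = re.sub(r".*\.\s", "", name)

# Anchored whitelist: removes only a known title at the start
anchored = re.sub(r"^(?:DR|MRS?|MS)\.\s+", "", name)

print(greedy)    # JOHN
print(anchored)  # ADAM ST. JOHN
```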



# Answer 5
**Score**: 0

You were nearly there. You needed to add .str:

    df['Name'] = df['Name'].str.replace('DR. ', '')
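Note that `.str.replace('DR. ', '')` removes the text wherever it occurs in the value, not only at the start. If only a leading prefix should go, `Series.str.removeprefix` is one option. The sketch below assumes pandas 1.4 or later, and the prefix list is illustrative:

```python
import pandas as pd

df = pd.DataFrame({"Name": ["DR. ADAM SMITH", "MRS. ASHELY JOHNSON", "BEN DAVIS"]})

# Strip each literal prefix only when the value starts with it
for prefix in ["DR. ", "MRS. ", "MR. ", "MS. "]:
    df["Name"] = df["Name"].str.removeprefix(prefix)

print(df["Name"].tolist())
```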



# Answer 6
**Score**: 0

The reason nothing happens when you use `replace()` is that, without `regex=True`, `Series.replace` treats `to_replace` as a literal value and looks for cells that match it exactly.

In your case, every value in the 'Name' column contains the name after the prefix, so no cell is exactly 'DR. ' and nothing is replaced.

To overcome this, you can use regular expressions (regex) from the `re` module to remove the prefixes from the 'Name' column:

```python
import re
import pandas as pd

data = {
    'ID': [101, 102, 103, 104, 105],
    'Name': ['DR. ADAM SMITH', 'BEN DAVIS', 'MRS. ASHELY JOHNSON', 'DR. CATHY JONES', 'JOHN DOE SMITH']
}

df = pd.DataFrame(data)
# Remove 'DR.' or 'MRS.' (and any whitespace after it) where it starts a word
df['Name'] = df['Name'].apply(lambda x: re.sub(r'\b(?:DR\.|MRS\.)\s*', '', x))

print(df)
```
