Removing a prefix from a column of names in Python

Question

I have this dataset

ID      Name     
101    DR. ADAM SMITH
102    BEN DAVIS
103    MRS. ASHELY JOHNSON
104    DR. CATHY JONES 
105    JOHN DOE SMITH

Desired Output

ID        Name 
101     ADAM SMITH
102     BEN DAVIS
103     ASHELY JOHNSON
104     CATHY JONES
105     JOHN DOE SMITH 

I need to get rid of the prefixes. I tried df['Name'] = df['Name'].replace(to_replace='DR. ', value='') and repeated the same code for each prefix, but nothing happens when I run it. Any reason for this?

Thank you in advance.

Answer 1

Score: 2

Use a regular expression to match the first word if it ends with a period (.).

df['Name'] = df['Name'].str.replace(r'^[A-Z]+\.\s+', '', regex=True)

Answer 2

Score: 1

    import pandas as pd

    # Sample data
    data = {
        'ID': [101, 102, 103, 104, 105],
        'Name': ['DR. ADAM SMITH', 'BEN DAVIS', 'MRS. ASHELY JOHNSON', 'DR. CATHY JONES', 'JOHN DOE SMITH']
    }

    # Create a DataFrame
    df = pd.DataFrame(data)

    # Function to remove prefixes from names
    def remove_prefix(name):
        prefixes = ['DR.', 'MRS.', 'MR.', 'MS.']  # Add more prefixes if needed
        for prefix in prefixes:
            if name.startswith(prefix):
                # Drop the prefix and any whitespace that follows it
                return name[len(prefix):].lstrip()
        return name

    # Apply the function to the 'Name' column
    df['Name'] = df['Name'].apply(remove_prefix)

    # Print the modified DataFrame
    print(df)
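The prefix list from the function above can also be turned into a single regular expression. The following is a minimal sketch using only the standard `re` module; the variable names `pattern` and `cleaned` are chosen here for illustration and are not part of the original answer. Each prefix is escaped so that the period is matched literally, and the alternation is anchored at the start of the string.

```python
import re

# Prefix list mirroring the one in the function above
prefixes = ["DR.", "MRS.", "MR.", "MS."]

# Escape each prefix so "." is literal, then anchor the alternation at the start
pattern = r"^(?:" + "|".join(re.escape(p) for p in prefixes) + r")\s*"

names = ["DR. ADAM SMITH", "BEN DAVIS", "MRS. ASHELY JOHNSON"]
cleaned = [re.sub(pattern, "", n) for n in names]
print(cleaned)
```

The same pattern string should also work with `df['Name'].str.replace(pattern, '', regex=True)`, which avoids calling a Python function once per row.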

Answer 3

Score: 0

You can use a regular expression to replace part of a string. For example:

df['Name'] = df['Name'].str.replace(r'^(?:DR|MRS?)\.\s*', '', regex=True)
print(df)

Prints:

    ID            Name
0  101      ADAM SMITH
1  102       BEN DAVIS
2  103  ASHELY JOHNSON
3  104     CATHY JONES
4  105  JOHN DOE SMITH

Note: df['Name'].replace('DR. ', '') tries to replace cells whose entire value is 'DR. ' with an empty string; it does not replace only part of a string.
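The difference described in the note can be seen in a small sketch. The three-element Series below is made up for illustration (the last cell is exactly 'DR. ' so that the whole-value match has something to hit), and the variable names are not from the question.

```python
import pandas as pd

names = pd.Series(["DR. ADAM SMITH", "BEN DAVIS", "DR. "])

# Series.replace compares to_replace against the whole cell value,
# so only the cell that is exactly "DR. " changes
whole = names.replace(to_replace="DR. ", value="")
print(whole.tolist())

# Series.str.replace works on substrings inside each cell
sub = names.str.replace("DR. ", "", regex=False)
print(sub.tolist())

# Series.replace can also do substring replacement when regex=True
rx = names.replace(to_replace=r"^DR\.\s+", value="", regex=True)
print(rx.tolist())
```

With `regex=True`, `Series.replace` behaves much like `Series.str.replace`, which is probably the smallest change to the code in the question.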

Answer 4

Score: 0

**Use a Regular Expression:**

    import re

    name = "DR. ADAM SMITH"
    print(re.sub(r".*\.\s", "", name))  # ADAM SMITH

This expression matches everything up to and including a period followed by a whitespace character, and should match most prefixes ("DR.", "MRS.", "MR.", etc.). You can integrate it into your code like this:

1. Add the line `import re` at the top of your code.

2. Use the line `df['Name'] = df['Name'].apply(lambda s: re.sub(r".*\.\s", "", s))` instead of `df['Name'] = df['Name'].replace(to_replace='DR. ', value='')`. `re.sub` works on a single string, so it has to be applied to each value rather than to the whole column.

For more on regular expressions see: https://www.w3schools.com/python/python_regex.asp
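One thing to watch with the unanchored pattern in this answer: `.*` is greedy, so it removes everything up to the last period-plus-space in the string, not just a leading title. The sketch below compares it with an anchored pattern; the name "JOHN Q. PUBLIC" is a hypothetical value that does not appear in the question's data.

```python
import re

greedy = r".*\.\s"          # pattern from this answer
anchored = r"^[A-Z]+\.\s+"  # only a leading all-caps word ending in "."

for name in ["DR. ADAM SMITH", "JOHN Q. PUBLIC"]:
    print(name, "->", repr(re.sub(greedy, "", name)), "|", repr(re.sub(anchored, "", name)))
```

For the sample data both patterns give the same result; they differ only when a period followed by a space appears later in the name.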



# Answer 5
**Score**: 0

You were nearly there. You needed to add .str:

    df['Name'] = df['Name'].str.replace('DR. ', '')
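One caveat, sketched here with the standard `re` module rather than pandas: when a pattern is interpreted as a regular expression, an unescaped `.` matches any character. Whether `Series.str.replace` treats its pattern as a regex depends on the `regex` argument, whose default changed in pandas 2.0, so passing `regex=False` (or escaping the pattern) makes the intent explicit. The string "DRY RUN" below is a made-up value used only for illustration.

```python
import re

literal = "DR. "

# Interpreted as a regex, "." matches any character, so "DRY " also matches
as_regex = re.sub(literal, "", "DRY RUN")
print(as_regex)

# Escaping the pattern makes the period literal again
escaped = re.sub(re.escape(literal), "", "DRY RUN")
print(escaped)

# The real prefix is still removed with the escaped pattern
prefix_removed = re.sub(re.escape(literal), "", "DR. ADAM SMITH")
print(prefix_removed)
```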



# Answer 6
**Score**: 0

Nothing happens when you call `replace()` on the column because `Series.replace()` treats `to_replace` as a literal value and looks for an exact match against the whole cell.

In your case, no cell is exactly 'DR. '; the prefix is only part of each value in the 'Name' column, so no exact match is found.

To overcome this, you can use regular expressions (regex) from the `re` module to remove the prefixes from the 'Name' column:

```python
import re
import pandas as pd

data = {
    'ID': [101, 102, 103, 104, 105],
    'Name': ['DR. ADAM SMITH', 'BEN DAVIS', 'MRS. ASHELY JOHNSON', 'DR. CATHY JONES', 'JOHN DOE SMITH']
}

df = pd.DataFrame(data)
# Remove a "DR." or "MRS." prefix and any whitespace that follows it
df['Name'] = df['Name'].apply(lambda x: re.sub(r'\b(?:DR\.|MRS\.)\s*', '', x))

print(df)
```

huangapple
  • Posted on June 2, 2023 at 04:11:16
  • Source link: https://go.coder-hub.com/76385383.html