pandas.Series.str.replace not working with regex patterns loaded from Excel

Question

I am trying to replace incorrect values in a large pandas dataset. To do this, I use regex patterns and replacement values that are stored, for convenience, in an Excel file.

However, pandas.Series.str.replace(pattern, replace, regex=True) is not working for me: the values are unchanged.

I have an Excel file in which one column holds the regex pattern and the other the replacement value. The file is loaded with pandas.read_excel into a DataFrame df, e.g.:

import pandas as pd
import regex as re

df = pd.read_excel("replacments.xlsx", dtype=str)
FIND            REPLACE
\b\s(hello)\b   world
^foo.*          bar

I tried the following code, but NAMES_CLEAN stays the same as NAMES:

for row in df.itertuples():
    data["NAMES_CLEAN"] = data["NAMES"].str.replace(row.FIND, row.REPLACE, regex=True, flags=re.IGNORECASE)

What am I missing?

Answer 1

Score: 0

The issue is not with the regex but with the loop. Every iteration starts again from data["NAMES"] and overwrites NAMES_CLEAN, so in effect only the last regex is applied. What you want to do is:

data["NAMES_CLEAN"] = data["NAMES"]
for row in df.itertuples():
    data["NAMES_CLEAN"] = data["NAMES_CLEAN"].str.replace(row.FIND, row.REPLACE, regex=True, flags=re.IGNORECASE)

This way, you update the same column with each successive regex.
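To see the difference without the Excel file, here is a minimal, self-contained sketch. The patterns frame and the sample names are hypothetical stand-ins for the sheet and the data set, the LAST_ONLY column exists only for comparison, and the standard-library re module is used in place of the regex package from the question (pandas accepts its flag constants).

```python
import re

import pandas as pd

# Hypothetical stand-in for the Excel sheet (same columns as in the question).
patterns = pd.DataFrame(
    {"FIND": [r"\b\s(hello)\b", r"^foo.*"], "REPLACE": ["world", "bar"]}
)

# Hypothetical sample data.
data = pd.DataFrame({"NAMES": ["say hello", "Foo Fighters", "unchanged"]})

# Original loop: every pass starts again from NAMES, so each pass
# overwrites the previous result and only the last pattern survives.
for row in patterns.itertuples():
    data["LAST_ONLY"] = data["NAMES"].str.replace(
        row.FIND, row.REPLACE, regex=True, flags=re.IGNORECASE
    )

# Corrected loop: copy NAMES once, then keep updating the same column.
data["NAMES_CLEAN"] = data["NAMES"]
for row in patterns.itertuples():
    data["NAMES_CLEAN"] = data["NAMES_CLEAN"].str.replace(
        row.FIND, row.REPLACE, regex=True, flags=re.IGNORECASE
    )

print(data)
```

In LAST_ONLY the first row is still "say hello", because the result of the first pattern was discarded; in NAMES_CLEAN both replacements are kept.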

huangapple
  • Posted on May 17, 2023, 22:59:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76273493.html