pandas Series.str.replace not working with regex patterns loaded from Excel
Question
I am trying to replace values in a large pandas dataset with corrected values. To do this, I use regex patterns and replacement values that are stored, for convenience, in an Excel file.
However, pandas.Series.str.replace(pattern, replace, regex=True) is not working for me: the values are unchanged.
I have an Excel file in which one column holds a regex pattern and the other holds the replacement value. The file is loaded with pandas read_excel into a DataFrame df, e.g.:
import pandas as pd
import regex as re
df = pd.read_excel("replacments.xlsx", dtype=str)
FIND | REPLACE |
---|---|
\b\s(hello)\b | world |
^foo.* | bar |
I tried the following code, but NAMES_CLEAN stays the same as NAMES:
for row in df.itertuples():
    data["NAMES_CLEAN"] = data["NAMES"].str.replace(row.FIND, row.REPLACE, regex=True, flags=re.IGNORECASE)
What am I missing?
Answer 1
Score: 0
The issue is not with the regex but with the loop. Each iteration starts again from data["NAMES"] and overwrites NAMES_CLEAN, so in effect only the last regex is applied. What you want to do is:
data["NAMES_CLEAN"] = data["NAMES"]
for row in df.itertuples():
    data["NAMES_CLEAN"] = data["NAMES_CLEAN"].str.replace(row.FIND, row.REPLACE, regex=True, flags=re.IGNORECASE)
This way, each successive regex is applied to the same column, on top of the result of the previous replacements.
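To try the chained replacement without the Excel file, here is a minimal sketch. It assumes the two example rules from the question and builds them in memory. The `rules` frame, the sample `NAMES` values, the helper name `apply_rules`, and the use of the standard-library `re` module for the flag are illustrative assumptions, not part of the original post.

```python
import re

import pandas as pd

# Hypothetical stand-in for the sheet loaded with pd.read_excel(..., dtype=str)
rules = pd.DataFrame(
    {"FIND": [r"\b\s(hello)\b", r"^foo.*"], "REPLACE": ["world", "bar"]}
)

# Hypothetical sample data; the real dataset is not shown in the question
data = pd.DataFrame({"NAMES": ["say hello", "Foo fighters", "unchanged"]})


def apply_rules(names: pd.Series, rules: pd.DataFrame) -> pd.Series:
    """Apply every FIND/REPLACE rule in order, each on the previous result."""
    cleaned = names.copy()
    for row in rules.itertuples(index=False):
        cleaned = cleaned.str.replace(
            row.FIND, row.REPLACE, regex=True, flags=re.IGNORECASE
        )
    return cleaned


data["NAMES_CLEAN"] = apply_rules(data["NAMES"], rules)
print(data)
```

Because the intermediate result is carried in `cleaned`, rule order matters: a later pattern sees the text produced by the earlier ones.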