May 17, 2023 22:59:05

pandas.Series.str.replace not working with regex patterns loaded from Excel

Question

I am trying to replace values in a large pandas dataset with corrected values. To do this I am using regex patterns and replacement values, which are stored for convenience in an Excel file.

However, pandas.Series.str.replace(pattern, replace, regex=True) is not working for me. The values are unchanged.

I have an Excel file in which one column holds the regex pattern and the other holds the replacement value. The file is loaded with pandas.read_excel into a dataframe df, e.g.:

import pandas as pd
import regex as re
df = pd.read_excel("replacments.xlsx", dtype=str)

FIND	REPLACE
\b\s(hello)\b	world
^foo.*	bar

I tried the following code, but NAMES_CLEAN stays the same as NAMES:

for row in df.itertuples():
    data["NAMES_CLEAN"] = data["NAMES"].str.replace(row.FIND, row.REPLACE, regex=True, flags=re.IGNORECASE)

What am I missing?

Answer 1

Score: 0

The issue is not with the regex but with the loop. Every iteration reads from data["NAMES"] and overwrites data["NAMES_CLEAN"], so the result of each earlier replacement is discarded and, in effect, only the last regex is applied. What you want to do is copy the column once and then keep updating it:

data["NAMES_CLEAN"] = data["NAMES"]
for row in df.itertuples():
    data["NAMES_CLEAN"] = data["NAMES_CLEAN"].str.replace(row.FIND, row.REPLACE, regex=True, flags=re.IGNORECASE)

This way, you update the same column with each successive regex, and each pattern operates on the output of the previous one.
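
To see the difference without the Excel file, here is a minimal, self-contained sketch. It assumes a small in-memory dataframe (rules) standing in for the spreadsheet, made-up sample values in NAMES, and the standard-library re module for the flag; the helper name apply_rules and the LAST_ONLY column are hypothetical and only serve the illustration.

```python
import re

import pandas as pd

# Assumption: stand-in for the Excel sheet, using the two patterns from the question.
rules = pd.DataFrame({
    "FIND": [r"\b\s(hello)\b", r"^foo.*"],
    "REPLACE": ["world", "bar"],
})

# Assumption: made-up sample data for illustration only.
data = pd.DataFrame({"NAMES": ["say hello", "Foo fighters", "unchanged"]})


def apply_rules(series, rules):
    """Apply every FIND/REPLACE pair in order, feeding each result into the next."""
    result = series
    for row in rules.itertuples(index=False):
        result = result.str.replace(
            row.FIND, row.REPLACE, regex=True, flags=re.IGNORECASE
        )
    return result


# Loop from the question: each pass restarts from NAMES,
# so only the effect of the last rule is kept.
for row in rules.itertuples(index=False):
    data["LAST_ONLY"] = data["NAMES"].str.replace(
        row.FIND, row.REPLACE, regex=True, flags=re.IGNORECASE
    )

# Chained version: all rules take effect.
data["NAMES_CLEAN"] = apply_rules(data["NAMES"], rules)

print(data)
```

With this sample data, LAST_ONLY leaves "say hello" untouched because only the ^foo.* rule survives, while NAMES_CLEAN reflects both rules.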
