May 25, 2023 05:29:15

Using Regex groups to rename columns in a pandas dataframe by matching multiple patterns at a time

Question

I have the following list of columns of a data frame df:

cols_2 = ['state', 'population', 'male population', 'female population', 'working population', 'male working population', 'female working population', 'female population in the age group 0 to 6 years', 'male population in the age group 0 to 6 years', 'population in the age group 0 to 6 years']

For obvious reasons, I would like to compress the names as follows:

population to pop
male to m_
female to f_
working to w
in the age group 0 to 6 years to _minor

Please note that the spaces are included in the patterns.

This Stack Overflow discussion is the starting point; there, the requirement was only to get rid of square brackets by matching a single pattern.

My aim is to obtain multiple matches for multiple patterns.

Really appreciate any kind of help!

PS: This is my first time here!

Answer 1

Score: 0

You can try:

import re

new_cols = []

pat = re.compile(r"(?:fe)?male\s*|population|working\s*")
for col in cols_2:
    # "male", "female" and "working" (with trailing spaces) become their
    # first letter plus "_"; "population" is cut to its first three letters.
    new_col = pat.sub(
        lambda g: f"{g[0][0]}_" if g[0][0] in "fmw" else f"{g[0][:3]}", col
    )
    new_col = new_col.replace(" in the age group 0 to 6 years", "_minor")
    new_cols.append(new_col)

print(new_cols)

# To replace the columns in the DataFrame:
# df.columns = new_cols

Prints:

[
    "state",
    "pop",
    "m_pop",
    "f_pop",
    "w_pop",
    "m_w_pop",
    "f_w_pop",
    "f_pop_minor",
    "m_pop_minor",
    "pop_minor",
]
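
If the set of patterns is likely to grow, a mapping-driven variant may be easier to maintain than the first-letter trick above. The following is only a sketch under some assumptions: the names replacements and shorten are made up for illustration, the keys carry their spaces as the question describes, and "working " is mapped to "w_" to match the output shown above rather than the plain "w" asked for in the question.

```python
import re

# Column names from the question.
cols_2 = [
    "state", "population", "male population", "female population",
    "working population", "male working population",
    "female working population",
    "female population in the age group 0 to 6 years",
    "male population in the age group 0 to 6 years",
    "population in the age group 0 to 6 years",
]

# Hypothetical pattern -> replacement table; spaces are part of the keys.
replacements = {
    "female ": "f_",
    "male ": "m_",
    "working ": "w_",  # assumption: "w_" as in the answer's output
    "population": "pop",
    " in the age group 0 to 6 years": "_minor",
}

# One alternation built from the escaped keys, longest first so that a
# longer key is preferred if two keys ever share a prefix.
keys = sorted(replacements, key=len, reverse=True)
pattern = re.compile("|".join(re.escape(k) for k in keys))


def shorten(name):
    """Replace every matched key in name with its short form."""
    return pattern.sub(lambda m: replacements[m.group(0)], name)


new_cols = [shorten(col) for col in cols_2]
print(new_cols)
```

Because the regex engine scans from left to right, "female " is matched as a whole before "male " could match inside it. Since DataFrame.rename accepts a function for its columns argument, df.rename(columns=shorten) should apply the same mapping directly to the data frame.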
