Using regex groups to rename columns in a pandas DataFrame by matching multiple patterns at a time

Question

I have a data frame df with the following list of columns:

cols_2 = ['state', 'population', 'male population', 'female population', 'working population', 'male working population', 'female working population', 'female population in the age group 0 to 6 years', 'male population in the age group 0 to 6 years', 'population in the age group 0 to 6 years']

For obvious reasons, I would like to compress the column names as follows:

  • any occurrence of population becomes pop
  • male becomes m_
  • female becomes f_
  • working becomes w
  • in the age group 0 to 6 years becomes _minor

Please note that the spaces are included in the patterns.

This Stack Overflow discussion is my starting point, but the requirement there is only to get rid of square brackets by matching a single pattern.

My aim is to obtain multiple matches for multiple patterns.

I would really appreciate any kind of help!

PS: This is my first time here!

Answer 1

Score: 0

You can try:

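import re

new_cols = []

# One pattern for the word-level replacements; \s* also consumes the space
# that follows "male", "female" and "working".
pat = re.compile(r"(?:fe)?male\s*|population|working\s*")
for col in cols_2:
    # "female"/"male"/"working" become their first letter plus "_";
    # "population" is cut down to its first three letters, "pop".
    new_col = pat.sub(
        lambda g: f"{g[0][0]}_" if g[0][0] in "fmw" else g[0][:3], col
    )
    # The age-group phrase is fixed text, so a plain str.replace is enough.
    new_col = new_col.replace(" in the age group 0 to 6 years", "_minor")
    new_cols.append(new_col)

print(new_cols)

# To rename the columns of the DataFrame:
# df.columns = new_cols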

Prints:

[
    "state",
    "pop",
    "m_pop",
    "f_pop",
    "w_pop",
    "m_w_pop",
    "f_w_pop",
    "f_pop_minor",
    "m_pop_minor",
    "pop_minor",
]
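
If the list of replacements is likely to grow, a table-driven variant may be easier to maintain. The following is only a sketch, not part of the answer above: the names REPLACEMENTS and compress are made up for illustration, and the keys assume that "male", "female" and "working" are always followed by a single space, as they are in cols_2. The separators follow the output shown above (w_ rather than the bare w from the question).

```python
import re

# Hypothetical replacement table; keys are the literal phrases to look for,
# including the spaces that belong to each phrase.
REPLACEMENTS = {
    "female ": "f_",
    "male ": "m_",
    "working ": "w_",
    "population": "pop",
    " in the age group 0 to 6 years": "_minor",
}

# Build one alternation from the escaped keys. Longest keys go first so that
# a longer key wins over a shorter key that begins at the same position.
pattern = re.compile(
    "|".join(re.escape(key) for key in sorted(REPLACEMENTS, key=len, reverse=True))
)


def compress(name):
    """Replace every known phrase in a column name in a single pass."""
    return pattern.sub(lambda m: REPLACEMENTS[m.group(0)], name)


cols_2 = [
    "state",
    "population",
    "male population",
    "female population",
    "working population",
    "male working population",
    "female working population",
    "female population in the age group 0 to 6 years",
    "male population in the age group 0 to 6 years",
    "population in the age group 0 to 6 years",
]

new_cols = [compress(col) for col in cols_2]
print(new_cols)
```

For cols_2 this gives the same ten names as the list above. Because compress is an ordinary function, it can also be passed straight to pandas, for example df = df.rename(columns=compress), instead of assigning df.columns.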

huangapple
  • Posted on May 25, 2023, 05:29:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76327517.html