Using regex groups to rename columns in a pandas DataFrame by matching multiple patterns at once
Question
I have a data frame df with the following list of columns:
cols_2 = ['state', 'population', 'male population', 'female population', 'working population', 'male working population', 'female working population', 'female population in the age group 0 to 6 years', 'male population in the age group 0 to 6 years', 'population in the age group 0 to 6 years']
For obvious reasons, I would like to compress the column names as follows:
- population to pop
- male to m_
- female to f_
- working to w
- in the age group 0 to 6 years to _minor
Please note that the spaces are included in the patterns.
This Stack Overflow discussion is the starting point, but its requirement is only to get rid of square brackets by matching a single pattern.
My aim is to obtain multiple matches for multiple patterns.
I would really appreciate any kind of help!
PS: This is my first time here!
Answer 1
Score: 0
You can try:
import re

new_cols = []
# Match "male "/"female ", "population", or "working " (trailing spaces included)
pat = re.compile(r"(?:fe)?male\s*|population|working\s*")
for col in cols_2:
    # Matches starting with f, m, or w become "<letter>_"; "population" is cut to "pop"
    new_col = pat.sub(
        lambda g: f"{g[0][0]}_" if g[0][0] in "fmw" else f"{g[0][:3]}", col
    )
    new_col = new_col.replace(" in the age group 0 to 6 years", "_minor")
    new_cols.append(new_col)
print(new_cols)
# to replace the columns in the DataFrame:
# df.columns = new_cols
Prints:
[
    "state",
    "pop",
    "m_pop",
    "f_pop",
    "w_pop",
    "m_w_pop",
    "f_w_pop",
    "f_pop_minor",
    "m_pop_minor",
    "pop_minor",
]
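If the list of patterns is likely to grow, one possible variation is to keep the replacements in a lookup table and build a single alternation from its keys. This is only a sketch, not part of the original answer: the names REPLACEMENTS, PATTERN, and compress are made up for illustration, and "working " is mapped to "w_" (rather than the "w" asked for in the question) so that the result matches the output shown above.

```python
import re

# Hypothetical lookup table built from the question's rules.
# Spaces are part of the keys; "working " -> "w_" is an assumption
# chosen to reproduce the answer's output.
REPLACEMENTS = {
    "female ": "f_",
    "male ": "m_",
    "working ": "w_",
    "population": "pop",
    " in the age group 0 to 6 years": "_minor",
}

# One alternation over all escaped keys. Longer keys are listed first so
# that a longer pattern wins if two keys could match at the same position.
PATTERN = re.compile(
    "|".join(re.escape(key) for key in sorted(REPLACEMENTS, key=len, reverse=True))
)


def compress(name: str) -> str:
    """Replace every matched key in `name` with its table entry."""
    return PATTERN.sub(lambda m: REPLACEMENTS[m.group(0)], name)


cols_2 = [
    "state",
    "population",
    "male population",
    "female population",
    "working population",
    "male working population",
    "female working population",
    "female population in the age group 0 to 6 years",
    "male population in the age group 0 to 6 years",
    "population in the age group 0 to 6 years",
]

new_cols = [compress(col) for col in cols_2]
print(new_cols)
```

With a real DataFrame, the same function could be applied with df.columns = [compress(c) for c in df.columns]. Adding a new rule then only means adding one entry to the table.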