Python pandas drop columns if their partial name is in a list or column in pandas

Question

I have the following dataframe called dropthese.

   partname       x1  x2  x3 ...
0  text1_mid1
1  another1_mid2
2  yet_another

And another dataframe called df that looks like this.

     text1_mid1_suffix1 | text1_mid1_suffix2 | ... | something_else | another1_mid2_suffix1 | ....
0       .....
1       .....
2       .....
3       .....

I want to drop every column of df whose name contains any of the strings in dropthese['partname'].

For example, since text1_mid1 is in partname, every column whose name contains that substring, such as text1_mid1_suffix1 and text1_mid1_suffix2, should be dropped.
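The requirement amounts to a plain substring test on each column name. A minimal sketch of that check without regular expressions, assuming small hypothetical frames that only mimic the shapes shown above (the column names and values are made up for illustration):

```python
import pandas as pd

# Hypothetical stand-ins for the two frames described in the question.
dropthese = pd.DataFrame({"partname": ["text1_mid1", "another1_mid2", "yet_another"]})
df = pd.DataFrame(
    [[1, 2, 3, 4], [5, 6, 7, 8]],
    columns=["text1_mid1_suffix1", "text1_mid1_suffix2",
             "something_else", "another1_mid2_suffix1"],
)

parts = dropthese["partname"].tolist()

# A column is dropped if any partial name occurs anywhere in its name.
to_drop = [col for col in df.columns if any(part in col for part in parts)]
out = df.drop(columns=to_drop)

print(out.columns.tolist())  # ['something_else']
```

This avoids escaping issues entirely, at the cost of a Python-level loop over the column names.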

I have tried:

thisFilter = df.filter(dropthese.partname, regex=True)
df.drop(thisFilter, axis=1)

But I get this error: TypeError: Keyword arguments `items`, `like`, or `regex` are mutually exclusive. What is the proper way to do this filter?
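The failure can be reproduced with a tiny hypothetical frame (the data below is made up); catching the exception shows the message quoted above:

```python
import pandas as pd

# Hypothetical data shaped like the question's frames.
dropthese = pd.DataFrame({"partname": ["text1_mid1", "another1_mid2"]})
df = pd.DataFrame([[1, 2]], columns=["text1_mid1_suffix1", "something_else"])

message = ""
try:
    # The Series lands in filter's first positional parameter (items),
    # and regex=True adds a second selector alongside it.
    df.filter(dropthese.partname, regex=True)
except TypeError as exc:
    message = str(exc)

print(message)
```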

Answer 1

Score: 3

I would use a regex with str.contains (or str.match if you want to restrict the match to the start of the string):

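import re
# Escape each partial name (so metacharacters match literally) and OR them together
pattern = '|'.join(dropthese['partname'].map(re.escape))

# Keep only columns whose name contains none of the partial names
out = df.loc[:, ~df.columns.str.contains(pattern)]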

Output:

   something_else
0             ...
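A self-contained sketch of the same idea, using hypothetical frames (names and values are made up) and contrasting str.contains with str.match. An extra column, prefix_yet_another, is assumed here only to show that str.match ignores partial names that do not appear at the start of the column name:

```python
import re

import pandas as pd

# Hypothetical frames mirroring the question; values are placeholders.
dropthese = pd.DataFrame({"partname": ["text1_mid1", "another1_mid2", "yet_another"]})
df = pd.DataFrame(
    [[1, 2, 3, 4, 5]],
    columns=["text1_mid1_suffix1", "text1_mid1_suffix2", "something_else",
             "another1_mid2_suffix1", "prefix_yet_another"],
)

# Escape each partial name so characters like '.' or '(' match literally,
# then join them into one alternation: text1_mid1|another1_mid2|yet_another
pattern = "|".join(dropthese["partname"].map(re.escape))

# str.contains: drop columns containing any partial name anywhere.
out_contains = df.loc[:, ~df.columns.str.contains(pattern)]

# str.match: only drop columns that start with one of the partial names.
out_match = df.loc[:, ~df.columns.str.match(pattern)]

print(out_contains.columns.tolist())  # ['something_else']
print(out_match.columns.tolist())     # ['something_else', 'prefix_yet_another']
```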

Why your command failed

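You passed dropthese.partname positionally, so it went to the first parameter of filter, items, and regex=True then set a second selector; filter accepts only one of items, like, or regex. Instead, pass a pattern string to the regex parameter of filter, and pass the resulting column names (not the filtered DataFrame) to drop: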

import re
pattern = '|'.join(dropthese['partname'].map(re.escape))
thisFilter = df.filter(regex=pattern)
# drop returns a new DataFrame, so assign the result back
df = df.drop(thisFilter.columns, axis=1)
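A self-contained version of the filter/drop route, assuming hypothetical frames shaped like the question's (the values are placeholders):

```python
import re

import pandas as pd

# Hypothetical frames mirroring the question.
dropthese = pd.DataFrame({"partname": ["text1_mid1", "another1_mid2", "yet_another"]})
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=["text1_mid1_suffix1", "text1_mid1_suffix2",
             "something_else", "another1_mid2_suffix1"],
)

pattern = "|".join(dropthese["partname"].map(re.escape))

# filter(regex=...) keeps the columns whose names match the pattern
# (search semantics, so the match may be anywhere in the name).
matched = df.filter(regex=pattern)

# drop expects labels, so pass matched.columns rather than the DataFrame,
# and assign the result because drop is not in-place by default.
df = df.drop(columns=matched.columns)

print(matched.columns.tolist())
print(df.columns.tolist())  # ['something_else']
```

Compared with the str.contains mask, this builds an intermediate DataFrame only to read its column labels, so the boolean-mask version is the leaner of the two.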

huangapple
  • Posted on January 9, 2023 02:48:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75050435.html