Python pandas: drop columns if part of their name is in a list or a pandas column

Question

I have the following dataframe called dropthese.

       | partname | x1 | x2 | x3 ...
    0    text1_mid1
    1    another1_mid2
    2    yet_another

And another dataframe called df that looks like this.

       text1_mid1_suffix1 | text1_mid1_suffix2 | ... | something_else | another1_mid2_suffix1 | ....
    0  .....
    1  .....
    2  .....
    3  .....

I want to drop all the columns from df, if a part of the name is in dropthese['partname'].

For example, since text1_mid1 is in partname, every column whose name contains that string, such as text1_mid1_suffix1 and text1_mid1_suffix2, should be dropped.

I have tried,

    thisFilter = df.filter(dropthese.partname, regex=True)
    df.drop(thisFilter, axis=1)

But I get this error, TypeError: Keyword arguments `items`, `like`, or `regex` are mutually exclusive. What is the proper way to do this filter?

Answer 1

Score: 3

I would use a regex with str.contains (or str.match if you want to restrict to the start of string):

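    import re
    # escape each partial name, then join them into one alternation
    pattern = '|'.join(dropthese['partname'].map(re.escape))
    # keep only the columns whose names contain none of the partial names
    out = df.loc[:, ~df.columns.str.contains(pattern)]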

Output:

       something_else
    0  ...
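
To make the difference between str.contains and str.match concrete, here is a small self-contained sketch. The sample frames are hypothetical stand-ins for the question's data, and the column prefix_yet_another is an assumption added only to show a case where the two methods disagree.

```python
import re

import pandas as pd

# Hypothetical sample data modelled on the question.
dropthese = pd.DataFrame({"partname": ["text1_mid1", "another1_mid2", "yet_another"]})
df = pd.DataFrame(
    0,
    index=range(2),
    columns=[
        "text1_mid1_suffix1",
        "text1_mid1_suffix2",
        "something_else",
        "another1_mid2_suffix1",
        "prefix_yet_another",  # assumed extra column: fragment is not at the start
    ],
)

# Escape each fragment so regex metacharacters are matched literally.
pattern = "|".join(dropthese["partname"].map(re.escape))

# contains: drop a column if a fragment occurs anywhere in its name.
kept_anywhere = df.loc[:, ~df.columns.str.contains(pattern)]

# match: drop a column only if its name starts with a fragment.
kept_prefix = df.loc[:, ~df.columns.str.match(pattern)]

print(list(kept_anywhere.columns))  # ['something_else']
print(list(kept_prefix.columns))    # ['something_else', 'prefix_yet_another']
```

With str.contains, prefix_yet_another is dropped because yet_another appears inside the name; with str.match it survives because the name does not begin with any of the fragments.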

Why your command failed

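DataFrame.filter takes items as its first positional parameter, so df.filter(dropthese.partname, regex=True) sets both items and regex, which are mutually exclusive. Pass the pattern to the regex parameter of filter instead, and use the column names in drop. Note that drop returns a new DataFrame, so assign the result:

    import re
    pattern = '|'.join(dropthese['partname'].map(re.escape))
    thisFilter = df.filter(regex=pattern)
    df = df.drop(thisFilter.columns, axis=1)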
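
The following sketch puts the failing call and the corrected filter/drop sequence side by side. The sample data is hypothetical, and the names error_message, matched, and result are introduced only for illustration.

```python
import re

import pandas as pd

# Hypothetical sample data modelled on the question.
dropthese = pd.DataFrame({"partname": ["text1_mid1", "another1_mid2"]})
df = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=[
        "text1_mid1_suffix1",
        "text1_mid1_suffix2",
        "something_else",
        "another1_mid2_suffix1",
    ],
)

# The call from the question: the Series is bound to `items` positionally
# while `regex` is also set, so filter refuses the combination.
try:
    df.filter(dropthese.partname, regex=True)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

# Corrected version: build one regex and pass it by keyword.
pattern = "|".join(dropthese["partname"].map(re.escape))
matched = df.filter(regex=pattern)

# drop returns a new DataFrame; df itself is left unchanged here.
result = df.drop(columns=matched.columns)

print(error_message)
print(list(result.columns))
```

Here drop(columns=...) is used as an equivalent spelling of drop(..., axis=1); either form should work, the keyword is just a little more explicit about intent.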

huangapple
  • Posted on January 9, 2023, 02:48:18
  • Source link: https://go.coder-hub.com/75050435.html