Function for multi-column filtering in Pandas with a loop

Question

Suppose I have the following Pandas dataframe:

import pandas as pd

df = pd.DataFrame({'A': [True, True, False], 'B': [1, 1, 2], 'C': [3, 4, 5]})

|        A |        B |        C |
| -------- | -------- | -------- |
| True     | 1        | 3        |
| True     | 1        | 4        |
| False    | 2        | 5        |

I want to write a function that takes a list of columns and their corresponding values as inputs and returns the filtered DataFrame. For example:

def pandas_filter(df, columns_list, values_list):
    return df.loc[df[columns_list] == values_list]

Continuing with the example, when I write the following code:

result = pandas_filter(df=df, columns_list=['A', 'B'], values_list=[True, 1])

I want to get the following result:

|        A |        B |        C |
| -------- | -------- | -------- |
| True     | 1        | 3        |
| True     | 1        | 4        |

However, the function above raises ValueError("Cannot index with multidimensional key").
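To see where the error comes from, it may help to reproduce it in isolation. The sketch below assumes the same example frame as above; the variable names comparison and raised are only illustrative. It suggests that the comparison itself succeeds and that the indexing step is what fails.

```python
import pandas as pd

df = pd.DataFrame({"A": [True, True, False], "B": [1, 1, 2], "C": [3, 4, 5]})

# The comparison works: the list is matched against the selected columns
# position by position, giving a two-dimensional boolean DataFrame.
comparison = df[["A", "B"]] == [True, 1]
print(comparison.shape)

# .loc expects a one-dimensional row indexer, so the 2-D frame is rejected.
try:
    df.loc[comparison]
    raised = False
except ValueError as exc:
    raised = True
    print(f"ValueError: {exc}")
```

In other words, the boolean frame still has to be reduced to one value per row before it can be used as a row mask.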

Answer 1

Score: 2

You just need to chain your equality comparison (==) with all(axis=1) to collapse it into a one-dimensional boolean mask:

def pandas_filter(df, columns_list, values_list):
    return df.loc[
        (df[columns_list] == values_list).all(axis=1) # <-- add it here
    ]

result = pandas_filter(df=df, columns_list=["A", "B"], values_list=[True, 1])

Output:

print(result)

      A  B  C
0  True  1  3
1  True  1  4

Intermediate results:

>>> df[columns_list] == values_list
       A      B
0   True   True
1   True   True
2  False  False

>>> (df[columns_list] == values_list).all(axis=1)
0     True
1     True
2    False
dtype: bool
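For convenience, here is a self-contained sketch of the same approach that can be run as is. The length check is an added assumption and not part of the answer above; it only makes a mismatched call fail with a clearer message.

```python
import pandas as pd


def pandas_filter(df, columns_list, values_list):
    """Return the rows of df where every listed column equals its paired value."""
    # Hypothetical guard (not in the original answer): fail early on mismatched lengths.
    if len(columns_list) != len(values_list):
        raise ValueError("columns_list and values_list must have the same length")
    # Compare column-wise, then require every comparison in a row to be True.
    mask = (df[columns_list] == values_list).all(axis=1)
    return df.loc[mask]


df = pd.DataFrame({"A": [True, True, False], "B": [1, 1, 2], "C": [3, 4, 5]})
result = pandas_filter(df, ["A", "B"], [True, 1])
print(result)
```

Replacing all(axis=1) with any(axis=1) would instead keep rows where at least one of the listed columns matches.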

Answer 2

Score: 0

Make sure the column names in columns_list are passed as strings (["A", "B"] rather than [A, B]). Note that == does accept a list here; the problem is that it returns a two-dimensional boolean DataFrame, which .loc cannot use as a row mask. An alternative is to build the mask one column at a time with the isin() method, which checks whether each value in a column is one of the values in a given list.

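import pandas as pd

def pandas_filter(df, columns_list, values_list):
    # Start with an all-True mask and narrow it one column at a time.
    conditions = pd.Series(True, index=df.index)
    for col, val in zip(columns_list, values_list):
        # isin() expects a list-like, so wrap the single value.
        conditions = conditions & df[col].isin([val])
    return df.loc[conditions]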

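The isin() method checks, element by element, whether each value in a column appears in the list it is given. Because each value is wrapped in a one-element list here, the check acts as an equality test for ordinary (non-missing) values.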
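Since isin() takes a list, the same loop extends naturally to several allowed values per column. The sketch below is an illustrative variant, not part of the answer above; the name pandas_filter_any and the allowed_values parameter are hypothetical.

```python
import pandas as pd


def pandas_filter_any(df, columns_list, allowed_values):
    """Keep rows where each listed column holds one of its allowed values."""
    mask = pd.Series(True, index=df.index)
    for col, allowed in zip(columns_list, allowed_values):
        # Each entry of allowed_values is itself a list of acceptable values.
        mask = mask & df[col].isin(allowed)
    return df.loc[mask]


df = pd.DataFrame({"A": [True, True, False], "B": [1, 1, 2], "C": [3, 4, 5]})
# Keep rows where B is 1 or 2 and C is 4 or 5.
result = pandas_filter_any(df, ["B", "C"], [[1, 2], [4, 5]])
print(result)
```

With the example frame, this keeps the rows at index 1 and 2, since only those have C equal to 4 or 5.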

huangapple
  • Posted on June 22, 2023 at 00:55:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76525558.html