How to drop records having a particular string which is present in multiple columns in a DataFrame in Python?

Question

I have a huge DataFrame with many rows and columns. Some columns contain the string 'Varies with device'.
I want to drop the records that contain this string. I know how to do this when the string appears in one particular column, but I want a method that works when it appears in multiple columns.

I tried:

df2=df1[df1['Size','Current Ver','Android Ve'].str.contains('Varies with device')==False]

I got an error for the above input.

Answer 1

Score: 1

Here's a possible solution. It converts every cell to a string, checks it for the substring, and drops each row in which any cell matches:

df2 = df.drop(
    df[
        df.apply(
            lambda row: row.apply(
                lambda value: 'Varies with device' in str(value)
            ).any(), axis=1
        )
    ].index
)

For example:

import pandas as pd

# Define the data
data = {
    'Name': ['Varies with device', 'Anna', 'Peter', 'Linda', 'Varies with device Bar'],
    'Age': [23, 'Varies with device foo', 35, 'Foo Varies with device', 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [70000, 80000, 120000, 90000, 110000]
}

# Create the DataFrame
df = pd.DataFrame(data)
print(df)
# Prints:
#
#                      Name                     Age         City  Salary
# 0      Varies with device                      23     New York   70000
# 1                    Anna  Varies with device foo  Los Angeles   80000
# 2                   Peter                      35      Chicago  120000
# 3                   Linda  Foo Varies with device      Houston   90000
# 4  Varies with device Bar                      29      Phoenix  110000

df2 = df.drop(
    df[
        df.apply(
            lambda row: row.apply(
                lambda value: 'Varies with device' in str(value)
            ).any(), axis=1
        )
    ].index
)
print(df2)
# Prints:
#
#     Name Age     City  Salary
# 2  Peter  35  Chicago  120000
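
To see why only row 2 survives, it can help to look at the boolean mask on its own. The sketch below rebuilds the example frame and stores the intermediate result under the name mask, which is introduced here for illustration and is not part of the original answer:

```python
import pandas as pd

# Same example data as above
df = pd.DataFrame({
    'Name': ['Varies with device', 'Anna', 'Peter', 'Linda', 'Varies with device Bar'],
    'Age': [23, 'Varies with device foo', 35, 'Foo Varies with device', 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [70000, 80000, 120000, 90000, 110000],
})

# True for every row in which at least one cell contains the substring
mask = df.apply(
    lambda row: row.apply(lambda value: 'Varies with device' in str(value)).any(),
    axis=1,
)
print(mask.tolist())
# [True, True, False, True, True]

# Keep only the rows where the mask is False
df2 = df[~mask]
```

Rows 0, 1, 3 and 4 each have a match in either Name or Age, so only the row with index 2 is kept.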

Or:

df2 = df[
    ~df.apply(
        lambda row: row.apply(
            lambda value: 'Varies with device' in str(value)
        ).any(), axis=1
    )
]

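Or, checking only the object-dtype columns (na=False makes non-string cells count as non-matches):

df2 = df[~df.select_dtypes(object).apply(lambda row: row.str.contains('Varies with device', na=False).any(), axis=1)]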
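
The question mentions only a few columns, so it may be enough to check those rather than the whole frame. The following is a minimal sketch under stated assumptions: the helper name drop_rows_containing and the toy data are hypothetical, and the column names are copied from the question as written. It casts the chosen columns to strings and uses a literal (non-regex) substring match column by column:

```python
import pandas as pd

def drop_rows_containing(df, text, columns=None):
    """Return the rows of df in which none of the checked columns contains text.

    Hypothetical helper for illustration; checks all columns when columns is None.
    """
    cols = df.columns if columns is None else columns
    # astype(str) lets .str.contains handle numeric cells;
    # regex=False treats text as a literal substring.
    mask = (
        df[cols]
        .astype(str)
        .apply(lambda col: col.str.contains(text, regex=False))
        .any(axis=1)
    )
    return df[~mask]

# Hypothetical toy data; column names follow the question as written
df1 = pd.DataFrame({
    'App': ['A', 'B', 'C', 'D'],
    'Size': ['19M', 'Varies with device', '8.7M', '25M'],
    'Current Ver': ['1.0.0', '2.0', 'Varies with device', '1.2'],
    'Android Ve': ['4.0.3 and up', '4.1 and up', '4.4 and up', 'Varies with device'],
})

df2 = drop_rows_containing(df1, 'Varies with device', ['Size', 'Current Ver', 'Android Ve'])
print(df2['App'].tolist())
# ['A']
```

Working column by column keeps the string matching vectorized, which tends to matter on a large frame; the row-wise versions above call a Python function for every cell.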

huangapple
  • Posted on June 13, 2023, 08:46:21
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76461081.html