How to drop records containing a particular string that appears in multiple columns of a DataFrame in Python?
Question
I have a huge DataFrame with many rows and columns. Some columns contain the string 'Varies with device'.
I want to drop the records that contain this string. I know how to do this when the string is in one particular column, but I want a method for when it appears in multiple columns.
I tried:
df2=df1[df1['Size','Current Ver','Android Ve'].str.contains('Varies with device')==False]
I got an error for the above input.
Answer 1
Score: 1
Here's a possible solution:
df2 = df.drop(
    df[
        df.apply(
            lambda row: row.apply(
                lambda value: 'Varies with device' in str(value)
            ).any(), axis=1
        )
    ].index
)
For example:
import pandas as pd

# Define the data
data = {
    'Name': ['Varies with device', 'Anna', 'Peter', 'Linda', 'Varies with device Bar'],
    'Age': [23, 'Varies with device foo', 35, 'Foo Varies with device', 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [70000, 80000, 120000, 90000, 110000]
}
# Create the DataFrame
df = pd.DataFrame(data)
print(df)
# Prints:
#
#                      Name                     Age         City  Salary
# 0      Varies with device                      23     New York   70000
# 1                    Anna  Varies with device foo  Los Angeles   80000
# 2                   Peter                      35      Chicago  120000
# 3                   Linda  Foo Varies with device      Houston   90000
# 4  Varies with device Bar                      29      Phoenix  110000
df2 = df.drop(
    df[
        df.apply(
            lambda row: row.apply(
                lambda value: 'Varies with device' in str(value)
            ).any(), axis=1
        )
    ].index
)
print(df2)
# Prints:
#
#     Name Age     City  Salary
# 2  Peter  35  Chicago  120000
Or:
df2 = df[
    ~df.apply(
        lambda row: row.apply(
            lambda value: 'Varies with device' in str(value)
        ).any(), axis=1
    )
]
Or, checking only the object-dtype columns:
df[~df.select_dtypes(object).apply(lambda row: row.str.contains('Varies with device', na=False).any(), axis=1)]
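If only a few known columns need checking, as in the question, a column-wise mask may be simpler and faster than applying a function to every cell. The attempt in the question most likely fails for two reasons: `df1['Size','Current Ver','Android Ve']` looks up a single tuple key rather than three columns (a list inside the brackets is needed), and `.str` is a Series accessor that a DataFrame does not have. The sketch below is one way to do this under those assumptions; the sample data is made up for illustration, and the column names are copied as written in the question.

```python
import pandas as pd

# Hypothetical sample data; column names are copied from the question as-is.
df1 = pd.DataFrame({
    'App': ['A', 'B', 'C', 'D'],
    'Size': ['19M', 'Varies with device', '8.7M', '25M'],
    'Current Ver': ['1.0.0', '2.1', 'Varies with device', '1.2.4'],
    'Android Ve': ['4.0.3 and up', '4.1 and up', '4.4 and up', 'Varies with device'],
})

needle = 'Varies with device'
cols = ['Size', 'Current Ver', 'Android Ve']

# Build a boolean mask column by column: cast to str so non-string cells
# do not yield NaN, and use regex=False for a literal substring match.
mask = df1[cols].apply(
    lambda col: col.astype(str).str.contains(needle, regex=False)
).any(axis=1)

# Keep only the rows where none of the selected columns contain the string.
df2 = df1[~mask]
print(df2)

# If a cell must equal the string exactly (not merely contain it),
# an element-wise comparison is enough.
exact_mask = df1[cols].eq(needle).any(axis=1)
df3 = df1[~exact_mask]
```

To scan every column instead of a fixed list, `cols` could be replaced with `df1.columns`; the `astype(str)` cast keeps numeric columns from raising an error.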