June 13, 2023 08:46:21

How to drop records containing a particular string that appears in multiple columns of a DataFrame in Python?

Question

I have a huge DataFrame with many rows and columns. Some columns contain the string 'Varies with device'.
I want to drop the records that contain this string. I know how to do this when the string appears in one particular column, but I want a method that works when it can appear in multiple columns.

I tried

df2=df1[df1['Size','Current Ver','Android Ve'].str.contains('Varies with device')==False]

I got an error for the above input.

Answer 1

Score: 1

Here's a possible solution:

df2 = df.drop(
    df[
        df.apply(
            lambda row: row.apply(
                lambda value: 'Varies with device' in str(value)
            ).any(), axis=1
        )
    ].index
)

For example:

import pandas as pd

# Define the data
data = {
    'Name': ['Varies with device', 'Anna', 'Peter', 'Linda', 'Varies with device Bar'],
    'Age': [23, 'Varies with device foo', 35, 'Foo Varies with device', 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [70000, 80000, 120000, 90000, 110000]
}

# Create the DataFrame
df = pd.DataFrame(data)
print(df)
# Prints:
#
#                      Name                     Age         City  Salary
# 0      Varies with device                      23     New York   70000
# 1                    Anna  Varies with device foo  Los Angeles   80000
# 2                   Peter                      35      Chicago  120000
# 3                   Linda  Foo Varies with device      Houston   90000
# 4  Varies with device Bar                      29      Phoenix  110000

df2 = df.drop(
    df[
        df.apply(
            lambda row: row.apply(
                lambda value: 'Varies with device' in str(value)
            ).any(), axis=1
        )
    ].index
)
print(df2)
# Prints:
#
#     Name Age     City  Salary
# 2  Peter  35  Chicago  120000

Or:

df2 = df[
    ~df.apply(
        lambda row: row.apply(
            lambda value: 'Varies with device' in str(value)
        ).any(), axis=1
    )
]

Or:

df[~df.select_dtypes(object).apply(lambda row: row.str.contains('Varies with device', na=False).any(), axis=1)]
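
If only a few known columns need checking, as in the question, the same idea can be applied column by column instead of row by row. The following is a minimal sketch, not a definitive implementation: the sample data is hypothetical, and the column names ('Size', 'Current Ver', 'Android Ve') are taken from the question as written. Casting each column to `str` first is an assumption made so that non-string values do not produce missing results.

```python
import pandas as pd

# Hypothetical sample data; only the column names come from the question.
df1 = pd.DataFrame({
    'App': ['A', 'B', 'C', 'D'],
    'Size': ['19M', 'Varies with device', '8.7M', '25M'],
    'Current Ver': ['1.0.0', '2.0', 'Varies with device', '1.2'],
    'Android Ve': ['4.0.3 and up', '4.1 and up', '4.4 and up', 'Varies with device'],
})

cols = ['Size', 'Current Ver', 'Android Ve']
target = 'Varies with device'

# For each selected column, cast to str and test for the literal substring.
# The result is a boolean DataFrame with the same shape as df1[cols].
hits = df1[cols].apply(lambda col: col.astype(str).str.contains(target, regex=False))

# Keep only the rows where none of the selected columns matched.
df2 = df1[~hits.any(axis=1)]
print(df2)
```

Note that a list of column names needs double brackets, `df1[[...]]` or `df1[cols]`; the single-bracket form in the question is treated as one tuple key rather than three columns.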
