How to drop records containing a particular string that appears in multiple columns of a DataFrame in Python?
Question
I have a huge DataFrame with many rows and columns. Some columns contain the string 'Varies with device'.
I want to drop the records that contain this string. I know how to drop rows when the string is present in one particular column, but I want a method that works when the string can appear in multiple columns.
I tried:
df2 = df1[df1['Size', 'Current Ver', 'Android Ve'].str.contains('Varies with device') == False]
I got an error for the above input.
Answer 1
Score: 1
Here's a possible solution:
df2 = df.drop(
    df[
        df.apply(
            lambda row: row.apply(
                lambda value: 'Varies with device' in str(value)
            ).any(),
            axis=1,
        )
    ].index
)
For example:
import pandas as pd

# Define the data
data = {
    'Name': ['Varies with device', 'Anna', 'Peter', 'Linda', 'Varies with device Bar'],
    'Age': [23, 'Varies with device foo', 35, 'Foo Varies with device', 29],
    'City': ['New York', 'Los Angeles', 'Chicago', 'Houston', 'Phoenix'],
    'Salary': [70000, 80000, 120000, 90000, 110000]
}
# Create the DataFrame
df = pd.DataFrame(data)
print(df)
# Prints:
#
#                      Name                     Age         City  Salary
# 0      Varies with device                      23     New York   70000
# 1                    Anna  Varies with device foo  Los Angeles   80000
# 2                   Peter                      35      Chicago  120000
# 3                   Linda  Foo Varies with device      Houston   90000
# 4  Varies with device Bar                      29      Phoenix  110000

# Drop every row in which any cell contains the string
df2 = df.drop(
    df[
        df.apply(
            lambda row: row.apply(
                lambda value: 'Varies with device' in str(value)
            ).any(),
            axis=1,
        )
    ].index
)
print(df2)
# Prints:
#
#     Name Age     City  Salary
# 2  Peter  35  Chicago  120000
Or:
df2 = df[
    ~df.apply(
        lambda row: row.apply(
            lambda value: 'Varies with device' in str(value)
        ).any(),
        axis=1,
    )
]
Or:
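# Only object (string-like) columns are checked; na=False treats
# non-string cells in those columns as "no match".
df2 = df[
    ~df.select_dtypes(object).apply(
        lambda row: row.str.contains('Varies with device', na=False).any(),
        axis=1,
    )
]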
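If only certain columns need to be checked, as in the question, a column-wise variant may be simpler and is likely faster on a large frame than the row-wise apply. The following is a minimal sketch under assumptions: the sample data is made up, the helper name `drop_rows_containing` is hypothetical (it is not a pandas function), and the column names are copied from the question as written.

```python
import pandas as pd


def drop_rows_containing(df, text, columns=None):
    """Return df without the rows where `text` occurs in any of `columns`.

    Hypothetical helper, not part of pandas. If `columns` is None,
    every column is searched.
    """
    subset = df if columns is None else df[list(columns)]
    # Cast to str so numeric cells are searched too; regex=False makes
    # the match a plain substring test rather than a regular expression.
    hits = subset.astype(str).apply(
        lambda col: col.str.contains(text, regex=False)
    )
    # Keep only the rows with no hit in any searched column.
    return df[~hits.any(axis=1)]


# Made-up sample data using the column names from the question.
df1 = pd.DataFrame({
    'App': ['A', 'B', 'C', 'D'],
    'Size': ['19M', 'Varies with device', '8.7M', '25M'],
    'Current Ver': ['1.0.0', '2.0', 'Varies with device', '1.2'],
    'Android Ve': ['4.0.3 and up', '4.1 and up', '4.4 and up',
                   'Varies with device'],
})

df2 = drop_rows_containing(
    df1, 'Varies with device',
    columns=['Size', 'Current Ver', 'Android Ve'],
)
print(df2)
```

Calling the helper without `columns` searches every column, which matches what the row-wise snippets in this answer do.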