Removing particular element from list when list is present in a cell of dataframe
Question
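You can try the following approach to remove a specific value from all the lists in the dataframe without an explicit for loop over the whole dataframe:
import pandas as pd
# Create a sample dataframe
data = {'TIN': [[1, 2, 'val'], [2, 'val', 4]],
        'PIN': [[0, 1], [14, 'val']],
        'column': [['val', 't', 'z'], ['b', 'a', 'val']]}
df = pd.DataFrame(data)
# Define the value to remove
value_to_remove = 'val'
# Use apply with a lambda function to remove the value from every list
df = df.apply(lambda col: col.apply(lambda x: [item for item in x if item != value_to_remove]))
# Print the result
print(df)
This removes every occurrence of value_to_remove from all the lists without an explicit for loop over the entire dataframe.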
My dataframe has a list in each cell, and I want to remove a particular value from all of these lists without using a for loop over the whole dataframe:
   TIN          PIN        column       ......
0  [1, 2, val]  [0, 1]     [val, t, z]  ......
1  [2, val, 4]  [14, val]  [b, a, val]  ......
........
These val entries are strings. I want to remove all of them; they exist in some of these lists and not in others...
I tried using
.apply(lambda x: x.remove('nan'|'NaT'))
as the val is either nan or NaT in string format, i.e. 'nan' or 'NaT' instead of actual null values...
It gave me an error saying... Besides, I think my logic itself was wrong there, as it seemed to be trying to remove x instead of the value I specified...
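To see why that attempt cannot work, here is a small plain-Python sketch (the list row is a made-up example): list.remove mutates the list and returns None, and 'nan' | 'NaT' raises TypeError before remove is even called, because | is not defined for two strings.

```python
row = [1, 2, 'nan', 4]

# list.remove works in place and returns None, so a lambda built around it
# would put None into every cell instead of the cleaned list.
result = row.copy().remove('nan')
print(result)  # None

# The expression 'nan' | 'NaT' is evaluated first and fails for two str objects,
# so remove() is never called and row is left unchanged.
try:
    row.remove('nan' | 'NaT')
except TypeError as exc:
    print(type(exc).__name__)  # TypeError

# Building a new list keeps every element that is not an unwanted marker.
unwanted = {'nan', 'NaT'}
cleaned = [v for v in row if v not in unwanted]
print(cleaned)  # [1, 2, 4]
```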
Answer 1
Score: 2
You can't use remove the way you were intending, as it returns None if the value was found, and raises ValueError if it wasn't. One way to do what you want would be to use a nested apply:
import pandas as pd

df = pd.DataFrame(
    {'TIN': {0: [1, 2, 'val'], 1: [2, 'val', 4]},
     'PIN': {0: [0, 1], 1: [14, 'val']},
     'column': {0: ['val', 't', 'z'], 1: ['b', 'a', 'val']}}
)
# The outer apply visits each column, the inner apply visits each cell (a list)
df = df.apply(lambda x: x.apply(lambda y: [v for v in y if v != 'val']))
Output:
TIN PIN column
0 [1, 2] [0, 1] [t, z]
1 [2, 4] [14] [b, a]
Note that performance-wise it's better to use applymap (about 35% faster), as described in @LuanNguyen's answer:
df = df.applymap(lambda y: [v for v in y if v != 'val'])
Or using a set of values to remove:
remove = {'val', 'xyz'}
df = df.applymap(lambda y: [v for v in y if v not in remove])
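In pandas 2.1 and later, DataFrame.applymap is deprecated in favour of DataFrame.map, which has the same element-wise behaviour. A minimal sketch that runs on either version, assuming the sample dataframe from the question; drop_values and elementwise are hypothetical names used only for illustration.

```python
import pandas as pd

df = pd.DataFrame(
    {'TIN': [[1, 2, 'val'], [2, 'val', 4]],
     'PIN': [[0, 1], [14, 'val']],
     'column': [['val', 't', 'z'], ['b', 'a', 'val']]}
)
remove = {'val', 'xyz'}

def drop_values(cell):
    # Keep only the elements that are not in the removal set.
    return [v for v in cell if v not in remove]

# DataFrame.map is the pandas >= 2.1 name; fall back to applymap on older versions.
elementwise = df.map if hasattr(df, 'map') else df.applymap
cleaned = elementwise(drop_values)
print(cleaned)
```

Checking for the method rather than parsing the version string keeps the sketch short; on pandas 2.1+ it also avoids the FutureWarning that applymap emits.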
Answer 2
Score: 2
Maybe this is what you need: I use the DataFrame method applymap to apply a lambda function to every cell in the dataframe, and Python's built-in filter function to remove the unwanted elements listed in the variable nan_set.
nan_set = {'val'}
print(
df.applymap(lambda arr: list(
filter(lambda element: element not in nan_set, arr)))
)
Input:
TIN PIN COL
0 [1, 2, val] [0, 1] [val, t, z]
1 [2, val, 4] [14, val] [b, a, val]
Output:
TIN PIN COL
0 [1, 2] [0, 1] [t, z]
1 [2, 4] [14] [b, a]
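The question mentions that the markers are the strings 'nan' and 'NaT' rather than real nulls. If a column might contain both, the filter predicate can check strings against nan_set and test everything else with pd.isna. A sketch under the assumption that every list element is a scalar; is_unwanted and the sample data are hypothetical, and a nested apply is used instead of applymap so it runs on any pandas version.

```python
import pandas as pd

# Hypothetical sample: string markers as in the question, plus genuine missing values.
df = pd.DataFrame(
    {'TIN': [[1, 'nan', float('nan')], [2, 'NaT', 4]],
     'PIN': [[0, None], [14, 'nan']]}
)
nan_set = {'nan', 'NaT'}

def is_unwanted(element):
    # Strings are checked against the marker set; anything else is tested with
    # pd.isna, which is True for NaN, None and pd.NaT (assumes scalar elements).
    if isinstance(element, str):
        return element in nan_set
    return bool(pd.isna(element))

cleaned = df.apply(
    lambda col: col.apply(lambda arr: list(filter(lambda e: not is_unwanted(e), arr)))
)
print(cleaned)
```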
Answer 3
Score: 1
You can use apply with a custom function
(you could do most of this in one line, but it would be difficult to understand):
Value2Remove = 'val'

def RemoveVal(Ls, Val):
    # list.remove only drops the first match, so repeat until none are left
    while Val in Ls:
        Ls.remove(Val)
    return Ls

df['TIN'] = df['TIN'].apply(lambda x: RemoveVal(x, Value2Remove))
df['PIN'] = df['PIN'].apply(lambda x: RemoveVal(x, Value2Remove))
df['column'] = df['column'].apply(lambda x: RemoveVal(x, Value2Remove))
Input:
TIN PIN column
0 [1, 2, val] [0, 1] [val, t, z]
1 [2, val, 4] [14, val] [b, a, val]
Output:
TIN PIN column
0 [1, 2] [0, 1] [t, z]
1 [2, 4] [14] [b, a]
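If there are many columns, the per-column lines can be folded into a loop over df.columns. This sketch also swaps RemoveVal for a hypothetical helper, without_value, that builds a new list instead of mutating the one stored in the cell; the sample dataframe is the one from the question.

```python
import pandas as pd

df = pd.DataFrame(
    {'TIN': [[1, 2, 'val'], [2, 'val', 4]],
     'PIN': [[0, 1], [14, 'val']],
     'column': [['val', 't', 'z'], ['b', 'a', 'val']]}
)
Value2Remove = 'val'

def without_value(ls, val):
    # Return a new list containing every element except val.
    return [item for item in ls if item != val]

# Same per-column apply as above, written once for all columns.
for col in df.columns:
    df[col] = df[col].apply(lambda x: without_value(x, Value2Remove))

print(df)
```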
Answer 4
Score: 0
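You can do this with nested "for" loops: one to iterate across the cells of each column in the df, and another to iterate through the list elements in each cell.
for col in df.columns:
    for cell in df[col]:
        # Walk the indices from the end so pop(n) never shifts an unchecked element
        for n in range(len(cell) - 1, -1, -1):
            if cell[n] == 'val':
                cell.pop(n)
"cell" iterates over the df cells, and n is the index of the list element currently being checked. In other words, if n = 3, cell.pop(n) deletes the 4th element of the list (remember that index 0 is the first element). The indices are visited from the end because popping while iterating forward shifts the remaining elements left, so the element after each removed one would be skipped.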
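The skipping problem with a forward loop is easy to reproduce on a single list. In this sketch, pop_forward and pop_backward are hypothetical helpers that differ only in the direction of iteration.

```python
def pop_forward(cell, unwanted):
    # Buggy pattern: popping while iterating forward shifts the remaining
    # elements left, so the element right after each removed one is never checked.
    for n, item in enumerate(cell):
        if item == unwanted:
            cell.pop(n)
    return cell

def pop_backward(cell, unwanted):
    # Walking the indices from the end means a pop only shifts elements
    # that have already been checked.
    for n in range(len(cell) - 1, -1, -1):
        if cell[n] == unwanted:
            cell.pop(n)
    return cell

print(pop_forward(['val', 'val', 't'], 'val'))   # ['val', 't']
print(pop_backward(['val', 'val', 't'], 'val'))  # ['t']
```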
Answer 5
Score: 0
You can use apply by treating each row as a Series:
def remove_nan(s):
    cols = ['TIN', 'PIN', 'column']
    for col in cols:
        # Remove every 'nan' string in the list, not only the first one
        while 'nan' in s[col]:
            s[col].remove('nan')
    return s
Apply this to df with axis set to 1, so that the function is called once per row (each row is passed in as a Series):
df_text = df.apply(remove_nan, axis=1)
print(df_text)
TIN PIN column
0 [1, 2] [0, 1] [t, z]
1 [2, 4] [14] [b, a]
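A row-wise variant along the same lines, sketched under the assumption that the unwanted markers are exactly the strings 'nan' and 'NaT' mentioned in the question; remove_markers and unwanted are hypothetical names. It builds new lists instead of calling list.remove, so every occurrence is dropped and the lists stored in df itself are left unchanged.

```python
import pandas as pd

df = pd.DataFrame(
    {'TIN': [[1, 2, 'nan'], [2, 'NaT', 4]],
     'PIN': [[0, 1], [14, 'nan']],
     'column': [['nan', 't', 'z'], ['b', 'a', 'NaT']]}
)
unwanted = {'nan', 'NaT'}  # both string markers mentioned in the question

def remove_markers(s):
    # s is one row; return a new Series of filtered lists with the same labels.
    return s.apply(lambda cell: [v for v in cell if v not in unwanted])

df_text = df.apply(remove_markers, axis=1)
print(df_text)
```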