Removing a particular element from a list when the list is in a cell of a dataframe

Question

My dataframe has a list in each cell, and I want to remove a particular value from all of these lists without using a for loop over the whole dataframe:

    TIN        PIN       column       ......

0  [1,2,val]  [0,1]      [val, t, z]  ......
1  [2,val,4]  [14,val]   [b, a, val]  ......
........

These val entries are strings, and I want to remove all of them. They exist in some of these lists and not in others.

I tried using

.apply(lambda x: x.remove('nan'|'NaT')), as the val is either nan or NaT in string format, i.e. 'nan' or 'NaT' instead of an actual null value.

It gave me an error; besides, I think my logic itself was wrong there, as I think it was trying to remove x instead of the value I specified.
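
To see why an attempt like this fails, here is a minimal sketch in plain Python; the sample list is made up for illustration. It separates three issues: 'nan' | 'NaT' is evaluated before remove is ever called and raises a TypeError because | is not defined for two strings, remove changes the list in place and returns None, and remove raises ValueError when the value is not in the list.

```python
cell = [1, 2, 'nan']  # hypothetical cell content

# Issue 1: the | operator is not defined between two str objects.
try:
    cell.remove('nan' | 'NaT')
    or_error = None
except TypeError as exc:
    or_error = exc

# Issue 2: remove() mutates the list and returns None.
returned = cell.remove('nan')

# Issue 3: remove() raises ValueError when the value is absent.
try:
    cell.remove('NaT')
    missing_error = None
except ValueError as exc:
    missing_error = exc

print(type(or_error).__name__)       # TypeError
print(returned)                      # None
print(cell)                          # [1, 2]
print(type(missing_error).__name__)  # ValueError
```

Because remove returns None, using it as the lambda's return value would replace every cell with None even if the call itself succeeded.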

Answer 1

Score: 2

You can't use remove the way you were intending: list.remove modifies the list in place and returns None when the value is found, and it raises ValueError when the value is not found. One way to do what you want is to use a nested apply that builds a new list for each cell:

import pandas as pd

df = pd.DataFrame(
    {'TIN': {0: [1, 2, 'val'], 1: [2, 'val', 4]},
     'PIN': {0: [0, 1], 1: [14, 'val']},
     'column': {0: ['val', 't', 'z'], 1: ['b', 'a', 'val']}}
)
# Outer apply visits each column, inner apply visits each cell in it.
df = df.apply(lambda x: x.apply(lambda y: [v for v in y if v != 'val']))

Output:

      TIN     PIN  column
0  [1, 2]  [0, 1]  [t, z]
1  [2, 4]    [14]  [b, a]

Note that performance-wise it's better to use applymap (about 35% faster), as described in @LuanNguyen's answer. In pandas 2.1 and later, applymap is deprecated in favour of DataFrame.map, which is called the same way:

df = df.applymap(lambda y: [v for v in y if v != 'val'])

Or, using a set of values to remove:

remove = {'val', 'xyz'}
df = df.applymap(lambda y: [v for v in y if v not in remove])
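
Applied to the situation in the question, where the unwanted strings are 'nan' and 'NaT', the set-based variant might look like the sketch below. The sample data and the helper name drop_values are made up for illustration. The sketch picks DataFrame.map when that method exists and falls back to applymap otherwise, on the assumption that this covers both newer and older pandas versions.

```python
import pandas as pd

# Hypothetical data: the unwanted values are the strings 'nan' and 'NaT'.
df = pd.DataFrame({
    'TIN': [[1, 2, 'nan'], [2, 'NaT', 4]],
    'PIN': [[0, 1], [14, 'nan']],
})

unwanted = {'nan', 'NaT'}


def drop_values(cell):
    """Return a new list without the unwanted strings."""
    return [v for v in cell if v not in unwanted]


# Use DataFrame.map where available, otherwise the older applymap.
elementwise = df.map if hasattr(df, 'map') else df.applymap
cleaned = elementwise(drop_values)

print(cleaned)
```

Since drop_values builds new lists, the lists stored in the original df are left unchanged.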

Answer 2

Score: 2

Maybe this is what you need: I use the DataFrame method applymap to apply a lambda function to every cell in the dataframe, and the Python filter function to drop the unwanted elements listed in the variable nan_set.

nan_set = {'val'}
print(
    df.applymap(lambda arr: list(
        filter(lambda element: element not in nan_set, arr)))
)

Input:

           TIN        PIN          COL
0  [1, 2, val]     [0, 1]  [val, t, z]
1  [2, val, 4]  [14, val]  [b, a, val]

Output:

      TIN     PIN     COL
0  [1, 2]  [0, 1]  [t, z]
1  [2, 4]    [14]  [b, a]
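
For a single cell, a filter call of this kind and a list comprehension give the same result. The sketch below checks that on a made-up list. filter returns a lazy iterator, which is why it has to be wrapped in list before the result is stored in a cell.

```python
nan_set = {'val'}
arr = [2, 'val', 4]  # hypothetical cell content

# Both expressions build a new list and leave arr untouched.
with_filter = list(filter(lambda element: element not in nan_set, arr))
with_comprehension = [element for element in arr if element not in nan_set]

print(with_filter)         # [2, 4]
print(with_comprehension)  # [2, 4]
```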

Answer 3

Score: 1

You can use apply with a custom function.

(You can do most of this in one line, but that is harder to read.)

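Value2Remove = 'val'

def RemoveVal(Ls, Val):
    # list.remove works in place and deletes only the first match;
    # the membership check avoids a ValueError when Val is absent.
    if Val in Ls:
        Ls.remove(Val)
    return Ls


# Apply the helper to each column that holds lists.
df['TIN'] = df['TIN'].apply(lambda x: RemoveVal(x, Value2Remove))
df['PIN'] = df['PIN'].apply(lambda x: RemoveVal(x, Value2Remove))
df['column'] = df['column'].apply(lambda x: RemoveVal(x, Value2Remove))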

Input:

           TIN        PIN       column
0  [1, 2, val]     [0, 1]  [val, t, z]
1  [2, val, 4]  [14, val]  [b, a, val]

Output:

      TIN     PIN  column
0  [1, 2]  [0, 1]  [t, z]
1  [2, 4]    [14]  [b, a]
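
Two properties of list.remove are worth keeping in mind with this approach: it deletes only the first matching element, and it changes the very list object that is stored in the dataframe rather than a copy. The sketch below illustrates both on a plain list and shows a hypothetical helper, remove_all, that returns a new list instead. Neither the helper nor the sample data comes from the answer.

```python
def remove_all(ls, val):
    """Return a copy of ls without any element equal to val."""
    return [item for item in ls if item != val]


original = ['val', 1, 'val', 2]
alias = original          # a second reference, like the list held in a cell

original.remove('val')    # deletes only the first 'val', in place
print(alias)              # [1, 'val', 2]

cleaned = remove_all(original, 'val')
print(cleaned)            # [1, 2]
print(original)           # [1, 'val', 2]
```

If a cell may contain the value more than once, or if the original dataframe should stay untouched, a helper along the lines of remove_all is the safer choice.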

Answer 4

Score: 0

You can likely do this with two nested "for" loops: one to iterate across the cells in the df, and another to iterate through the list elements in each cell.

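for cell in df["col"]:
    # Walk the indices from the end so that popping an element does not
    # shift the positions of the elements still to be checked.
    for n in range(len(cell) - 1, -1, -1):
        if cell[n] < 10:
            cell.pop(n)

"cell" iterates over the df cells, and n over the positions within the list in that cell, so cell.pop(n) deletes the "nth" element. IOW if n = 3 it will delete the 4th element of the list (remember that 0 is technically the first element). The column name "col" and the < 10 test are only placeholders; for the question's data the test would be cell[n] == 'val'.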
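
Deleting from a list while looping over it needs some care, because every pop shifts the later elements one position to the left. The sketch below, on made-up numbers and with hypothetical function names, contrasts a forward loop, which skips the element right after each deletion, with a backward loop over the indices, which does not.

```python
def pop_forward(values, limit):
    """Buggy on purpose: pops while enumerating the same list."""
    values = list(values)
    for n, item in enumerate(values):
        if item < limit:
            values.pop(n)
    return values


def pop_backward(values, limit):
    """Visits indices from the end, so pops never shift unvisited items."""
    values = list(values)
    for n in range(len(values) - 1, -1, -1):
        if values[n] < limit:
            values.pop(n)
    return values


data = [1, 2, 30, 4, 5]
print(pop_forward(data, 10))   # [2, 30, 5]
print(pop_backward(data, 10))  # [30]
```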

Answer 5

Score: 0

You can use apply by treating each row as a Series (here the unwanted value is the string 'nan'):

def remove_nan(s):
    cols = ['TIN', 'PIN', 'column']
    for col in cols:
        try:
            # Removes the first 'nan' from the list, in place.
            s[col].remove('nan')
        except ValueError:
            # This list has no 'nan'; leave it as it is.
            pass
    return s

Apply this to df with axis=1 so that the function is called once per row:

df_text = df.apply(remove_nan, axis=1)

print(df_text)
      TIN     PIN  column
0  [1, 2]  [0, 1]  [t, z]
1  [2, 4]    [14]  [b, a]
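
A row-wise variant that does not hard-code the column names and does not modify the original lists could look like the sketch below. The sample data and the function name strip_nan are made up for illustration, and it assumes the unwanted value is the string 'nan'.

```python
import pandas as pd

# Hypothetical data with the string 'nan' as the unwanted value.
df = pd.DataFrame({
    'TIN': [[1, 2, 'nan'], [2, 'nan', 4]],
    'PIN': [[0, 1], [14, 'nan']],
})


def strip_nan(row):
    # row is a Series holding one list per column; build new lists
    # instead of mutating the ones stored in df.
    return row.map(lambda cell: [v for v in cell if v != 'nan'])


cleaned = df.apply(strip_nan, axis=1)
print(cleaned)
```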

huangapple
  • Posted on April 19, 2023 at 15:37:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76051857.html