April 19, 2023 15:37:05

Removing particular element from list when list is present in a cell of dataframe

Question

My dataframe has a list in every cell, and I want to remove a particular value from all of these lists without using a for loop over the whole dataframe:

   TIN          PIN        column       ......
0  [1,2,val]    [0,1]      [val, t, z]  ......
1  [2,val,4]    [14,val]   [b, a, val]  ......
........

Here val is a string. I want to remove every occurrence of it; it exists in some of these lists and not in others.

I tried using

.apply(lambda x: x.remove('nan'|'NaT'))

because val is either nan or NaT in string format, i.e. 'nan' or 'NaT', rather than an actual null value.

It gave me an error. Besides, I think my logic itself was wrong, as it seemed to be trying to remove x instead of the values I specified.

One way to remove a specific value from every list in the dataframe without an explicit for loop over the whole dataframe:

import pandas as pd
# Build a sample dataframe
data = {'TIN': [[1, 2, 'val'], [2, 'val', 4]],
        'PIN': [[0, 1], [14, 'val']],
        'column': [['val', 't', 'z'], ['b', 'a', 'val']]}
df = pd.DataFrame(data)
# The value to remove
value_to_remove = 'val'
# Apply a list comprehension to every cell of every column
df = df.apply(lambda col: col.apply(lambda x: [item for item in x if item != value_to_remove]))
# Print the result
print(df)

This removes every occurrence of value_to_remove from all the lists, with no for loop over the whole dataframe.
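
As for why the attempt in the question fails, here is a minimal sketch on a plain list (the sample values are made up for illustration). Strings cannot be combined with |, and list.remove changes the list in place and returns None, so its return value is not a usable cell value.

```python
values = [1, 2, 'nan', 'NaT']

# 1) str does not support the | operator, so building the argument already fails
try:
    'nan' | 'NaT'
except TypeError as exc:
    print('TypeError:', exc)

# 2) list.remove mutates the list in place and returns None
result = values.remove('nan')
print(result)   # None
print(values)   # [1, 2, 'NaT']

# 3) Removing a value that is not present raises ValueError
try:
    values.remove('nan')
except ValueError as exc:
    print('ValueError:', exc)
```

Using the return value of remove inside apply would therefore fill the cells with None instead of the cleaned lists.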

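Answer 1

Score: 2

You can't use remove the way you were intending: list.remove modifies the list in place and returns None if the value was found, and raises ValueError if it wasn't. One way to do what you want would be to use a nested apply:

import pandas as pd

df = pd.DataFrame(
    {'TIN': {0: [1, 2, 'val'], 1: [2, 'val', 4]},
     'PIN': {0: [0, 1], 1: [14, 'val']},
     'column': {0: ['val', 't', 'z'], 1: ['b', 'a', 'val']}}
)
# Outer apply walks the columns, inner apply walks the cells of each column
df = df.apply(lambda x: x.apply(lambda y: [v for v in y if v != 'val']))

Output:

      TIN     PIN  column
0  [1, 2]  [0, 1]  [t, z]
1  [2, 4]    [14]  [b, a]

Note that performance-wise it's better to use applymap (about 35% faster), as described in @LuanNguyen's answer:

df = df.applymap(lambda y: [v for v in y if v != 'val'])

Or, using a set of values to remove:

remove = {'val', 'xyz'}
df = df.applymap(lambda y: [v for v in y if v not in remove])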

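A note for newer pandas versions: DataFrame.applymap was deprecated in pandas 2.1 in favor of DataFrame.map, which does the same element-wise job. The sketch below picks whichever method the installed version offers; the helper name strip_values and the sample frame are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({'TIN': [[1, 2, 'val'], [2, 'val', 4]],
                   'PIN': [[0, 1], [14, 'val']],
                   'column': [['val', 't', 'z'], ['b', 'a', 'val']]})

remove = {'val', 'xyz'}

def strip_values(cell):
    # Hypothetical helper: keep every element that is not in the removal set
    return [v for v in cell if v not in remove]

# DataFrame.map is available from pandas 2.1; older versions only have applymap
if hasattr(pd.DataFrame, 'map'):
    cleaned = df.map(strip_values)
else:
    cleaned = df.applymap(strip_values)

print(cleaned)
```

Both branches call the function once per cell, so the result is the same either way.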

Answer 2

Score: 2

Maybe this is what you need: I use the dataframe method applymap to apply a lambda function to all cells in the dataframe, and the Python filter function to drop the unwanted elements listed in the set nan_set.

nan_set = {'val'}
print(
    df.applymap(lambda arr: list(
        filter(lambda element: element not in nan_set, arr)))
)

Input:

           TIN        PIN          COL
0  [1, 2, val]     [0, 1]  [val, t, z]
1  [2, val, 4]  [14, val]  [b, a, val]

Output:

      TIN     PIN     COL
0  [1, 2]  [0, 1]  [t, z]
1  [2, 4]    [14]  [b, a]

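If some cells might not hold a list at all (an assumption; the question only mentions string placeholders inside lists), iterating over them would fail. A minimal sketch of a guarded version, with drop_unwanted as a hypothetical helper name:

```python
import pandas as pd

nan_set = {'val'}

def drop_unwanted(cell):
    # Assumption: anything that is not a list (None, NaN, a scalar) is passed through unchanged
    if not isinstance(cell, list):
        return cell
    return [element for element in cell if element not in nan_set]

# Sample frame with one missing cell, made up for illustration
df = pd.DataFrame({'TIN': [[1, 2, 'val'], None],
                   'PIN': [[0, 1], [14, 'val']]})

# Series.map calls the function on every cell of each column
cleaned = df.apply(lambda col: col.map(drop_unwanted))
print(cleaned)
```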

Answer 3

Score: 1

You can use apply with a custom function.

(You could do most of this in one line, but it would be difficult to understand.)

Value2Remove = 'val'

def RemoveVal(Ls, Val):
    # list.remove works in place and drops only the first match
    if Val in Ls:
        Ls.remove(Val)
    return Ls

df['TIN'] = df['TIN'].apply(lambda x: RemoveVal(x, Value2Remove))
df['PIN'] = df['PIN'].apply(lambda x: RemoveVal(x, Value2Remove))
df['column'] = df['column'].apply(lambda x: RemoveVal(x, Value2Remove))

Input:

           TIN        PIN       column
0  [1, 2, val]     [0, 1]  [val, t, z]
1  [2, val, 4]  [14, val]  [b, a, val]

Output:

      TIN     PIN  column
0  [1, 2]  [0, 1]  [t, z]
1  [2, 4]    [14]  [b, a]

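One thing to keep in mind with helpers built on list.remove: it deletes only the first match, and it changes the very list object stored in the dataframe. A minimal sketch of the difference, where remove_first and remove_all are hypothetical helpers and the sample row is made up:

```python
import pandas as pd

original = pd.DataFrame({'TIN': [[1, 'val', 2, 'val']]})

def remove_first(ls, val):
    # list.remove deletes one match and changes ls itself
    if val in ls:
        ls.remove(val)
    return ls

def remove_all(ls, val):
    # Builds a new list, so the list held by the source frame is untouched
    return [item for item in ls if item != val]

copied = original['TIN'].apply(lambda x: remove_all(x, 'val'))
print(copied.iloc[0])            # [1, 2]
print(original['TIN'].iloc[0])   # [1, 'val', 2, 'val'] (unchanged)

mutated = original['TIN'].apply(lambda x: remove_first(x, 'val'))
print(mutated.iloc[0])           # [1, 2, 'val']
print(original['TIN'].iloc[0])   # [1, 2, 'val'] (changed as a side effect)
```

With the sample data from the question each list holds the value at most once, so both variants give the same result there.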

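Answer 4

Score: 0

You can likely do this with two nested "for" loops: one to iterate across the cells of a column, and another to iterate through the list elements in each cell.

for cell in df["col"]:
    # Walk a reversed snapshot so pop(n) never shifts an item that is still to be checked
    for n, listitem in reversed(list(enumerate(cell))):
        # Replace this test with your own condition
        if listitem == 'val':
            cell.pop(n)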

"cell" iterates over the cells of the column, and "listitem" iterates over the elements of the list within that cell, with n being the zero-based index of the element. In other words, if n = 3, pop(n) deletes the fourth element of the list (remember that index 0 is the first element).
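
A caveat with this kind of loop: popping from a list while walking it forwards shifts the remaining items left, so the element right after a removed one is never checked. A small sketch on a plain list (sample values made up), together with a slice assignment that avoids the problem while keeping the same list object:

```python
cell = [1, 'val', 'val', 4]

# Popping while enumerating forwards skips the second 'val'
buggy = list(cell)
for n, item in enumerate(buggy):
    if item == 'val':
        buggy.pop(n)
print(buggy)   # [1, 'val', 4]

# Slice assignment replaces the contents in one step, in place
fixed = list(cell)
fixed[:] = [item for item in fixed if item != 'val']
print(fixed)   # [1, 4]
```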

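Answer 5

Score: 0

You can use apply by treating each row as a Series.

def remove_nan(s):
    cols = ['TIN', 'PIN', 'column']
    for col in cols:
        try:
            # Removes the first 'nan' from the list, in place
            s[col].remove('nan')
        except ValueError:
            # This list has no 'nan'; leave it unchanged
            pass
    return s

Apply this to df with axis set to 1, so that the function is called once per row:

df_text = df.apply(remove_nan, axis=1)
print(df_text)

With 'val' substituted for 'nan' to match the sample data, the output is:

      TIN     PIN  column
0  [1, 2]  [0, 1]  [t, z]
1  [2, 4]    [14]  [b, a]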
