2023-04-11 15:03:27

How to drop rows where one column is an array of NaN in pandas data frame

Question

I have the following array t, from which I make a data frame:

t = array([[1, array(nan)],
       [1, array(nan)],
       [1, array(nan)],
       [1, array(nan)],
       [2, array([4, 5, 6])]], dtype=object)

df = pd.DataFrame(t, columns=['a', 'b'])

    a	b
0	1	nan
1	1	nan
2	1	nan
3	1	nan
4	2	[4, 5, 6]

I want to drop all NaN rows and end up with something like this:

    a	b
4	2	[4, 5, 6]

df.dropna() does not work when the NaNs are inside an array.
Neither does df[~df.b.isnull()].

Answer 1

Score: 2

You can subtract np.nan (or apply any other arithmetic operation) first to reduce the dimension, so that the 0-d NaN arrays become scalar NaN values that pd.notna can detect:

out = df[pd.notna(df['b'] - np.nan)]
print(out)

# Output
   a          b
4  2  [4, 5, 6]
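
The following is a minimal sketch of why the subtraction trick appears to work, using only NumPy and pandas. The variable names (scalar_like, vector, probe) are illustrative and not part of the original answer. It assumes the NaN cells are 0-d arrays, as in the question's data.

```python
import numpy as np
import pandas as pd

scalar_like = np.array(np.nan)    # 0-d array, like the first four rows of column b
vector = np.array([4, 5, 6])      # 1-d array, like the last row

collapsed = scalar_like - np.nan  # arithmetic on a 0-d array yields a NumPy scalar
kept = vector - np.nan            # arithmetic on a 1-d array yields a 1-d array (all NaN)

# Inside an object Series, the scalar NaN counts as missing,
# while the array is an ordinary (non-missing) object.
probe = pd.Series([collapsed, kept], dtype=object)
mask = pd.notna(probe).tolist()
```

Note that a 1-d array is kept whatever it contains, so this approach would not drop a row whose array holds NaN among other values.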

Answer 2

Score: 2

You can use a list comprehension to perform boolean indexing:

out = df[[not np.isnan(x).any() for x in df['b']]]

Note: to remove a row only if all of its values are NaN, use np.isnan(x).all() in place of np.isnan(x).any().

Output:

   a          b
4  2  [4, 5, 6]

Intermediate:

[not np.isnan(x).any() for x in df['b']]
# [False, False, False, False, True]
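
The list comprehension can also be wrapped in a small helper. This is a sketch under stated assumptions: has_nan is a hypothetical name, not part of the original answer, and it assumes every cell can be converted to a float array. Converting with np.asarray first should let the same check handle scalar NaN, 0-d arrays and 1-d arrays alike.

```python
import numpy as np
import pandas as pd

def has_nan(cell):
    """Return True if a cell (scalar, 0-d array, or sequence) contains any NaN."""
    values = np.asarray(cell, dtype=float)
    return bool(np.isnan(values).any())

# Sample frame modelled on the question: four NaN cells and one 1-d array.
cells = [np.array(np.nan)] * 4 + [np.array([4, 5, 6])]
df = pd.DataFrame({"a": [1, 1, 1, 1, 2]})
df["b"] = pd.Series(cells, dtype=object)

# Boolean mask: keep only the rows whose cell holds no NaN.
mask = [not has_nan(cell) for cell in df["b"]]
out = df[mask]
```

Here out should contain only the row with index 4.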

Answer 3

Score: 1

You can select the first value of each array:

out = df[df.b.str[0].notna()]
print(out)
   a          b
4  2  [4, 5, 6]

If you need to remove rows that contain at least one NaN:

out = df[df.b.explode().astype(float).notna().groupby(level=0).all()]
print(out)
   a          b
4  2  [4, 5, 6]

Answer 4

Score: 1

Another possible solution:

df.applymap(lambda x: x).dropna()

Output:

   a          b
4  2  [4, 5, 6]
