How to drop rows where one column is an array of NaN in a pandas DataFrame

Question

I have the following array t from which I make a data frame

t = array([[1, array(nan)],
       [1, array(nan)],
       [1, array(nan)],
       [1, array(nan)],
       [2, array([4, 5, 6])]], dtype=object)

df = pd.DataFrame(t, columns=['a','b'])

    a	b
0	1	nan
1	1	nan
2	1	nan
3	1	nan
4	2	[4, 5, 6]

I want to drop all nan rows and have something like this:

    a	b
4	2	[4, 5, 6]

df.dropna() does not work when the nans are inside an array.
Neither does df[~df.b.isnull()]

Answer 1

Score: 2

You can subtract np.nan (or apply any other arithmetic operation) first, so that each 0-d array is reduced to a scalar that pd.notna can check:

out = df[pd.notna(df['b'] - np.nan)]
print(out)

# Output
   a          b
4  2  [4, 5, 6]
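
To see why the arithmetic step helps, here is a minimal NumPy sketch. The variable names (cell, reduced, full, shifted) are illustrative and not taken from the answer. It shows that an operation on a 0-d array returns a NumPy scalar, which pd.isna recognizes, whereas a 1-d array stays an array.

```python
import numpy as np
import pandas as pd

# A 0-d array holding NaN, like the NaN cells in column b (illustrative name).
cell = np.array(np.nan)

# Arithmetic on a 0-d array yields a NumPy scalar rather than an array.
reduced = cell - np.nan

# A 1-d array stays a 1-d array after the same operation.
full = np.array([4, 5, 6])
shifted = full - np.nan

print(type(cell).__name__)     # ndarray
print(type(reduced).__name__)  # float64
print(pd.isna(reduced))        # True
print(shifted.shape)           # (3,)
```

Because a 1-d array is still an array after the subtraction, this approach is aimed at the 0-d NaN cells. For arrays that mix numbers and NaN, an element-wise check is likely the safer choice.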

Answer 2

Score: 2

You can use a list comprehension to perform boolean indexing:

out = df[[not np.isnan(x).any() for x in df['b']]]

Note: to remove a row only when all of its values are NaN, use np.isnan(x).all() in place of np.isnan(x).any().

Output:

   a          b
4  2  [4, 5, 6]

Intermediate:

[not np.isnan(x).any() for x in df['b']]
# [False, False, False, False, True]
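
The any/all choice above can be wrapped in a small helper. This is only a sketch under stated assumptions: has_nan and its how parameter are hypothetical names, not part of pandas or of the answer, and the frame is rebuilt by hand to resemble the one in the question. Converting each cell with np.asarray(..., dtype=float) is an added step so the check works whether a cell is a scalar, a 0-d array, or a 1-d array.

```python
import numpy as np
import pandas as pd

# Rebuild a frame similar to the one in the question (assumed construction).
b = np.empty(5, dtype=object)
for i in range(4):
    b[i] = np.array(np.nan)      # NaN cells
b[4] = np.array([4, 5, 6])       # the only cell with real values
df = pd.DataFrame({"a": [1, 1, 1, 1, 2], "b": b})


def has_nan(cell, how="any"):
    """Hypothetical helper: True if the cell contains NaN.

    how="any": at least one value is NaN; how="all": every value is NaN.
    """
    values = np.asarray(cell, dtype=float)  # handles scalars, 0-d and n-d arrays
    mask = np.isnan(values)
    return bool(mask.all() if how == "all" else mask.any())


# Boolean mask built with a list comprehension, as in the answer.
keep = [not has_nan(x) for x in df["b"]]
out = df[keep]

print(keep)                 # [False, False, False, False, True]
print(out.index.tolist())   # [4]
```

Passing how="all" keeps rows whose arrays are only partly NaN, which mirrors the .all() variant mentioned in the note above.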

Answer 3

Score: 1

You can select the first value of each array:

out = df[df.b.str[0].notna()]
print(out)
   a          b
4  2  [4, 5, 6]

If you need to remove rows that contain at least one NaN:

out = df[df.b.explode().astype(float).notna().groupby(level=0).all()]
print(out)
   a          b
4  2  [4, 5, 6]

Answer 4

Score: 1

Another possible solution:

df.applymap(lambda x: x).dropna()

Output:

   a          b
4  2  [4, 5, 6]

huangapple
  • Posted on April 11, 2023, 15:03:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75983216.html