How to drop rows where one column is an array of NaN in pandas data frame
Question
I have the following array t, from which I build a data frame:
import numpy as np
import pandas as pd
from numpy import array, nan
t = array([[1, array(nan)],
           [1, array(nan)],
           [1, array(nan)],
           [1, array(nan)],
           [2, array([4, 5, 6])]], dtype=object)
df = pd.DataFrame(t, columns=['a', 'b'])
a b
0 1 nan
1 1 nan
2 1 nan
3 1 nan
4 2 [4, 5, 6]
I want to drop all the NaN rows and end up with something like this:
a b
4 2 [4, 5, 6]
df.dropna() does not work when the NaNs are inside an array, and neither does df[~df.b.isnull()].
Answer 1
Score: 2
You can subtract np.nan (or apply any other arithmetic operation) first, so that each 0-d array is reduced to a scalar that pd.notna can test:
out = df[pd.notna(df['b'] - np.nan)]
print(out)
# Output
a b
4 2 [4, 5, 6]
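To see why the subtraction helps, here is a minimal sketch (the variable names are illustrative and not part of the original answer). It assumes the NaN cells are 0-d arrays, as in the question. Arithmetic on a 0-d array returns a NumPy scalar, which pd.notna recognises as missing, whereas a 1-d array stays an array, which an object column treats as a non-null value:

```python
import numpy as np
import pandas as pd

zero_d = np.array(np.nan)      # 0-d array holding NaN, like the rows to drop
one_d = np.array([4, 5, 6])    # ordinary 1-d array, like the row to keep

r0 = zero_d - np.nan           # arithmetic on a 0-d array yields a NumPy scalar
r1 = one_d - np.nan            # stays a 1-d array (now filled with NaN)

print(isinstance(r0, np.ndarray), isinstance(r1, np.ndarray))

# Store both results in an object Series, one cell at a time
cells = np.empty(2, dtype=object)
cells[0] = r0
cells[1] = r1
s = pd.Series(cells)

# Only the scalar NaN is reported as missing; the array cell counts as present
mask = pd.notna(s)
print(mask.tolist())
```

One consequence of this behaviour is that a 1-d array containing NaN among its values would also be kept by the mask, since only scalar cells are tested.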
Answer 2
Score: 2
You can use a list comprehension to perform boolean indexing:
out = df[[not np.isnan(x).any() for x in df['b']]]
Note: to remove a row only when all of its values are NaN, use np.isnan(x).all() in place of np.isnan(x).any().
Output:
a b
4 2 [4, 5, 6]
Intermediate:
[not np.isnan(x).any() for x in df['b']]
# [False, False, False, False, True]
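The list comprehension can also be wrapped in a small helper. This is only a sketch: the function name drop_nan_array_rows and its how parameter are hypothetical, and the frame is rebuilt cell by cell rather than with the question's exact constructor. Passing each cell through np.asarray is an added precaution so that scalars, 0-d arrays and 1-d arrays are all handled the same way:

```python
import numpy as np
import pandas as pd


def drop_nan_array_rows(df, col, how="any"):
    """Drop rows whose cell in `col` contains NaN ('any') or is entirely NaN ('all')."""
    if how not in ("any", "all"):
        raise ValueError("how must be 'any' or 'all'")

    def is_bad(x):
        # np.asarray accepts scalars, 0-d arrays and 1-d arrays alike
        flags = np.isnan(np.asarray(x, dtype=float))
        return bool(flags.any() if how == "any" else flags.all())

    keep = [not is_bad(x) for x in df[col]]
    return df[keep]


# Build the object column one cell at a time so each cell holds its own array
b = np.empty(5, dtype=object)
for i in range(4):
    b[i] = np.array(np.nan)
b[4] = np.array([4, 5, 6])

df = pd.DataFrame({"a": [1, 1, 1, 1, 2], "b": b})
out = drop_nan_array_rows(df, "b")
print(out)
```

With how="all", a row such as [4, nan, 6] would be kept, while how="any" would drop it.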
Answer 3
Score: 1
You can select the first value of each array:
out = df[df.b.str[0].notna()]
print(out)
a b
4 2 [4, 5, 6]
If you need to remove every row whose array contains at least one NaN:
out = df[df.b.explode().astype(float).notna().groupby(level=0).all()]
print(out)
a b
4 2 [4, 5, 6]
Answer 4
Score: 1
Another possible solution (note that applymap is deprecated since pandas 2.1 in favor of DataFrame.map):
df.applymap(lambda x: x).dropna()
Output:
a b
4 2 [4, 5, 6]