pd.fillna(pd.Series()) can't fill all NaN values
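Summary
The filler Series carries the default index (0, 1, 2), while df2 keeps the row labels it inherited from df1 (0, 2, 4). fillna aligns on index labels, so only labels present in both are filled and the remaining NaN values stay. To make sure no NaN is left in df2.month, give filler the same index as df2:
from numpy.random import default_rng
rng = default_rng()
filler = rng.choice(len(df2), size=len(df2), replace=False)
# reuse df2's index so fillna can align filler with the rows of df2
filler = pd.Series(-abs(filler), index=df2.index)
df2.month = df2.month.fillna(filler)
df2
This builds a filler with the same length and the same index as df2, so every NaN in month receives a value and existing entries such as 'Jan' are kept.
Question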
I want to fill the NaNs in a dataframe with random values:
import numpy as np
import pandas as pd
df1 = pd.DataFrame(list(zip(['0001', '0001', '0002', '0003', '0004', '0004'],
                            ['a', 'b', 'a', 'b', 'a', 'b'],
                            ['USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
                            [np.nan, np.nan, 'Jan', np.nan, np.nan, 'Jan'],
                            [1, 2, 3, 4, 5, 6])),
                   columns=['sample ID', 'compound', 'country', 'month', 'value'])
df1
Out:
  sample ID compound country month  value
0      0001        a     USA   NaN      1
1      0001        b     USA   NaN      2
2      0002        a     USA   Jan      3
3      0003        b     USA   NaN      4
4      0004        a     USA   NaN      5
5      0004        b     USA   Jan      6
I slice the dataframe based on the compound column:
df2 = df1.loc[df1.compound == 'a']
df2
Out:
  sample ID compound country month  value
0      0001        a     USA   NaN      1
2      0002        a     USA   Jan      3
4      0004        a     USA   NaN      5
Then I tried to fillna with non-repeated values using filler:
from numpy.random import default_rng
rng = default_rng()
filler = rng.choice(len(df2.month), size=len(df2.month), replace=False)
filler = pd.Series(-abs(filler))
df2.month.fillna(filler, inplace=True)
df2
Out:
  sample ID compound country month  value
0      0001        a     USA  -1.0      1
2      0002        a     USA   Jan      3
4      0004        a     USA   NaN      5
I expected no NaN in the output, but one is still there. Why?
Answer 1
Score: 3
The problem is that the index of filler is different from the index of df2. Since df2 is a part of df1 selected by boolean indexing, it keeps the original row labels (0, 2, 4), while filler gets the default index (0, 1, 2). fillna aligns on index labels, so only the matching labels are filled. You can give filler the index of df2:
filler = pd.Series(-abs(filler)).set_axis(df2.index)
df2['month'].fillna(filler, inplace=True)
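For reference, here is a minimal end-to-end sketch of the same idea using the question's data. A few things in it are assumptions rather than part of the answer: the seed passed to default_rng (only for reproducibility), the explicit object dtype on the month column (so it can hold both strings and numbers), the .copy() on the slice, and assigning the result back to the column instead of relying on inplace=True, which may not update df2 under pandas' copy-on-write behaviour.

```python
import numpy as np
import pandas as pd
from numpy.random import default_rng

df1 = pd.DataFrame({
    "sample ID": ["0001", "0001", "0002", "0003", "0004", "0004"],
    "compound": ["a", "b", "a", "b", "a", "b"],
    "country": ["USA"] * 6,
    # object dtype (assumption) so the column can mix strings and numbers
    "month": pd.Series([np.nan, np.nan, "Jan", np.nan, np.nan, "Jan"], dtype=object),
    "value": [1, 2, 3, 4, 5, 6],
})

# .copy() makes df2 an independent frame; it keeps df1's labels 0, 2, 4
df2 = df1.loc[df1["compound"] == "a"].copy()

rng = default_rng(0)  # seed chosen arbitrarily for this sketch
values = -abs(rng.choice(len(df2), size=len(df2), replace=False))

# Build the filler on df2's own index so the labels line up
filler = pd.Series(values, index=df2.index)

# Assign the filled column back rather than filling in place
df2["month"] = df2["month"].fillna(filler)
print(df2)
```

Rows that already had a month (label 2, 'Jan') are left untouched, because fillna only replaces missing entries.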
Answer 2
Score: 1
Example
s1 = pd.Series([1, 2, None])
s2 = pd.Series([3, 4, 5], index=list('abc'))
Run the code below:
s1.fillna(s2)
Output:
0    1.0
1    2.0
2    NaN
fillna cannot fill a NaN from a Series whose index has no matching label.
This may not necessarily be the cause. If it does not solve the problem, don't just post your code and goals; create and provide a minimal example that represents your dataset.
https://stackoverflow.com/help/minimal-reproducible-example
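Continuing this example, a small sketch of one way to fill by position when the two Series share no labels: relabel s2 with the index of s1 before calling fillna. The variable names unaligned and aligned are only illustrative.

```python
import pandas as pd

s1 = pd.Series([1, 2, None])
s2 = pd.Series([3, 4, 5], index=list("abc"))

# No shared labels between s1 (0, 1, 2) and s2 (a, b, c): nothing is filled
unaligned = s1.fillna(s2)

# Relabel s2 with s1's index so the values pair up by position
aligned = s1.fillna(s2.set_axis(s1.index))
print(aligned.tolist())  # [1.0, 2.0, 5.0]
```

This only makes sense when both Series have the same length and the intended pairing really is positional.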