`pd.fillna(pd.Series())` can't fill all NaN values


Question


I want to fill the NaNs in a dataframe with random values:

import numpy as np
import pandas as pd

df1 = pd.DataFrame(list(zip(['0001', '0001', '0002', '0003', '0004', '0004'],
                            ['a', 'b', 'a', 'b', 'a', 'b'],
                            ['USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
                            [np.nan, np.nan, 'Jan', np.nan, np.nan, 'Jan'],
                            [1, 2, 3, 4, 5, 6])),
                   columns=['sample ID', 'compound', 'country', 'month', 'value'])
df1

Out:

   sample ID  compound  country  month  value
0       0001         a      USA    NaN      1
1       0001         b      USA    NaN      2
2       0002         a      USA    Jan      3
3       0003         b      USA    NaN      4
4       0004         a      USA    NaN      5
5       0004         b      USA    Jan      6

I slice the DataFrame by the compound column:

df2 = df1.loc[df1.compound == 'a']
df2

Out:

   sample ID  compound  country  month  value
0       0001         a      USA    NaN      1
2       0002         a      USA    Jan      3
4       0004         a      USA    NaN      5

Then I tried to fill the NaNs with non-repeating values using filler:

from numpy.random import default_rng

rng = default_rng()
filler = rng.choice(len(df2.month), size=len(df2.month), replace=False)
filler = pd.Series(-abs(filler))

df2.month.fillna(filler, inplace=True)
df2

Out:

   sample ID  compound  country  month  value
0       0001         a      USA   -1.0      1
2       0002         a      USA    Jan      3
4       0004         a      USA    NaN      5

I expected no NaN in the output, but one remains. Why?

Answer 1

Score: 3


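The problem is that the index of filler differs from the index of df2. Because df2 is a subset of df1 selected by boolean indexing, it keeps the original labels 0, 2, 4, whereas filler has the default index 0, 1, 2. fillna aligns on index labels, so the NaN at label 4 has no matching value in filler and stays NaN. You can give filler the same index as df2: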

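# Give filler the same index labels as df2 so that fillna can align them.
filler = pd.Series(-abs(filler)).set_axis(df2.index)
# Assign the result back instead of using inplace=True on a column selection.
df2['month'] = df2['month'].fillna(filler)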
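
For reference, here is a self-contained sketch of the index mismatch and the alignment fix discussed in this answer, rebuilt from the question's data. The `.copy()` call, the fixed seed, the explicit `dtype=object` on the month column, and passing `index=df2.index` directly to `pd.Series` (rather than calling `set_axis` afterwards) are assumptions added for illustration and do not come from the original posts.

```python
import numpy as np
import pandas as pd
from numpy.random import default_rng

df1 = pd.DataFrame({
    'sample ID': ['0001', '0001', '0002', '0003', '0004', '0004'],
    'compound': ['a', 'b', 'a', 'b', 'a', 'b'],
    'country': ['USA'] * 6,
    # dtype=object is an assumption: it keeps the column able to hold
    # both strings and numbers regardless of pandas' string inference.
    'month': pd.Series([np.nan, np.nan, 'Jan', np.nan, np.nan, 'Jan'], dtype=object),
    'value': [1, 2, 3, 4, 5, 6],
})

# .copy() makes df2 an independent frame (assumption, avoids
# chained-assignment warnings when writing to it later).
df2 = df1.loc[df1.compound == 'a'].copy()

rng = default_rng(0)  # seed chosen only for reproducibility
values = -rng.choice(len(df2), size=len(df2), replace=False)

# Default index 0, 1, 2: label 4 of df2 finds no match and stays NaN.
misaligned = df2['month'].fillna(pd.Series(values))

# Same labels as df2 (0, 2, 4): every NaN finds a matching value.
filler = pd.Series(values, index=df2.index)
df2['month'] = df2['month'].fillna(filler)

print(misaligned.isna().sum())    # one NaN remains
print(df2['month'].isna().sum())  # no NaN remains
```

Building the filler on `df2.index` from the start has the same effect as relabelling it afterwards with `set_axis`.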

Answer 2

Score: 1


Example

s1 = pd.Series([1, 2, None])
s2 = pd.Series([3, 4, 5], index=list('abc'))

Run the following code:

s1.fillna(s2)

Output:

0    1.0
1    2.0
2    NaN

fillna cannot fill a NaN when the filler has no value at the same index label.
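
Continuing the s1 and s2 example, the following sketch shows two possible ways to make the fill succeed. Neither appears in the original answer: the first relabels s2 with s1's index, the second skips index alignment and assigns by position through a boolean mask. The names `aligned`, `mask`, and `positional` are hypothetical.

```python
import pandas as pd

s1 = pd.Series([1, 2, None])
s2 = pd.Series([3, 4, 5], index=list('abc'))

# Option 1: relabel s2 with s1's index so that fillna can align the two.
aligned = s1.fillna(s2.set_axis(s1.index))

# Option 2: assign by position through a boolean mask,
# bypassing index alignment altogether.
mask = s1.isna()
positional = s1.copy()
positional[mask] = s2.to_numpy()[mask.to_numpy()]

print(aligned.tolist())     # [1.0, 2.0, 5.0]
print(positional.tolist())  # [1.0, 2.0, 5.0]
```

Both options assume that s2 has the same length as s1 and that its values are meant to line up with s1 by position.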

This is not necessarily the cause of your problem. If it doesn't solve it, don't just post your code and goals; create and provide a minimal example that represents your dataset.

https://stackoverflow.com/help/minimal-reproducible-example

Posted by huangapple on May 11, 2023 at 15:37:28
Source: https://go.coder-hub.com/76225163.html