`pd.fillna(pd.Series())` can't fill all NaN values

Question

I want to fill the NaNs in a dataframe with random values:

    import numpy as np
    import pandas as pd

    df1 = pd.DataFrame(list(zip(['0001', '0001', '0002', '0003', '0004', '0004'],
                                ['a', 'b', 'a', 'b', 'a', 'b'],
                                ['USA', 'USA', 'USA', 'USA', 'USA', 'USA'],
                                [np.nan, np.nan, 'Jan', np.nan, np.nan, 'Jan'],
                                [1, 2, 3, 4, 5, 6])),
                       columns=['sample ID', 'compound', 'country', 'month', 'value'])
    df1

Out:

      sample ID compound country month  value
    0      0001        a     USA   NaN      1
    1      0001        b     USA   NaN      2
    2      0002        a     USA   Jan      3
    3      0003        b     USA   NaN      4
    4      0004        a     USA   NaN      5
    5      0004        b     USA   Jan      6

I slice the DataFrame based on the compound column:

    df2 = df1.loc[df1.compound == 'a']
    df2

Out:

      sample ID compound country month  value
    0      0001        a     USA   NaN      1
    2      0002        a     USA   Jan      3
    4      0004        a     USA   NaN      5

Then I tried to fill the NaNs with non-repeating values using a filler Series:

    from numpy.random import default_rng
    rng = default_rng()
    filler = rng.choice(len(df2.month), size=len(df2.month), replace=False)
    filler = pd.Series(-abs(filler))
    df2.month.fillna(filler, inplace=True)
    df2

Out:

      sample ID compound country month  value
    0      0001        a     USA  -1.0      1
    2      0002        a     USA   Jan      3
    4      0004        a     USA   NaN      5

I expected no NaN in the output, but one remains. Why?

Answer 1

Score: 3

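The problem is that the index of your filler differs from the index of df2. Because df2 was taken from df1 by boolean indexing, it keeps the labels 0, 2, 4, while filler has the default labels 0, 1, 2. fillna aligns on index labels, so the NaN at label 4 finds no matching value. You can give filler the same index as df2: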

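    # Relabel filler with df2's index so that fillna can align the two
    filler = pd.Series(-abs(filler)).set_axis(df2.index)
    # Assign the result back instead of calling inplace=True on a column selection
    df2['month'] = df2['month'].fillna(filler)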
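For reference, here is a minimal end-to-end sketch of the question's data with the index fix applied. It is one possible way to write it, not the only one. The fixed seed, the `.copy()` call, the explicit `dtype=object` on the month column, and the name `values` are assumptions added for illustration and do not come from the original post.

```python
import numpy as np
import pandas as pd
from numpy.random import default_rng

df1 = pd.DataFrame({
    'sample ID': ['0001', '0001', '0002', '0003', '0004', '0004'],
    'compound': ['a', 'b', 'a', 'b', 'a', 'b'],
    'country': ['USA'] * 6,
    # object dtype (assumption) so the column can hold both strings and numbers
    'month': pd.Series([np.nan, np.nan, 'Jan', np.nan, np.nan, 'Jan'], dtype=object),
    'value': [1, 2, 3, 4, 5, 6],
})

# .copy() makes df2 an independent frame rather than a slice of df1
df2 = df1.loc[df1['compound'] == 'a'].copy()

rng = default_rng(0)  # fixed seed is an assumption, used only for reproducibility
# Non-repeating values drawn from 0, -1, ..., -(n - 1), as in the question
values = -rng.choice(len(df2), size=len(df2), replace=False)

# Build the filler on df2's own labels (0, 2, 4) so fillna can align them
filler = pd.Series(values, index=df2.index)
df2['month'] = df2['month'].fillna(filler)

print(df2)
```

Building the Series with `index=df2.index` has the same effect as calling `set_axis(df2.index)` afterwards: both give the filler the labels that fillna looks up.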

Answer 2

Score: 1

Example

    s1 = pd.Series([1, 2, None])
    s2 = pd.Series([3, 4, 5], index=list('abc'))

Run the code below:

    s1.fillna(s2)

Output:

    0    1.0
    1    2.0
    2    NaN

fillna cannot fill a NaN when the filler has no value at the same index label.
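As a rough sketch of why this happens and two possible ways around it, using the same s1 and s2: the names `seen_by_fillna`, `by_relabel`, and `by_mask` are made up for illustration, and comparing fillna's alignment to `reindex` is a simplification rather than a description of pandas internals.

```python
import pandas as pd

s1 = pd.Series([1, 2, None])                  # labels 0, 1, 2
s2 = pd.Series([3, 4, 5], index=list('abc'))  # labels 'a', 'b', 'c'

# Looking up s1's labels in s2 finds nothing, so every candidate value is NaN
seen_by_fillna = s2.reindex(s1.index)

# Option 1: relabel the filler with the target's index, then fill by label
by_relabel = s1.fillna(s2.set_axis(s1.index))

# Option 2: ignore labels and assign by position through a boolean mask
by_mask = s1.copy()
mask = by_mask.isna()
by_mask[mask] = s2.to_numpy()[mask.to_numpy()]

print(seen_by_fillna.isna().all())
print(by_relabel.tolist())
print(by_mask.tolist())
```

Both options treat the filler as positional, which is only appropriate when the two Series have the same length and order.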

An index mismatch may not be the cause in your case. If this does not solve the problem, don't just post your code and goals: create and provide a minimal example that represents your dataset.

https://stackoverflow.com/help/minimal-reproducible-example

huangapple
  • Posted on May 11, 2023, 15:37:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76225163.html