DataFrame: most efficient way to set the lowest 40% of values in each row to NaN?

Question

I have a large DataFrame. For each row, I need to find the elements that fall in the lowest 40% of that row's values and set them to NaN. The elements are not sorted, and the operation has to be repeated for every row.

I can brute-force the calculation, but as you can imagine that is not very efficient. Is there a more efficient way to do it?

Here, 40% means: sort the row's elements in ascending order and set the lowest-ranked 40% of them to NaN. Elements that are already NaN are not counted.
For example, a row with ten elements, 1, 21, 20, 4, 5, 6, 7, 9, 10, 11, sorts to 1, 4, 5, 6, 7, 9, 10, 11, 20, 21. The four lowest values (1, 4, 5, 6) are removed, so the row becomes NaN, 21, 20, NaN, NaN, NaN, 7, 9, 10, 11.

  1 21 20 4 5 6 7 9 10 11

to

  NaN 21 20 NaN NaN NaN 7 9 10 11
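
The question does not show its brute-force attempt. For comparison, a row-by-row baseline might look like the sketch below. `mask_row` and its `frac` parameter are hypothetical names chosen for illustration, and the rounding rule (mask every value whose 0-based rank is strictly below 40% of the non-NaN count) is an assumption, since the question only gives a case where the cutoff is a whole number.

```python
import numpy as np
import pandas as pd


def mask_row(row: pd.Series, frac: float = 0.4) -> pd.Series:
    """Hypothetical helper: blank out the lowest `frac` of one row's non-NaN values."""
    cutoff = row.count() * frac              # count() ignores NaN
    ranks = row.rank(method="first") - 1     # 0-based rank; NaN stays NaN
    # Keep values whose rank is not below the cutoff; the rest become NaN.
    return row.where(~(ranks < cutoff))


df = pd.DataFrame([[1, 21, 20, 4, 5, 6, 7, 9, 10, 11]])
slow = df.apply(mask_row, axis=1)            # one Python-level call per row
print(slow.to_numpy().tolist())
# [[nan, 21.0, 20.0, nan, nan, nan, 7.0, 9.0, 10.0, 11.0]]
```

Calling a Python function once per row is what makes this slow on a large frame, which is the cost a vectorized approach avoids.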

Answer 1

Score: 2

Use DataFrame.count to get the number of non-missing values per row, compare it against each value's rank within its row (obtained with a double numpy.argsort), and finally set the masked positions to NaN:

  import numpy as np
  import pandas as pd

  print(df)
     0   1   2   3  4  5  6    7   8   9    10
  0  1   2   3  10  5  6  7  NaN   9   4  11.0
  1  1  21  20   4  5  6  7  9.0  10  11   NaN

  # 40% of the non-NaN count per row, as a column vector for broadcasting
  counts = df.count(axis=1).mul(0.4).to_numpy()[:, None]
  # double argsort gives each value's 0-based rank in its row (NaN ranks last)
  arr = np.argsort(np.argsort(df.to_numpy()))
  # blank out every value whose rank falls below the row's cutoff
  df[arr < counts] = np.nan

  print(df)
       0     1     2     3    4    5  6    7   8     9    10
  0  NaN   NaN   NaN  10.0  5.0  6.0  7  NaN   9   NaN  11.0
  1  NaN  21.0  20.0   NaN  NaN  NaN  7  9.0  10  11.0   NaN
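
As a self-contained variant of the same double-argsort idea, the sketch below wraps the steps in a function that returns a new frame instead of modifying the input. The name `mask_lowest_fraction` and the `frac` parameter are hypothetical, and converting everything to float up front is an assumption made so that NaN can be stored in every column.

```python
import numpy as np
import pandas as pd


def mask_lowest_fraction(df: pd.DataFrame, frac: float = 0.4) -> pd.DataFrame:
    """Return a float copy of df with the lowest `frac` of each row's non-NaN values set to NaN."""
    values = df.to_numpy(dtype=float)
    # Per-row cutoff from the non-NaN count only, kept 2-D for broadcasting.
    cutoff = (~np.isnan(values)).sum(axis=1, keepdims=True) * frac
    # Double argsort: 0-based rank of each element within its row; NaN sorts last.
    ranks = np.argsort(np.argsort(values, axis=1), axis=1)
    masked = np.where(ranks < cutoff, np.nan, values)
    return pd.DataFrame(masked, index=df.index, columns=df.columns)


row = pd.DataFrame([[1, 21, 20, 4, 5, 6, 7, 9, 10, 11]])
result = mask_lowest_fraction(row)
print(result.to_numpy().tolist())
# [[nan, 21.0, 20.0, nan, nan, nan, 7.0, 9.0, 10.0, 11.0]]
```

Because the comparison is `ranks < cutoff`, a fractional cutoff such as 2.8 masks three values, so the count is effectively rounded up. Ties are broken by whatever order `np.argsort` returns; passing `kind="stable"` to both calls makes equal values rank by position.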

huangapple
  • Posted on February 24, 2023, 14:16:24
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75553170.html