Question: Most efficient way to set the lowest 40% of each DataFrame row to NaN?

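I have a large DataFrame. In each row I need to find the elements that fall in the lowest 40% and set them to NaN. The elements are not sorted, and this has to be repeated for every row.

I can brute-force the calculation, but as you can imagine that is not very efficient. Is there a more efficient way?

Here "40%" means: sort the row's elements in ascending order and set the lowest 40% of them to NaN, not counting elements that are already NaN.
For example, with the ten elements 1, 21, 20, 4, 5, 6, 7, 9, 10, 11, sorting gives 1, 4, 5, 6, 7, 9, 10, 11, 20, 21, so 1, 4, 5 and 6 are removed and the row becomes NaN, 21, 20, NaN, NaN, NaN, 7, 9, 10, 11.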

Input row:
1    21  20  4    5    6    7  9  10  11

Expected result:
NaN  21  20  NaN  NaN  NaN  7  9  10  11
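For comparison, the brute-force per-row approach mentioned in the question might look like the sketch below. It assumes the number of values removed from a row is rounded up, ceil(0.4 * n) for n non-missing values, which is what the answer's `arr < counts` comparison works out to; `drop_lowest_rowwise` is a hypothetical name.

```python
import numpy as np
import pandas as pd


def drop_lowest_rowwise(df: pd.DataFrame, frac: float = 0.4) -> pd.DataFrame:
    """Hypothetical brute-force baseline: blank out the smallest values row by row.

    Assumption: each row loses ceil(frac * n) values, where n is the number of
    non-missing values in that row (the same count as `arr < counts` in the answer).
    """
    out = df.astype(float)  # float copy, so every column can hold NaN
    for label, row in df.iterrows():
        n = row.count()                     # non-missing values only
        k = int(np.ceil(frac * n))          # how many of the smallest to drop
        # k never exceeds n, so nsmallest only returns non-NaN entries
        out.loc[label, row.nsmallest(k).index] = np.nan
    return out


example = pd.DataFrame([[1, 21, 20, 4, 5, 6, 7, 9, 10, 11]])
print(drop_lowest_rowwise(example).iloc[0].tolist())
# [nan, 21.0, 20.0, nan, nan, nan, 7.0, 9.0, 10.0, 11.0]
```

This loops in Python over every row, which is exactly the cost the vectorized answer below avoids.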

Answer 1

Score: 2

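Use DataFrame.count to get the number of non-missing values in each row and scale it by 0.4 to get the per-row threshold. Then compare that threshold with the position of each value in its sorted row, obtained with a double numpy.argsort, and finally set the masked values to missing. Because numpy.argsort sorts NaN to the end, values that are already missing never fall below the threshold: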

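import numpy as np
import pandas as pd

# df is the input DataFrame printed below
print(df)
    0   1   2   3   4   5   6    7   8   9    10
0   1   2   3  10   5   6   7  NaN   9   4  11.0
1   1  21  20   4   5   6   7  9.0  10  11   NaN
# 40% of the non-missing values in each row, as a column vector for broadcasting
counts = df.count(axis=1).mul(0.4).to_numpy()[:, None]
# 0-based rank of every value within its row (argsort puts NaN last)
arr = np.argsort(np.argsort(df.to_numpy()))
# set every value whose rank is below its row's threshold to NaN
df[arr < counts] = np.nan
print(df)
    0     1     2     3    4    5   6    7   8     9    10
0 NaN   NaN   NaN  10.0  5.0  6.0   7  NaN   9   NaN  11.0
1 NaN  21.0  20.0   NaN  NaN  NaN   7  9.0  10  11.0   NaN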
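The double `np.argsort` used above can be checked on the question's example row (with a NaN appended): the inner call returns the positions that would sort the row, and the outer call inverts that permutation, giving each element its 0-based rank. Because NumPy sorts NaN to the end, missing values always get the highest ranks and never fall below the threshold. A minimal sketch:

```python
import numpy as np

# The question's example row, plus one missing value at the end
row = np.array([[1, 21, 20, 4, 5, 6, 7, 9, 10, 11, np.nan]])

order = np.argsort(row)    # positions that would sort the row (NaN goes last)
ranks = np.argsort(order)  # inverse permutation: 0-based rank of each element

print(order[0].tolist())   # [0, 3, 4, 5, 6, 7, 8, 9, 2, 1, 10]
print(ranks[0].tolist())   # [0, 9, 8, 1, 2, 3, 4, 5, 6, 7, 10]

# 40% of the 10 non-NaN values gives a threshold of 4.0, so ranks 0..3 are blanked
threshold = 0.4 * np.count_nonzero(~np.isnan(row[0]))
result = np.where(ranks[0] < threshold, np.nan, row[0])
print(result)
```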

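If you would rather stay within pandas, a similar mask can be built with `DataFrame.rank(axis=1, method="first")` and `DataFrame.mask`. This is a swapped-in alternative rather than the answer's method, and it has not been benchmarked here against the double `argsort`; `mask_lowest` is a hypothetical name. `rank` leaves NaN unranked by default, and `method="first"` breaks ties by position, so each row loses the same number of values as with the answer's code.

```python
import numpy as np
import pandas as pd


def mask_lowest(df: pd.DataFrame, frac: float = 0.4) -> pd.DataFrame:
    """Hypothetical helper: set the lowest `frac` of each row's non-NaN values to NaN."""
    # 1-based rank within each row; ties broken by position, NaN stays unranked
    ranks = df.rank(axis=1, method="first")
    # same cutoff as `arr < counts` in the answer, shifted by 1 for 1-based ranks
    limit = df.count(axis=1).mul(frac) + 1
    # comparing a NaN rank gives False, so missing values are left untouched
    return df.mask(ranks.lt(limit, axis=0))


df = pd.DataFrame([[1, 2, 3, 10, 5, 6, 7, np.nan, 9, 4, 11],
                   [1, 21, 20, 4, 5, 6, 7, 9, 10, 11, np.nan]], dtype=float)
print(mask_lowest(df))
```

Unlike the answer's code, this returns a new frame instead of modifying `df` in place.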
