July 28, 2023, 03:20:46

New column in a pandas DataFrame based on the index of the minimum value of other columns

Question

For a data analysis task (more than 500,000 records), I need to do the following.

As an example, let's say that I have the following array:

RMI = [100, 200, 300, 400]

and the following DataFrame:

import numpy
import pandas

ar = numpy.array([[-2.0, 2.0, -5, 0], [2.7, 10, 5.4, 7], [5.3, 9, 1.5, -12]])
df = pandas.DataFrame(ar, index=['1', '2', '3'], columns=['Seed1', 'Seed2', 'Seed3', 'Seed4'])

I need a new column RMI_number in df that contains the value in RMI whose index corresponds to the position of the minimum value among Seed1, Seed2, Seed3 and Seed4.

In my example, I expect to have

df['RMI_number'] = [300, 100, 400]

300 because the minimum value in the first row is in the third position (-5)
100 because the minimum value in the second row is in the first position (2.7)
400 because the minimum value in the third row is in the fourth position (-12)

What would be the most efficient way to do that?
I've tried a lot of options based on idxmin or groupby, but didn't get what I expected.

Answer 1

Score: 1

Use numpy.argmin:

import numpy as np
df['RMI_number'] = np.array(RMI)[np.argmin(df.to_numpy(), axis=1)]

Or, if you have other columns:

df['RMI_number'] = np.array(RMI)[np.argmin(df.filter(like='Seed').to_numpy(), axis=1)]

Output:

   Seed1  Seed2  Seed3  Seed4  RMI_number
1   -2.0    2.0   -5.0    0.0         300
2    2.7   10.0    5.4    7.0         100
3    5.3    9.0    1.5  -12.0         400
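
For reference, the argmin approach can be sketched as a self-contained script on the sample data from the question. The variable names `seed_cols` and `pos` are introduced here for illustration only and are not part of the original answer.

```python
import numpy as np
import pandas as pd

RMI = [100, 200, 300, 400]
ar = np.array([[-2.0, 2.0, -5, 0], [2.7, 10, 5.4, 7], [5.3, 9, 1.5, -12]])
df = pd.DataFrame(ar, index=['1', '2', '3'],
                  columns=['Seed1', 'Seed2', 'Seed3', 'Seed4'])

# Illustrative name: restrict the search to the Seed columns
seed_cols = df.filter(like='Seed')

# 0-based position of the minimum within each row
pos = np.argmin(seed_cols.to_numpy(), axis=1)

# Index the RMI array with those positions to get one value per row
df['RMI_number'] = np.array(RMI)[pos]
print(df)
```

The lookup relies on the Seed columns appearing in the same order as the entries of RMI.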

Answer 2

Score: 1

Another, shorter way to apply argmin to the DataFrame's values:

np.array(RMI)[df.values.argmin(1)]
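
Note that this short form looks at every column of df. The sketch below illustrates what happens when the frame holds an additional numeric column, and one way to avoid it. The `weight` column is hypothetical and does not appear in the question; `seed_cols`, `all_pos` and `seed_pos` are likewise names chosen for this example.

```python
import numpy as np
import pandas as pd

RMI = [100, 200, 300, 400]
ar = np.array([[-2.0, 2.0, -5, 0], [2.7, 10, 5.4, 7], [5.3, 9, 1.5, -12]])
df = pd.DataFrame(ar, index=['1', '2', '3'],
                  columns=['Seed1', 'Seed2', 'Seed3', 'Seed4'])

# Hypothetical extra column, not part of the original question
df['weight'] = [-99.0, -99.0, -99.0]

# On the whole frame, the row minimum now sits in 'weight' (position 4),
# which is out of range for the 4-element RMI list
all_pos = df.values.argmin(1)

# Selecting the Seed columns first keeps the positions aligned with RMI
seed_cols = ['Seed1', 'Seed2', 'Seed3', 'Seed4']
seed_pos = df[seed_cols].values.argmin(1)
rmi_number = np.array(RMI)[seed_pos]
print(rmi_number)
```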

Answer 3

Score: 0

You can use idxmin along the rows and then extract the number from the "Seed" column name. This works for the sample data because each RMI value is 100 times the seed number:

df["RMI_number"] = df.idxmin(axis=1).str.lstrip("Seed").astype(int).mul(100)

Alternatively, transpose and call reset_index(drop=True) to replace the column names with integer positions, then use idxmin:

df["RMI_number"] = df.T.reset_index(drop=True).idxmin().add(1).mul(100)

>>> df
   Seed1  Seed2  Seed3  Seed4  RMI_number
1   -2.0    2.0   -5.0    0.0         300
2    2.7   10.0    5.4    7.0         100
3    5.3    9.0    1.5  -12.0         400
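
If the RMI values do not follow a simple arithmetic pattern, idxmin can still be used by looking the column label up in RMI. The following is a minimal sketch, assuming the Seed columns are listed in the same order as the entries of RMI; `seed_cols` and `label_to_rmi` are names introduced here and not part of the original answer.

```python
import numpy as np
import pandas as pd

RMI = [100, 200, 300, 400]
ar = np.array([[-2.0, 2.0, -5, 0], [2.7, 10, 5.4, 7], [5.3, 9, 1.5, -12]])
df = pd.DataFrame(ar, index=['1', '2', '3'],
                  columns=['Seed1', 'Seed2', 'Seed3', 'Seed4'])

seed_cols = ['Seed1', 'Seed2', 'Seed3', 'Seed4']

# Pair each column label with the RMI value at the same position
label_to_rmi = dict(zip(seed_cols, RMI))

# idxmin(axis=1) returns the label of the column holding each row's minimum;
# map() then translates that label into the matching RMI value
df['RMI_number'] = df[seed_cols].idxmin(axis=1).map(label_to_rmi)
print(df)
```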
