New column in a pandas DataFrame based on the index of the minimum value of other columns

Question

For a data analysis task (more than 500,000 records), I need to do the following.

As an example, let's say that I have the following list:

RMI = [100, 200, 300, 400]

and the following DataFrame:

import numpy
import pandas
ar = numpy.array([[-2.0, 2.0, -5, 0], [2.7, 10, 5.4, 7], [5.3, 9, 1.5, -12]])
df = pandas.DataFrame(ar, index=['1', '2', '3'], columns=['Seed1', 'Seed2', 'Seed3', 'Seed4'])

I need a new column RMI_number in df that contains, for each row, the value in RMI whose index matches the position of the minimum value among Seed1, Seed2, Seed3 and Seed4.

In my example, I expect to have

df['RMI_number'] = [300, 100, 400]

300 because the minimum value of the first row is in the third position (-5)
100 because the minimum value of the second row is in the first position (2.7)
400 because the minimum value of the third row is in the fourth position (-12)

What would be the most efficient way to do that?
I've tried a lot of options based on idxmin or groupby, but didn't get what I expected.

Answer 1

Score: 1

Use numpy.argmin:

df['RMI_number'] = np.array(RMI)[np.argmin(df.to_numpy(), axis=1)]

Or, if the DataFrame has other columns besides the Seed ones:

df['RMI_number'] = np.array(RMI)[np.argmin(df.filter(like='Seed').to_numpy(), axis=1)]

Output:

   Seed1  Seed2  Seed3  Seed4  RMI_number
1   -2.0    2.0   -5.0    0.0         300
2    2.7   10.0    5.4    7.0         100
3    5.3    9.0    1.5  -12.0         400
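
For completeness, here is a self-contained sketch of the approach above, with the imports and the question's sample data included. The helper name `add_rmi_number` and its `prefix` parameter are illustrative assumptions and not part of the original answer.

```python
import numpy as np
import pandas as pd

RMI = [100, 200, 300, 400]

ar = np.array([[-2.0, 2.0, -5, 0], [2.7, 10, 5.4, 7], [5.3, 9, 1.5, -12]])
df = pd.DataFrame(ar, index=['1', '2', '3'],
                  columns=['Seed1', 'Seed2', 'Seed3', 'Seed4'])


def add_rmi_number(frame, rmi, prefix='Seed'):
    """Return a copy of frame with an RMI_number column (hypothetical helper)."""
    # Keep only the columns whose label contains the prefix, in their existing order.
    seeds = frame.filter(like=prefix)
    if seeds.shape[1] != len(rmi):
        raise ValueError('rmi needs exactly one entry per selected column')
    # 0-based position of the row-wise minimum; ties resolve to the first occurrence.
    positions = np.argmin(seeds.to_numpy(), axis=1)
    out = frame.copy()
    # Integer-array indexing maps every position to its RMI value in one vectorized step.
    out['RMI_number'] = np.asarray(rmi)[positions]
    return out


result = add_rmi_number(df, RMI)
print(result)
```

Returning a copy keeps the input frame unchanged, so the helper can be called again without the new column being picked up by the column filter.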

Answer 2

Score: 1

Another, shorter version that calls argmin directly on the DataFrame's values:

np.array(RMI)[df.values.argmin(1)]
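
Since the question mentions more than 500,000 records, the following sketch applies the positional lookup to a frame of that size and cross-checks it against a label-based lookup built on idxmin. The random data, the seed, and the variable names are assumptions made for illustration only; timing both variants on your own data (for example with `timeit`) is the reliable way to compare them.

```python
import numpy as np
import pandas as pd

RMI = np.array([100, 200, 300, 400])
cols = ['Seed1', 'Seed2', 'Seed3', 'Seed4']

# Synthetic stand-in for the real data (assumption): 500,000 rows of random floats.
rng = np.random.default_rng(0)
big = pd.DataFrame(rng.normal(size=(500_000, 4)), columns=cols)

# Positional lookup: row-wise argmin on the raw array, then index into RMI.
by_position = RMI[big[cols].to_numpy().argmin(axis=1)]

# Label-based cross-check: idxmin returns column labels, mapped to RMI through a dict.
lookup = dict(zip(cols, RMI))
by_label = big[cols].idxmin(axis=1).map(lookup).to_numpy()

# Both routes should agree on every row.
print(np.array_equal(by_position, by_label))
```

Selecting `big[cols]` explicitly, rather than using all of `big.values`, keeps the lookup correct even if the frame later gains extra columns such as RMI_number itself.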

Answer 3

Score: 0

You can use idxmin along the rows and then extract the number from the "Seed" column name:

df["RMI_number"] = df.idxmin(axis=1).str.lstrip("Seed").astype(int).mul(100)

Alternatively, transpose, call reset_index(drop=True) to replace the column names with integer positions, and then use idxmin:

df["RMI_number"] = df.T.reset_index(drop=True).idxmin().add(1).mul(100)

>>> df
   Seed1  Seed2  Seed3  Seed4  RMI_number
1   -2.0    2.0   -5.0    0.0         300
2    2.7   10.0    5.4    7.0         100
3    5.3    9.0    1.5  -12.0         400
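
Both one-liners above rely on the RMI values being exactly 100, 200, 300, 400, and the first one also relies on `str.lstrip`, which strips a set of characters rather than a prefix. A minimal sketch of an idxmin variant that avoids both assumptions by mapping column labels to RMI values through a dictionary is shown below; the names `seed_cols` and `rmi_by_column` are hypothetical.

```python
import numpy as np
import pandas as pd

RMI = [100, 200, 300, 400]
seed_cols = ['Seed1', 'Seed2', 'Seed3', 'Seed4']

ar = np.array([[-2.0, 2.0, -5, 0], [2.7, 10, 5.4, 7], [5.3, 9, 1.5, -12]])
df = pd.DataFrame(ar, index=['1', '2', '3'], columns=seed_cols)

# Pair each column label with its RMI value; nothing is parsed out of the label.
rmi_by_column = dict(zip(seed_cols, RMI))

# idxmin(axis=1) gives the label of the smallest value in each row;
# map() then translates that label into the matching RMI entry.
df['RMI_number'] = df[seed_cols].idxmin(axis=1).map(rmi_by_column)
print(df)
```

This works with arbitrary RMI values and column names, at the cost of a label lookup per row instead of pure integer indexing.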

huangapple
  • Posted on July 28, 2023, 03:20:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76782812.html