New column in a pandas DataFrame based on the index of the minimum value of other columns
Question
For a data analysis task (more than 500,000 records), I need to do the following.
As an example, let's say that I have the following array:
RMI = [100, 200, 300, 400]
and the following DataFrame:
import numpy
import pandas
ar = numpy.array([[-2.0, 2.0, -5, 0], [2.7, 10, 5.4, 7], [5.3, 9, 1.5, -12]])
df = pandas.DataFrame(ar, index=['1', '2', '3'], columns=['Seed1', 'Seed2', 'Seed3', 'Seed4'])
I need a new column RMI_number in df that contains the value in RMI whose index corresponds to the position of the minimum value among Seed1, Seed2, Seed3 and Seed4.
In my example, I expect to have:
df['RMI_number'] = [300, 100, 400]
300 because the minimum value for the first row is in the third position (-5)
100 because the minimum value for the second row is in the first position (2.7)
400 because the minimum value for the third row is in the fourth position (-12)
What would be the most efficient way to do that?
I've tried a lot of options based on idxmin or groupby, but didn't get what I expected.
Answer 1
Score: 1
Use numpy.argmin:
import numpy as np
df['RMI_number'] = np.array(RMI)[np.argmin(df.to_numpy(), axis=1)]
Or, if you have other columns:
df['RMI_number'] = np.array(RMI)[np.argmin(df.filter(like='Seed').to_numpy(), axis=1)]
Output:
   Seed1  Seed2  Seed3  Seed4  RMI_number
1   -2.0    2.0   -5.0    0.0         300
2    2.7   10.0    5.4    7.0         100
3    5.3    9.0    1.5  -12.0         400
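For reference, here is a self-contained sketch of the same argmin lookup, wrapped in a small helper. The helper name rmi_of_row_min, its like parameter and the extra column Other are made up for illustration and are not part of the original question; the length check is an added safeguard, not something the answer requires.

```python
import numpy as np
import pandas as pd

RMI = [100, 200, 300, 400]
ar = np.array([[-2.0, 2.0, -5, 0], [2.7, 10, 5.4, 7], [5.3, 9, 1.5, -12]])
df = pd.DataFrame(ar, index=["1", "2", "3"],
                  columns=["Seed1", "Seed2", "Seed3", "Seed4"])
# Hypothetical extra column; it holds the smallest values but must be ignored.
df["Other"] = [-100.0, -100.0, -100.0]


def rmi_of_row_min(frame, rmi, like="Seed"):
    """Return, for each row, the rmi entry at the position of the row minimum.

    Only columns whose name contains `like` take part in the comparison.
    """
    seeds = frame.filter(like=like)
    if seeds.shape[1] != len(rmi):
        raise ValueError("rmi needs exactly one entry per selected column")
    positions = np.argmin(seeds.to_numpy(), axis=1)  # one position per row
    return np.asarray(rmi)[positions]                # vectorised lookup


df["RMI_number"] = rmi_of_row_min(df, RMI)
print(df)
```

Because both the argmin and the lookup run on whole NumPy arrays, there is no Python-level loop over rows, which is what matters for a frame with several hundred thousand records.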
Answer 2
Score: 1
Another short version, using argmin directly on the DataFrame values:
np.array(RMI)[df.values.argmin(1)]
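One behaviour worth knowing when a row has its minimum in more than one column: argmin returns the position of the first occurrence. The small frame below is hypothetical data chosen only to show that; it is not from the question.

```python
import numpy as np
import pandas as pd

RMI = [100, 200, 300, 400]
# Hypothetical data: the first row reaches its minimum (-5) in Seed1 and Seed3.
tied = pd.DataFrame(
    [[-5.0, 2.0, -5.0, 0.0], [2.7, 10.0, 5.4, 7.0]],
    columns=["Seed1", "Seed2", "Seed3", "Seed4"],
)

positions = tied.values.argmin(1)       # on ties, the first position wins
tied_result = np.array(RMI)[positions]  # look up the matching RMI entries
print(tied_result)
```

If a different tie-breaking rule is needed, it has to be implemented explicitly before the lookup.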
Answer 3
Score: 0
You can use idxmin along the rows and then extract the number from the "Seed" column name:
df["RMI_number"] = df.idxmin(axis=1).str.lstrip("Seed").astype(int).mul(100)
Alternatively, transpose, call reset_index() to replace the column names with integer positions, and then use idxmin:
df["RMI_number"] = df.T.reset_index(drop=True).idxmin().add(1).mul(100)
>>> df
   Seed1  Seed2  Seed3  Seed4  RMI_number
1   -2.0    2.0   -5.0    0.0         300
2    2.7   10.0    5.4    7.0         100
3    5.3    9.0    1.5  -12.0         400
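Both idxmin variants above compute the result arithmetically, so they only work because the example RMI happens to equal 100 times the seed number. A possible generalisation, sketched here under the assumption that RMI is listed in the same order as the Seed columns, maps the column label returned by idxmin to its RMI entry. The RMI values below are deliberately arbitrary and hypothetical, to show that no arithmetic pattern is needed.

```python
import numpy as np
import pandas as pd

RMI = [15, 7, 42, 3]  # hypothetical values, deliberately not multiples of 100
ar = np.array([[-2.0, 2.0, -5, 0], [2.7, 10, 5.4, 7], [5.3, 9, 1.5, -12]])
seed_cols = ["Seed1", "Seed2", "Seed3", "Seed4"]
df = pd.DataFrame(ar, index=["1", "2", "3"], columns=seed_cols)

# Map each column label to the RMI entry at the same position.
lookup = dict(zip(seed_cols, RMI))

# idxmin(axis=1) yields the label of the minimum per row; map() translates it.
df["RMI_number"] = df[seed_cols].idxmin(axis=1).map(lookup)
print(df)
```

Selecting df[seed_cols] explicitly also keeps any other columns, including RMI_number itself on a second run, out of the comparison.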