How to use a dictionary in an np.where clause in pandas

Question

I have the following dataframe:

import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})

   id  time col_id  col_a  col_b  col_c
0   1     1    ffp      1     -1     10
1   1     2    ffp      2     -2     20
2   1     3    ffp      3     -3     30
3   2     1    hie      4     -4     40
4   2     2    hie      5     -5     50
5   2     3    ttt      6     -6     60

I would like to create a new column, col, in foo that takes the value of col_a, col_b or col_c, depending on the value of col_id.

I am doing the following:

foo['col'] = np.where(foo.col_id == "ffp", foo.col_a, 
                      np.where(foo.col_id == "hie",foo.col_b, foo.col_c))

which gives

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60

Since I have many columns, I was wondering whether there is a cleaner way to do this, for example using a dictionary:

dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

Any ideas?

Answer 1

Score: 2

You can map col_id through the dictionary to get the name of the column to read for each row, then perform an indexing lookup:

import numpy as np

idx, cols = pd.factorize(foo['col_id'].map(dict_cols_matching))

foo['col'] = foo.reindex(cols, axis=1).to_numpy()[np.arange(len(foo)), idx]

Output:

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60
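For reference, here is the same approach as a self-contained script with each step commented. One caveat the answer does not mention: pd.factorize gives values it cannot encode (NaN, produced by map for a col_id missing from the dictionary) the code -1, and NumPy reads -1 as "last column", so such rows would silently receive the wrong value. The KeyError check below is an added safeguard, a sketch rather than part of the original answer.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# Translate each row's col_id into the name of the column to read from.
target = foo['col_id'].map(dict_cols_matching)

# factorize returns integer codes (idx) plus the unique column names (cols)
# in order of first appearance; NaN (an unmapped id) gets the code -1.
idx, cols = pd.factorize(target)

# Added safeguard: without it, -1 would silently select the last column.
if (idx == -1).any():
    missing = foo.loc[idx == -1, 'col_id'].unique().tolist()
    raise KeyError(f"col_id values missing from dict_cols_matching: {missing}")

# Reorder the columns to match cols, then pick one cell per row:
# row i, column idx[i].
values = foo.reindex(cols, axis=1).to_numpy()
foo['col'] = values[np.arange(len(foo)), idx]
print(foo['col'].tolist())
```

The final indexing step is plain NumPy integer-array ("fancy") indexing, which is why the whole lookup stays vectorized.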

Answer 2

Score: 2

Use np.select, which picks each row's value from a list of choices according to a matching list of conditions:

foo['col'] = np.select([foo.col_id.eq("ffp"), foo.col_id.eq("hie"), foo.col_id.eq("ttt")],
                       [foo.col_a, foo.col_b, foo.col_c])

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60
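The conditions and choices do not have to be typed out by hand; they can be generated from dict_cols_matching, which is what the question asked for. A minimal sketch, assuming NaN is an acceptable fallback for ids that are not in the dictionary (the default argument of np.select); note that a NaN default makes the resulting column float64.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# One boolean condition and one choice column per dictionary entry;
# iterating the same items() view keeps the two lists in the same order.
pairs = dict_cols_matching.items()
conditions = [foo['col_id'].eq(key) for key, _ in pairs]
choices = [foo[col] for _, col in pairs]

# Rows matching no condition get `default` (NaN is an assumed fallback).
foo['col'] = np.select(conditions, choices, default=np.nan)
print(foo['col'].tolist())
```

Adding a new id/column pair then only means adding a dictionary entry.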

Answer 3

Score: 0

You can use a lambda function to select the column based on col_id, but the method depends on the order of the columns: 3 is the position of col_a, so adjust it if you change the column order.

import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1,1,1,2,2,2],
                    'time': [1,2,3,1,2,3],
                    'col_id': ['ffp','ffp','ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1,2,3,4,5,6],
                    'col_b': [-1,-2,-3,-4,-5,-6],
                    'col_c': [10,20,30,40,50,60]})

# Rank of each id among the sorted unique ids, offset by 3 (the position of col_a).
idSet = np.unique(foo['col_id'].to_numpy()).tolist()
# .iloc makes the positional lookup explicit.
foo['col'] = foo.apply(lambda x: x.iloc[idSet.index(x.col_id) + 3], axis=1)
print(foo)

Output:

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60
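The offset in this answer works because np.unique sorts the ids alphabetically (ffp, hie, ttt) and that order happens to match col_a, col_b, col_c. A sketch of the same row-wise apply that reads the column name from dict_cols_matching instead, so it depends on neither the column order nor the alphabetical order of the ids:

```python
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# For each row, look up the target column name by col_id, then read that
# column from the same row by label (no positional offsets involved).
foo['col'] = foo.apply(lambda row: row[dict_cols_matching[row['col_id']]], axis=1)
print(foo['col'].tolist())
```

A row-wise apply calls the Python lambda once per row, so on large frames it is typically much slower than the vectorized approaches in the other answers.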

Answer 4

Score: 0

You can use reset_index in combination with a row-wise apply:

foo['col'] = foo[["col_id"]].reset_index().apply(lambda u: foo.loc[u["index"], dict_cols_matching[u["col_id"]]], axis=1)
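The apply in this answer returns a Series indexed 0..n-1 (the fresh index created by reset_index), so assigning it straight to foo['col'] only lines up while foo has a default RangeIndex. A sketch assuming hypothetical non-default index labels shows the issue; assigning the underlying values by position avoids the label alignment. It also assumes the index is unnamed, which is why reset_index calls the new column "index".

```python
import pandas as pd

# Hypothetical index labels, chosen only to illustrate the alignment issue.
foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]},
                   index=[10, 11, 12, 20, 21, 22])
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# reset_index moves the row labels into a regular column named "index",
# so every row passed to apply knows which row of foo it came from.
lookup = foo[["col_id"]].reset_index()
col = lookup.apply(lambda u: foo.loc[u["index"], dict_cols_matching[u["col_id"]]], axis=1)

# col is indexed 0..5; foo['col'] = col would align on labels and give NaN
# here, so assign the values by position instead.
foo['col'] = col.to_numpy()
print(foo['col'].tolist())
```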

huangapple
  • Posted on 16 February 2023, 18:38:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75471037.html