February 16, 2023 18:38:12

How to use a dictionary in an np.where clause in pandas

Question

I have the following dataframe:

import pandas as pd
foo = pd.DataFrame({'id': [1,1,1,2,2,2],
                    'time': [1,2,3,1,2,3],
                    'col_id': ['ffp','ffp','ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1,2,3,4,5,6],
                    'col_b': [-1,-2,-3,-4,-5,-6],
                    'col_c': [10,20,30,40,50,60]})

   id  time col_id  col_a  col_b  col_c
0   1     1    ffp      1     -1     10
1   1     2    ffp      2     -2     20
2   1     3    ffp      3     -3     30
3   2     1    hie      4     -4     40
4   2     2    hie      5     -5     50
5   2     3    ttt      6     -6     60

I would like to create a new column col in foo that takes the value of col_a, col_b or col_c, depending on the value of col_id.

I am doing the following:

import numpy as np

foo['col'] = np.where(foo.col_id == "ffp", foo.col_a,
                      np.where(foo.col_id == "hie", foo.col_b, foo.col_c))

which gives

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60

Since I have a lot of columns, I was wondering whether there is a cleaner way to do this, for example using a dictionary:

dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

Any ideas?
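A minimal sketch (not from the original post) of how the nested np.where chain above could be generated from dict_cols_matching: start from a fallback value for ids that have no dictionary entry, then apply one np.where per entry. The NaN fallback is an assumption on my part, and it makes the resulting column float64 rather than int.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# Fallback for ids that have no entry in the dictionary (assumed: NaN).
result = np.full(len(foo), np.nan)

# One np.where per dictionary entry replaces the hand-written nesting.
# The conditions are mutually exclusive, so the loop order does not matter.
for key, src_col in dict_cols_matching.items():
    result = np.where(foo['col_id'] == key, foo[src_col], result)

foo['col'] = result
print(foo)
```

Adding a new id/column pair then only means adding one entry to the dictionary.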

Answer 1

Score: 2

You can map col_id through the dictionary to get the source column name for each row, then perform an indexed lookup:

import numpy as np

idx, cols = pd.factorize(foo['col_id'].map(dict_cols_matching))

foo['col'] = foo.reindex(cols, axis=1).to_numpy()[np.arange(len(foo)), idx]

Output:

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60
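pd.factorize turns the mapped column names into integer codes (idx) plus the distinct names (cols), so each row's code is the position of its source column in foo.reindex(cols, axis=1). One caveat worth checking: an id missing from the dictionary maps to NaN, which factorize encodes as -1, and -1 would silently read the last column. A minimal sketch of a guard, using a small hypothetical frame in which 'xyz' has no dictionary entry:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'xyz' deliberately has no entry in the dictionary.
foo = pd.DataFrame({'col_id': ['ffp', 'hie', 'xyz'],
                    'col_a': [1, 2, 3],
                    'col_b': [-1, -2, -3]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b"}

mapped = foo['col_id'].map(dict_cols_matching)  # 'xyz' -> NaN
idx, cols = pd.factorize(mapped)                # NaN gets the sentinel code -1

# Row i reads column idx[i] of the reindexed value matrix.
values = foo.reindex(cols, axis=1).to_numpy()[np.arange(len(foo)), idx]

# Rows that came from the -1 sentinel get NaN instead of the last column.
foo['col'] = np.where(idx >= 0, values, np.nan)
print(foo)
```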

Answer 2

Score: 2

Use np.select to pair a list of conditions with a list of choices:

foo['col'] = np.select([foo.col_id.eq("ffp"), foo.col_id.eq("hie"), foo.col_id.eq("ttt")],
                       [foo.col_a, foo.col_b, foo.col_c])

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60
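Since the question asks for a dictionary, the two lists passed to np.select can be built from dict_cols_matching instead of being written out by hand. A minimal sketch; it relies on dicts preserving insertion order (Python 3.7+), so keys and values stay paired. Rows that match no condition fall back to np.select's default of 0; passing default=np.nan instead would make a missing id visible, at the cost of a float column.

```python
import numpy as np
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# Conditions and choices are generated from the same dict, so they line up.
conditions = [foo['col_id'].eq(key) for key in dict_cols_matching]
choices = [foo[col] for col in dict_cols_matching.values()]

foo['col'] = np.select(conditions, choices)
print(foo)
```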

Answer 3

Score: 0

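You can use a lambda function to select the column based on col_id, but this method depends on the order of the columns: adjust the offset 3 if you change the order. It also assumes that the sorted unique ids ('ffp', 'hie', 'ttt') line up with col_a, col_b and col_c.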

import pandas as pd
import numpy as np

foo = pd.DataFrame({'id': [1,1,1,2,2,2],
                    'time': [1,2,3,1,2,3],
                    'col_id': ['ffp','ffp','ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1,2,3,4,5,6],
                    'col_b': [-1,-2,-3,-4,-5,-6],
                    'col_c': [10,20,30,40,50,60]})

# Sorted unique ids: ['ffp', 'hie', 'ttt'] -> positions 0, 1, 2
idSet = np.unique(foo['col_id'].to_numpy()).tolist()
# col_a is at position 3, so shift each id's position by 3 and read it positionally
foo['col'] = foo.apply(lambda x: x.iloc[idSet.index(x.col_id) + 3], axis=1)
print(foo)

Output:

   id  time col_id  col_a  col_b  col_c  col
0   1     1    ffp      1     -1     10    1
1   1     2    ffp      2     -2     20    2
2   1     3    ffp      3     -3     30    3
3   2     1    hie      4     -4     40   -4
4   2     2    hie      5     -5     50   -5
5   2     3    ttt      6     -6     60   60
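Both dependencies (column position and the alphabetical order of the ids) disappear if the row-wise apply looks the column up by name through dict_cols_matching. A minimal sketch; an id missing from the dictionary raises KeyError here, which may be preferable to silently picking the wrong column.

```python
import pandas as pd

foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]})
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# Each row reads the column named by the dictionary entry for its own col_id,
# so neither the column order nor the order of the ids matters.
foo['col'] = foo.apply(lambda row: row[dict_cols_matching[row['col_id']]], axis=1)
print(foo)
```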

Answer 4

Score: 0

You can combine reset_index with a row-wise apply:

foo[["col_id"]].reset_index().apply(lambda u: foo.loc[u["index"], dict_cols_matching[u["col_id"]]], axis=1)
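The reset_index call exposes each row's original label as an "index" column so the lambda can pass it to foo.loc. If you assign the result back to foo, note that the applied Series is indexed 0..n-1, so it only lines up with foo when foo already has a default RangeIndex. A minimal sketch of a label-safe variant that iterates over (label, col_id) pairs with Series.items() and reads each cell with DataFrame.at; the non-default index below is hypothetical, chosen to show that alignment is not an issue.

```python
import pandas as pd

# Hypothetical non-default index, to show that row labels are handled correctly.
foo = pd.DataFrame({'id': [1, 1, 1, 2, 2, 2],
                    'time': [1, 2, 3, 1, 2, 3],
                    'col_id': ['ffp', 'ffp', 'ffp', 'hie', 'hie', 'ttt'],
                    'col_a': [1, 2, 3, 4, 5, 6],
                    'col_b': [-1, -2, -3, -4, -5, -6],
                    'col_c': [10, 20, 30, 40, 50, 60]},
                   index=[10, 11, 12, 13, 14, 15])
dict_cols_matching = {"ffp": "col_a", "hie": "col_b", "ttt": "col_c"}

# items() yields (row label, col_id); .at reads a single cell by label.
# The list is built in row order, so assigning it back is positional.
foo['col'] = [foo.at[label, dict_cols_matching[key]]
              for label, key in foo['col_id'].items()]
print(foo)
```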
