June 27, 2023, 20:20:41

Is there a way to optimize multiple numpy.where functions?

Question

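You can get the same result more concisely with pandas' apply function and a custom function. For example:

# Map each (code, color) combination to the value for the new column
def map_combination(row):
    if row["code"] == "500" and row["color"] == "blue":
        return "1"
    elif row["code"] == "500" and row["color"] == "green":
        return "2"
    elif row["code"] == "100" and row["color"] == "red":
        return "3"
    elif row["code"] == "42" and row["color"] == "yellow":
        return "4"
    # Add more conditions and mappings here
    else:
        return row["new_column"]

# Apply the mapping function to every row to fill the new column
df["new_column"] = df.apply(map_combination, axis=1)

This handles multiple combinations more readably and avoids repeating np.where. You can extend map_combination to cover further combinations. Note that apply with axis=1 calls a Python function once per row, so it is typically slower than vectorized np.where calls on large DataFrames.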

Original question:

I am trying to optimize the following Python code.

I have a DataFrame that looks like this:

import pandas as pd
import numpy as np

data = {
        "code": ["500","500","100","500","42", "100", "500"],
        "color": ["blue", "green", "red", "blue", "yellow", "red", "green"]
        }

df = pd.DataFrame(data)
df["new_column"] = ""


print(df)

  code   color new_column
0  500    blue           
1  500   green           
2  100     red           
3  500    blue           
4   42  yellow           
5  100     red           
6  500   green

The next step is to populate "new_column" with values chosen according to the combination of values in the first two columns (e.g. if code = 500 and color = blue, then new_column = 1).

The (working) solution I have so far is the following:

df["new_column"] = np.where((df["code"] == "500") & (df["color"] == "blue"), "1", df["new_column"])
df["new_column"] = np.where((df["code"] == "500") & (df["color"] == "green"), "2", df["new_column"])
df["new_column"] = np.where((df["code"] == "100") & (df["color"] == "red"), "3", df["new_column"])
df["new_column"] = np.where((df["code"] == "42") & (df["color"] == "yellow"), "4", df["new_column"])

which gives

  code   color new_column
0  500    blue          1
1  500   green          2
2  100     red          3
3  500    blue          1
4   42  yellow          4
5  100     red          3
6  500   green          2

My question is: is there a "more elegant" way to achieve the same result?
In this example I am using only four np.where calls, but I have many more combinations of values that need to be covered by this method, which results in a "wall of text" that I hope to slim down somehow.

Answer 1

Score: 2

numpy.select is the way to chain multiple conditions, but in your case using a dictionary and merge is likely the best approach:

d = {('500', 'blue'): '1', ('500', 'green'): '2',
     ('100', 'red'): '3', ('42', 'yellow'): '4'}

cols = ['code', 'color']
df['new_column'] = df[cols].merge(pd.Series(d, name='X'), left_on=cols,
                                  right_index=True, how='left')['X']

Output:

  code   color new_column
0  500    blue          1
1  500   green          2
2  100     red          3
3  500    blue          1
4   42  yellow          4
5  100     red          3
6  500   green          2
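
For comparison, the numpy.select route mentioned at the start of this answer might look like the following minimal sketch. It rebuilds the df from the question so that it runs on its own; the names conditions and choices are illustrative and not part of the original answer.

```python
import numpy as np
import pandas as pd

# Same data as in the question
df = pd.DataFrame({
    "code": ["500", "500", "100", "500", "42", "100", "500"],
    "color": ["blue", "green", "red", "blue", "yellow", "red", "green"],
})

# One boolean mask per combination; choices[i] is used where conditions[i] is True
conditions = [
    (df["code"] == "500") & (df["color"] == "blue"),
    (df["code"] == "500") & (df["color"] == "green"),
    (df["code"] == "100") & (df["color"] == "red"),
    (df["code"] == "42") & (df["color"] == "yellow"),
]
choices = ["1", "2", "3", "4"]

# Rows matching none of the conditions get the default (an empty string here)
df["new_column"] = np.select(conditions, choices, default="")
print(df["new_column"].tolist())
```

This replaces the chain of np.where calls with a single call, but the list of conditions still grows by one entry per combination, which is why a dictionary lookup tends to scale better when there are many pairs.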

Answer 2

Score: 2

Yes, there is a more elegant way to achieve the same result:

mapping = {
        ("500", "blue"): "1",
        ("500", "green"): "2",
        ("100", "red"): "3",
        ("42", "yellow"): "4"
    }
df["new_column"] = [mapping.get((code, color), "") for code, color in zip(df["code"], df["color"])]
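
One property of this approach worth noting: the second argument to dict.get decides what is written for combinations that have no entry in the mapping, whereas a left merge would leave NaN in those rows. A small sketch of that behavior follows; the pair ("7", "black") is a made-up example that does not appear in the question's data.

```python
import pandas as pd

# Small frame with one combination that is deliberately missing from the mapping
df = pd.DataFrame({
    "code": ["500", "100", "7"],
    "color": ["blue", "red", "black"],  # ("7", "black") is hypothetical
})

mapping = {
    ("500", "blue"): "1",
    ("100", "red"): "3",
}

# zip yields (code, color) tuples, which are used directly as dictionary keys;
# pairs without an entry fall back to the default passed to dict.get
df["new_column"] = [mapping.get(key, "") for key in zip(df["code"], df["color"])]
print(df["new_column"].tolist())  # ['1', '3', '']
```

Changing the default (for example to "unknown") is then a one-word edit, without any extra fillna step.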
