Is there a way to optimize multiple numpy.where functions?

Question

I am trying to optimize the following Python code.

I have a dataframe that looks like this:

import pandas as pd
import numpy as np

data = {
        "code": ["500","500","100","500","42", "100", "500"],
        "color": ["blue", "green", "red", "blue", "yellow", "red", "green"]
        }

df = pd.DataFrame(data)
df["new_column"] = ""


print(df)

  code   color new_column
0  500    blue           
1  500   green           
2  100     red           
3  500    blue           
4   42  yellow           
5  100     red           
6  500   green                    

The next step is to populate "new_column" with a value chosen according to the combination of values in the first two columns (e.g. if code = 500 and color = blue, then new_column = 1).

The (working) solution I have so far is the following:

df["new_column"] = np.where((df["code"] == "500") & (df["color"] == "blue"), "1", df["new_column"])
df["new_column"] = np.where((df["code"] == "500") & (df["color"] == "green"), "2", df["new_column"])
df["new_column"] = np.where((df["code"] == "100") & (df["color"] == "red"), "3", df["new_column"])
df["new_column"] = np.where((df["code"] == "42") & (df["color"] == "yellow"), "4", df["new_column"])

which gives:

  code   color new_column
0  500    blue          1
1  500   green          2
2  100     red          3
3  500    blue          1
4   42  yellow          4
5  100     red          3
6  500   green          2

My question is: is there a "more elegant" way to achieve the same result?
In this example I am using only four np.where calls, but I have many more combinations of values that need to be covered with this method, resulting in a "wall of text" that I hope to slim down somehow.

Answer 1

Score: 2

numpy.select is the way to chain multiple conditions, but in your case using a dictionary and merge is likely the best approach:

d = {('500', 'blue'): '1', ('500', 'green'): '2',
     ('100', 'red'): '3', ('42', 'yellow'): '4'}

cols = ['code', 'color']
df['new_column'] = df[cols].merge(pd.Series(d, name='X'), left_on=cols,
                                  right_index=True, how='left')['X']

Output:

  code   color new_column
0  500    blue          1
1  500   green          2
2  100     red          3
3  500    blue          1
4   42  yellow          4
5  100     red          3
6  500   green          2
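
The answer mentions numpy.select without showing it. For comparison, here is a minimal sketch of what that variant could look like on the question's dataframe. The variable names `conditions` and `choices` are illustrative, and the empty-string default is an assumption chosen to match the initially empty column.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "code": ["500", "500", "100", "500", "42", "100", "500"],
    "color": ["blue", "green", "red", "blue", "yellow", "red", "green"],
})

# One boolean mask per combination, listed in the same order as `choices`.
conditions = [
    (df["code"] == "500") & (df["color"] == "blue"),
    (df["code"] == "500") & (df["color"] == "green"),
    (df["code"] == "100") & (df["color"] == "red"),
    (df["code"] == "42") & (df["color"] == "yellow"),
]
choices = ["1", "2", "3", "4"]

# Rows that match none of the conditions receive the default value
# (an empty string here, which is an assumption of this sketch).
df["new_column"] = np.select(conditions, choices, default="")

print(df)
```

This replaces the repeated assignments with a single call, but each combination still needs its own condition line, which is why a dictionary lookup tends to scale better when there are many combinations.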

Answer 2

Score: 2

Yes, there is a more elegant way to achieve the same result:

mapping = {
        ("500", "blue"): "1",
        ("500", "green"): "2",
        ("100", "red"): "3",
        ("42", "yellow"): "4"
    }
df["new_column"] = [mapping.get((code, color), "") for code, color in zip(df["code"], df["color"])]
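
One difference from the original np.where chain is worth noting: the chain leaves the existing value of new_column untouched for rows that match no combination, whereas the lookup above writes an empty string. If that behavior matters, a possible variation is to pass the current value as the fallback. The sketch below uses a hypothetical unmatched row ("7", "black") and a placeholder value "keep" purely for illustration.

```python
import pandas as pd

# Small example frame; the last row is a made-up combination
# that does not appear in the mapping.
df = pd.DataFrame({
    "code": ["500", "100", "7"],
    "color": ["blue", "red", "black"],
    "new_column": ["", "", "keep"],
})

mapping = {
    ("500", "blue"): "1",
    ("500", "green"): "2",
    ("100", "red"): "3",
    ("42", "yellow"): "4",
}

# Fall back to the row's current value when the (code, color) pair
# is not in the mapping, mirroring the np.where chain's behavior.
df["new_column"] = [
    mapping.get((code, color), current)
    for code, color, current in zip(df["code"], df["color"], df["new_column"])
]

print(df)
```

With this variation, new combinations are added by extending the dictionary only, and rows outside the dictionary keep whatever they already had.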

huangapple
  • Posted on June 27, 2023, 20:20:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76564801.html