Is there a way to optimize multiple numpy.where functions?
Question
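You can use the pandas library's apply function together with a custom function to achieve the same result more concisely. Here is an example:
# Define a function that maps each (code, color) combination to a new value
def map_combination(row):
    if row["code"] == "500" and row["color"] == "blue":
        return "1"
    elif row["code"] == "500" and row["color"] == "green":
        return "2"
    elif row["code"] == "100" and row["color"] == "red":
        return "3"
    elif row["code"] == "42" and row["color"] == "yellow":
        return "4"
    # Add more conditions and mappings here
    else:
        return row["new_column"]

# Apply the mapping function row by row to fill the new column
df["new_column"] = df.apply(map_combination, axis=1)
This handles multiple combinations more clearly and avoids repeating np.where calls. You can keep extending map_combination to cover more combinations.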
English:
I am trying to optimize the following Python code.
I have a dataframe that looks like this one:
import pandas as pd
import numpy as np
data = {
    "code": ["500", "500", "100", "500", "42", "100", "500"],
    "color": ["blue", "green", "red", "blue", "yellow", "red", "green"]
}
df = pd.DataFrame(data)
df["new_column"] = ""
print(df)
   code   color  new_column
0   500    blue
1   500   green
2   100     red
3   500    blue
4    42  yellow
5   100     red
6   500   green
The next step would be to populate "new_column" with a value chosen according to the combination of values in the first two columns (e.g. if code = 500 and color = blue, then new_column = 1).
The (working) solution I have so far is the following:
df["new_column"] = np.where((df["code"] == "500") & (df["color"] == "blue"), "1", df["new_column"])
df["new_column"] = np.where((df["code"] == "500") & (df["color"] == "green"), "2", df["new_column"])
df["new_column"] = np.where((df["code"] == "100") & (df["color"] == "red"), "3", df["new_column"])
df["new_column"] = np.where((df["code"] == "42") & (df["color"] == "yellow"), "4", df["new_column"])
which gives
   code   color  new_column
0   500    blue           1
1   500   green           2
2   100     red           3
3   500    blue           1
4    42  yellow           4
5   100     red           3
6   500   green           2
My question is: is there a "more elegant" way to achieve the same result?
In this example I am using only four np.where calls, but I have several more combinations of values that should be covered with this method, resulting in a "wall of text" that I hope to slim down somehow.
Answer 1
Score: 2
numpy.select is the way to chain multiple conditions, but in your case using a dictionary and merge is likely the best approach:
d = {('500', 'blue'): '1', ('500', 'green'): '2',
     ('100', 'red'): '3', ('42', 'yellow'): '4'}
cols = ['code', 'color']
df['new_column'] = df[cols].merge(pd.Series(d, name='X'), left_on=cols,
                                  right_index=True, how='left')['X']
Output:
   code   color  new_column
0   500    blue           1
1   500   green           2
2   100     red           3
3   500    blue           1
4    42  yellow           4
5   100     red           3
6   500   green           2
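For comparison, here is a minimal sketch of the numpy.select variant mentioned above, rebuilt on the question's sample data. The names conditions and choices are illustrative, and default="" is an assumption chosen to mimic the empty string the question starts from.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "code": ["500", "500", "100", "500", "42", "100", "500"],
    "color": ["blue", "green", "red", "blue", "yellow", "red", "green"],
})

# One boolean mask per (code, color) combination; the first match wins.
conditions = [
    (df["code"] == "500") & (df["color"] == "blue"),
    (df["code"] == "500") & (df["color"] == "green"),
    (df["code"] == "100") & (df["color"] == "red"),
    (df["code"] == "42") & (df["color"] == "yellow"),
]
# Value assigned where the condition at the same position is True.
choices = ["1", "2", "3", "4"]

# Rows matching no condition receive the default (assumed here: empty string).
df["new_column"] = np.select(conditions, choices, default="")
print(df)
```

This still needs one condition per combination, so it shortens the code less than the dictionary does. One difference worth noting: the left merge leaves NaN for combinations missing from the dictionary, whereas here the default argument decides the fallback value.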
Answer 2
Score: 2
Yes, there is a more elegant way to achieve the same result:
mapping = {
    ("500", "blue"): "1",
    ("500", "green"): "2",
    ("100", "red"): "3",
    ("42", "yellow"): "4"
}
df["new_column"] = [mapping.get((code, color), "") for code, color in zip(df["code"], df["color"])]
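To show what the second argument of mapping.get does, the sketch below runs the same comprehension on a smaller frame that includes one combination missing from the dictionary. The ("999", "pink") row is a hypothetical addition and is not part of the question's data.

```python
import pandas as pd

mapping = {
    ("500", "blue"): "1",
    ("500", "green"): "2",
    ("100", "red"): "3",
    ("42", "yellow"): "4",
}

# The last row is a hypothetical combination with no entry in the mapping.
df = pd.DataFrame({
    "code": ["500", "42", "999"],
    "color": ["blue", "yellow", "pink"],
})

# zip pairs the two columns row by row; .get falls back to "" when the
# (code, color) tuple is not a key of the dictionary.
df["new_column"] = [
    mapping.get(pair, "") for pair in zip(df["code"], df["color"])
]
print(df)
```

Adding a new combination then only means adding one dictionary entry, and unmatched rows keep an empty string, as in the original np.where version.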