Python Dataframe compare column values with a list and produce output with matching
Question
I have a DataFrame indexed by year-month. I want to assign a color to each row based on the year the sample was collected.
import matplotlib.colors as mcolors
colors_list = list(mcolors.XKCD_COLORS.keys())
# colors_list:
# ['xkcd:cloudy blue',
#  'xkcd:dark pastel green',
#  'xkcd:dust',
#  'xkcd:electric lime',
#  'xkcd:fresh green',
#  'xkcd:light eggplant',
#  ........
# ]
df =
   sensor_value  Year  Month
0   5171.318942  2002      4
1   5085.094086  2002      5
3   5685.681944  2004      6
4   6097.877688  2006      7
5   6063.909946  2003      8
.....
years_list = df['Year'].unique().tolist()
req_colors_list = colors_list[:len(years_list)]
df['year_color'] = df['Year'].apply(lambda x: clr if x==year else np.nan for year,clr in zip(years_list,req_colors_list))
Present output:
<lambda> <lambda> <lambda> <lambda> <lambda> <lambda> <lambda> <lambda> <lambda> <lambda>
Year
2002 tab:blue NaN NaN NaN NaN NaN NaN NaN NaN NaN
2002 tab:blue NaN NaN NaN NaN NaN NaN NaN NaN NaN
2006 tab:blue NaN NaN NaN NaN NaN NaN NaN NaN NaN
2006 tab:blue NaN NaN NaN NaN NaN NaN NaN NaN NaN
2003 tab:blue NaN NaN NaN NaN NaN NaN NaN NaN NaN
... ... ... ... ... ... ... ... ... ... ...
Expected output:
2002 'xkcd:cloudy blue'
2002 'xkcd:cloudy blue'
2006 'xkcd:fresh green'
2006 'xkcd:fresh green'
2003
Answer 1
Score: 2
To assign colors to the DataFrame based on the year of the sample, you can modify your lambda function:
df['year_color'] = df['Year'].apply(lambda x: req_colors_list[years_list.index(x)] if x in years_list else np.nan)
This lambda checks whether the year x is in years_list. If it is, years_list.index(x) gives its position, and the color at the same position in req_colors_list is returned; otherwise np.nan is assigned to mark a missing value.
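Note that colors_list contains a limited number of colors. If there are more unique years than colors, req_colors_list is shorter than years_list and req_colors_list[years_list.index(x)] raises an IndexError for the extra years; if several years should share a color, the colors have to be reused explicitly.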
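A minimal sketch, not part of the original answer, assuming you want colors to repeat when there are more unique years than colors: take each year's position in years_list modulo len(colors_list) and build the year-to-color mapping once. The sample rows are rebuilt from the question, and the three-color palette is deliberately short (a hypothetical choice) so the wrap-around is visible.

```python
import pandas as pd

# Sample rows rebuilt from the question (only the rows shown there)
df = pd.DataFrame(
    {
        "sensor_value": [5171.318942, 5085.094086, 5685.681944, 6097.877688, 6063.909946],
        "Year": [2002, 2002, 2004, 2006, 2003],
        "Month": [4, 5, 6, 7, 8],
    },
    index=[0, 1, 3, 4, 5],
)

# Hypothetical short palette: fewer colors than unique years
colors_list = ["xkcd:cloudy blue", "xkcd:dark pastel green", "xkcd:dust"]

# Unique years in order of first appearance: [2002, 2004, 2006, 2003]
years_list = df["Year"].unique().tolist()

# Position modulo palette length, so colors repeat instead of raising IndexError
year_to_color = {year: colors_list[i % len(colors_list)] for i, year in enumerate(years_list)}

df["year_color"] = df["Year"].map(year_to_color)
print(df)
```

With four unique years and three colors, 2003 (the fourth year) wraps back to 'xkcd:cloudy blue'. Building the dictionary once also avoids calling years_list.index(x) for every row.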
Answer 2
Score: 1
Use Series.map with a dictionary generated by zip:
df['year_color'] = df['Year'].map(dict(zip(years_list, colors_list)))
print(df)
   sensor_value  Year  Month              year_color
0   5171.318942  2002      4        xkcd:cloudy blue
1   5085.094086  2002      5        xkcd:cloudy blue
3   5685.681944  2004      6  xkcd:dark pastel green
4   6097.877688  2006      7               xkcd:dust
5   6063.909946  2003      8      xkcd:electric lime
If there are fewer colors than unique years, zip stops at the end of the shorter list, so map returns NaN for the years left without a color:
colors_list = ['xkcd:cloudy blue', 'xkcd:dark pastel green', 'xkcd:dust']
years_list = df['Year'].unique().tolist()
df['year_color'] = df['Year'].map(dict(zip(years_list, colors_list)))
print(df)
   sensor_value  Year  Month              year_color
0   5171.318942  2002      4        xkcd:cloudy blue
1   5085.094086  2002      5        xkcd:cloudy blue
3   5685.681944  2004      6  xkcd:dark pastel green
4   6097.877688  2006      7               xkcd:dust
5   6063.909946  2003      8                     NaN
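Another option, not taken from either answer, is pd.factorize, which numbers each year in order of first appearance (the same order as unique()); the resulting integer codes can index a NumPy array of colors directly. This is a sketch assuming colors_list has at least as many entries as there are unique years; the sample rows are rebuilt from the question and the four-color list is illustrative.

```python
import numpy as np
import pandas as pd

# Sample rows rebuilt from the question (only the rows shown there)
df = pd.DataFrame(
    {
        "sensor_value": [5171.318942, 5085.094086, 5685.681944, 6097.877688, 6063.909946],
        "Year": [2002, 2002, 2004, 2006, 2003],
        "Month": [4, 5, 6, 7, 8],
    },
    index=[0, 1, 3, 4, 5],
)

# Illustrative palette; assumed to have at least one color per unique year
colors_list = ["xkcd:cloudy blue", "xkcd:dark pastel green", "xkcd:dust", "xkcd:electric lime"]

# codes: 2002 -> 0, 2004 -> 1, 2006 -> 2, 2003 -> 3 (order of first appearance)
codes, uniques = pd.factorize(df["Year"])

# Vectorized lookup: each row's code selects its color from the palette
df["year_color"] = np.asarray(colors_list)[codes]
print(df)
```

factorize gives missing years the code -1, which would silently select the last color, so drop or fill missing years first. If there may be more years than colors, indexing with codes % len(colors_list) reuses the palette instead of raising an IndexError.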