Python DataFrame: compare column values with a list and produce matching output

Question

I have a DataFrame with year-month as the index. I want to assign a color to each row based on the year the sample was collected.

import matplotlib.colors as mcolors
colors_list = list(mcolors.XKCD_COLORS.keys())
colors_list =
['xkcd:cloudy blue',
 'xkcd:dark pastel green',
 'xkcd:dust',
 'xkcd:electric lime',
 'xkcd:fresh green',
 'xkcd:light eggplant'
........
]

df =           
   sensor_value 	Year 	Month
0 	5171.318942 	2002 	4
1 	5085.094086 	2002 	5
3 	5685.681944 	2004 	6
4 	6097.877688 	2006 	7
5 	6063.909946 	2003 	8
.....
years_list = df['Year'].unique().tolist()
req_colors_list = colors_list[:len(years_list)]

df['year_color'] = df['Year'].apply(lambda x: clr if x==year else np.nan for year,clr in zip(years_list,req_colors_list))

Present output:

<lambda> 	<lambda> 	<lambda> 	<lambda> 	<lambda> 	<lambda> 	<lambda> 	<lambda> 	<lambda> 	<lambda>
Year 										
2002 	tab:blue 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN
2002 	tab:blue 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN
2006 	tab:blue 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN
2006 	tab:blue 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN
2003 	tab:blue 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN
... 	... 	... 	... 	... 	... 	... 	... 	... 	... 	...

Expected output:

2002   'xkcd:cloudy blue'
2002   'xkcd:cloudy blue'
2006   'xkcd:fresh green'
2006   'xkcd:fresh green'
2003 

Answer 1

Score: 2

To assign colors to the DataFrame based on the year of the sample, you can modify your lambda function:

df['year_color'] = df['Year'].apply(lambda x: req_colors_list[years_list.index(x)] if x in years_list else np.nan)

This lambda function checks whether the year x is present in years_list. If it is, it retrieves the corresponding color from req_colors_list using the position of x in years_list. Otherwise, it assigns np.nan to indicate a missing value.

Because colors_list contains a limited number of colors, req_colors_list can end up shorter than years_list. In that case the lookup above raises an IndexError for the later years; if several years should share a color instead, wrap the position around with a modulo on the length of the color list.
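The case where several years share a color can be sketched by taking the position modulo the number of colors. This is only an illustration under assumptions: the sample frame copies the rows shown in the question, the three-color list is deliberately too short, and the helper name color_for_year is hypothetical rather than part of the answer.

```python
import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "sensor_value": [5171.318942, 5085.094086, 5685.681944, 6097.877688, 6063.909946],
    "Year": [2002, 2002, 2004, 2006, 2003],
    "Month": [4, 5, 6, 7, 8],
})

# Deliberately fewer colors than unique years (assumption for the demo).
colors_list = ["xkcd:cloudy blue", "xkcd:dark pastel green", "xkcd:dust"]
years_list = df["Year"].unique().tolist()  # order of first appearance

def color_for_year(year):
    """Hypothetical helper: wrap around when years outnumber colors."""
    position = years_list.index(year)
    return colors_list[position % len(colors_list)]

df["year_color"] = df["Year"].apply(color_for_year)
print(df[["Year", "year_color"]])
```

Here 2003 is the fourth unique year, so it wraps around and shares the first color with 2002. Note that list.index scans the list for every row, which is fine for a handful of years but slower than a dictionary lookup on large frames.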

Answer 2

Score: 1

Use Series.map with a dictionary generated by zip:

df['year_color'] = df['Year'].map(dict(zip(years_list, colors_list)))
print(df)
   sensor_value  Year  Month              year_color
0   5171.318942  2002      4        xkcd:cloudy blue
1   5085.094086  2002      5        xkcd:cloudy blue
3   5685.681944  2004      6  xkcd:dark pastel green
4   6097.877688  2006      7               xkcd:dust
5   6063.909946  2003      8      xkcd:electric lime

If there are fewer colors than unique years, zip stops at the shorter list, so the remaining years have no entry in the dictionary and map generates NaN for them:

colors_list = ['xkcd:cloudy blue',
               'xkcd:dark pastel green',
               'xkcd:dust']

years_list = df['Year'].unique().tolist()

df['year_color'] = df['Year'].map(dict(zip(years_list, colors_list)))
print(df)
   sensor_value  Year  Month              year_color
0   5171.318942  2002      4        xkcd:cloudy blue
1   5085.094086  2002      5        xkcd:cloudy blue
3   5685.681944  2004      6  xkcd:dark pastel green
4   6097.877688  2006      7               xkcd:dust
5   6063.909946  2003      8                     NaN
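If NaN is not wanted when the colors run out, one possible variation is to repeat the color list with itertools.cycle before zipping. The sketch below is self-contained and rebuilds the sample frame from the question; the column name year_color_cycled and the variable names short_map and full_map are assumptions made for the illustration, not part of the answer.

```python
from itertools import cycle

import pandas as pd

# Sample rows copied from the question, keeping its index labels.
df = pd.DataFrame(
    {
        "sensor_value": [5171.318942, 5085.094086, 5685.681944, 6097.877688, 6063.909946],
        "Year": [2002, 2002, 2004, 2006, 2003],
        "Month": [4, 5, 6, 7, 8],
    },
    index=[0, 1, 3, 4, 5],
)

colors_list = ["xkcd:cloudy blue", "xkcd:dark pastel green", "xkcd:dust"]
years_list = df["Year"].unique().tolist()

# zip stops at the shorter input, so 2003 has no entry and maps to NaN.
short_map = dict(zip(years_list, colors_list))
df["year_color"] = df["Year"].map(short_map)

# cycle() repeats the colors indefinitely, so every year gets an entry.
full_map = dict(zip(years_list, cycle(colors_list)))
df["year_color_cycled"] = df["Year"].map(full_map)

print(df)
```

With cycling, years beyond the length of the color list reuse colors from the start, so two different years can end up with the same color. Whether that is acceptable depends on how the colors are used in the plot.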

huangapple
  • Posted on June 16, 2023, 14:22:35
  • Please keep this link when reposting: https://go.coder-hub.com/76487429.html