2023-06-16 14:22:35

Python Dataframe compare column values with a list and produce output with matching

Question

I have a dataframe with year-month as index. I want to assign a color to the dataframe based on the year the sample was collected.

import matplotlib.colors as mcolors
colors_list = list(mcolors.XKCD_COLORS.keys())
colors_list =
['xkcd:cloudy blue',
 'xkcd:dark pastel green',
 'xkcd:dust',
 'xkcd:electric lime',
 'xkcd:fresh green',
 'xkcd:light eggplant'
........
]
df =           
   sensor_value 	Year 	Month
0 	5171.318942 	2002 	4
1 	5085.094086 	2002 	5
3 	5685.681944 	2004 	6
4 	6097.877688 	2006 	7
5 	6063.909946 	2003 	8
.....
years_list = df['Year'].unique().tolist()
req_colors_list = colors_list[:len(years_list)]
df['year_color'] = df['Year'].apply(lambda x: clr if x==year else np.nan for year,clr in zip(years_list,req_colors_list))

Present output:

<lambda> 	<lambda> 	<lambda> 	<lambda> 	<lambda> 	<lambda> 	<lambda> 	<lambda> 	<lambda> 	<lambda>
Year 										
2002 	tab:blue 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN
2002 	tab:blue 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN
2006 	tab:blue 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN
2006 	tab:blue 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN
2003 	tab:blue 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN 	NaN
... 	... 	... 	... 	... 	... 	... 	... 	... 	... 	...

Expected output:

2002   'xkcd:cloudy blue'
2002   'xkcd:cloudy blue'
2006   'xkcd:fresh green'
2006   'xkcd:fresh green'
2003

Answer 1

Score: 2

To assign colors to the DataFrame based on the year of the sample, you can modify your lambda function:

df['year_color'] = df['Year'].apply(lambda x: req_colors_list[years_list.index(x)] if x in years_list else np.nan)

This lambda function checks if the year x is present in the years_list. If it is, it retrieves the corresponding color from the req_colors_list using the index. Otherwise, it assigns np.nan to indicate missing values.

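Note that req_colors_list is sliced to the number of unique years, so each year gets its own color as long as colors_list has at least that many entries. If colors_list is shorter than years_list, the index lookup will raise an IndexError for the years beyond the last available color.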
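
A self-contained sketch of this approach may help. The small DataFrame mirrors the sample rows from the question, and the six-entry colors_list is a hard-coded stand-in for list(mcolors.XKCD_COLORS.keys()); both are assumptions made so the snippet runs without matplotlib.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question; the index skips 2 as in the post.
df = pd.DataFrame(
    {
        "sensor_value": [5171.318942, 5085.094086, 5685.681944,
                         6097.877688, 6063.909946],
        "Year": [2002, 2002, 2004, 2006, 2003],
        "Month": [4, 5, 6, 7, 8],
    },
    index=[0, 1, 3, 4, 5],
)

# Assumption: a short stand-in for list(mcolors.XKCD_COLORS.keys()).
colors_list = [
    "xkcd:cloudy blue", "xkcd:dark pastel green", "xkcd:dust",
    "xkcd:electric lime", "xkcd:fresh green", "xkcd:light eggplant",
]

# unique() keeps order of first appearance: [2002, 2004, 2006, 2003].
years_list = df["Year"].unique().tolist()
req_colors_list = colors_list[:len(years_list)]

# Look up each row's year position and pick the color at the same position.
df["year_color"] = df["Year"].apply(
    lambda x: req_colors_list[years_list.index(x)] if x in years_list else np.nan
)
print(df)
```

Colors are assigned in order of each year's first appearance in the column, not in chronological order; sorting years_list first would change the pairing.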

Answer 2

Score: 1

Use Series.map with a dictionary built by zip:

df['year_color'] = df['Year'].map(dict(zip(years_list, colors_list)))
print(df)
   sensor_value  Year  Month              year_color
0   5171.318942  2002      4        xkcd:cloudy blue
1   5085.094086  2002      5        xkcd:cloudy blue
3   5685.681944  2004      6  xkcd:dark pastel green
4   6097.877688  2006      7               xkcd:dust
5   6063.909946  2003      8      xkcd:electric lime

If there are fewer colors than unique years, zip stops at the shorter list, so the remaining years are missing from the dictionary and map returns NaN for them:

colors_list = ['xkcd:cloudy blue',
               'xkcd:dark pastel green',
               'xkcd:dust']
years_list = df['Year'].unique().tolist()
df['year_color'] = df['Year'].map(dict(zip(years_list, colors_list)))
print(df)
   sensor_value  Year  Month              year_color
0   5171.318942  2002      4        xkcd:cloudy blue
1   5085.094086  2002      5        xkcd:cloudy blue
3   5685.681944  2004      6  xkcd:dark pastel green
4   6097.877688  2006      7               xkcd:dust
5   6063.909946  2003      8                     NaN
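
If NaN is not wanted when the colors run out, one possible workaround (not part of the original answer) is to reuse colors by wrapping the list in itertools.cycle. The sketch below assumes a reduced DataFrame with only the Year column and the same three-color list; the name year_to_color is introduced here for illustration.

```python
from itertools import cycle

import pandas as pd

# Assumption: only the Year column is needed to show the mapping.
df = pd.DataFrame({"Year": [2002, 2002, 2004, 2006, 2003]})
colors_list = ["xkcd:cloudy blue", "xkcd:dark pastel green", "xkcd:dust"]

years_list = df["Year"].unique().tolist()

# cycle() repeats the colors endlessly, so zip only stops when the
# years are exhausted and every year ends up in the dictionary.
year_to_color = dict(zip(years_list, cycle(colors_list)))

df["year_color"] = df["Year"].map(year_to_color)
print(df)
```

The trade-off is that some years then share a color (here 2002 and 2003 both get the first one), which may or may not be acceptable for a plot legend.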
