Python DataFrame: compare column values with a list and produce the matching output

Question

I have a dataframe with year-month as index. I want to assign a color to the dataframe based on the year the sample was collected.

    import numpy as np
    import matplotlib.colors as mcolors

    colors_list = list(mcolors.XKCD_COLORS.keys())
    # colors_list =
    # ['xkcd:cloudy blue',
    #  'xkcd:dark pastel green',
    #  'xkcd:dust',
    #  'xkcd:electric lime',
    #  'xkcd:fresh green',
    #  'xkcd:light eggplant'
    #  ........
    # ]

    # df =
    #    sensor_value  Year  Month
    # 0   5171.318942  2002      4
    # 1   5085.094086  2002      5
    # 3   5685.681944  2004      6
    # 4   6097.877688  2006      7
    # 5   6063.909946  2003      8
    # .....

    years_list = df['Year'].unique().tolist()
    req_colors_list = colors_list[:len(years_list)]
    # This is the line that produces the unexpected output shown below
    df['year_color'] = df['Year'].apply(lambda x: clr if x==year else np.nan for year,clr in zip(years_list,req_colors_list))

Present output:

          <lambda>  <lambda>  <lambda>  <lambda>  <lambda>  <lambda>  <lambda>  <lambda>  <lambda>  <lambda>
    Year
    2002  tab:blue       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
    2002  tab:blue       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
    2006  tab:blue       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
    2006  tab:blue       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
    2003  tab:blue       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN       NaN
    ...        ...       ...       ...       ...       ...       ...       ...       ...       ...       ...

Expected output:

    2002 'xkcd:cloudy blue'
    2002 'xkcd:cloudy blue'
    2006 'xkcd:fresh green'
    2006 'xkcd:fresh green'
    2003

Answer 1

Score: 2

To assign colors to the DataFrame based on the year of the sample, you can modify your lambda function:

    df['year_color'] = df['Year'].apply(lambda x: req_colors_list[years_list.index(x)] if x in years_list else np.nan)

This lambda function checks whether the year x is present in years_list. If it is, it looks up the position of x in years_list and returns the color at the same position in req_colors_list. Otherwise, it returns np.nan to mark the value as missing.

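Note that colors_list contains a limited number of colors. If there are more unique years than colors, req_colors_list ends up shorter than years_list and the lookup raises an IndexError for the later years; colors are not reused automatically.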
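
As a rough, self-contained sketch of the approach above: the DataFrame below is rebuilt from the sample rows in the question, and the six-entry colors_list is an assumed stand-in for list(mcolors.XKCD_COLORS.keys()) so that the snippet runs without matplotlib.

```python
import numpy as np
import pandas as pd

# Sample rows copied from the question (index values 0, 1, 3, 4, 5 as shown there)
df = pd.DataFrame(
    {
        "sensor_value": [5171.318942, 5085.094086, 5685.681944, 6097.877688, 6063.909946],
        "Year": [2002, 2002, 2004, 2006, 2003],
        "Month": [4, 5, 6, 7, 8],
    },
    index=[0, 1, 3, 4, 5],
)

# Assumption: a short stand-in for list(mcolors.XKCD_COLORS.keys())
colors_list = [
    "xkcd:cloudy blue",
    "xkcd:dark pastel green",
    "xkcd:dust",
    "xkcd:electric lime",
    "xkcd:fresh green",
    "xkcd:light eggplant",
]

# unique() keeps the order of first appearance
years_list = df["Year"].unique().tolist()
req_colors_list = colors_list[: len(years_list)]

# One lambda, applied per value: find the year's position, return the color there
df["year_color"] = df["Year"].apply(
    lambda x: req_colors_list[years_list.index(x)] if x in years_list else np.nan
)
print(df[["Year", "year_color"]])
```

Each call to years_list.index(x) scans the list, so this is fine for a handful of years but may be slow on large frames with many distinct years.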

Answer 2

Score: 1

Use Series.map with a dictionary built by zip:

    df['year_color'] = df['Year'].map(dict(zip(years_list, colors_list)))
    print(df)

       sensor_value  Year  Month              year_color
    0   5171.318942  2002      4        xkcd:cloudy blue
    1   5085.094086  2002      5        xkcd:cloudy blue
    3   5685.681944  2004      6  xkcd:dark pastel green
    4   6097.877688  2006      7               xkcd:dust
    5   6063.909946  2003      8      xkcd:electric lime
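
For anyone who wants to run this end to end, here is a minimal sketch under a few assumptions: the DataFrame is rebuilt from the question's sample rows, colors_list is a short hand-written stand-in for the full XKCD list, and year_to_color is a name introduced here only for illustration.

```python
import pandas as pd

# Sample rows copied from the question
df = pd.DataFrame(
    {
        "sensor_value": [5171.318942, 5085.094086, 5685.681944, 6097.877688, 6063.909946],
        "Year": [2002, 2002, 2004, 2006, 2003],
        "Month": [4, 5, 6, 7, 8],
    },
    index=[0, 1, 3, 4, 5],
)

# Assumption: a short stand-in for list(mcolors.XKCD_COLORS.keys())
colors_list = [
    "xkcd:cloudy blue",
    "xkcd:dark pastel green",
    "xkcd:dust",
    "xkcd:electric lime",
    "xkcd:fresh green",
    "xkcd:light eggplant",
]

# Pair each unique year (in order of first appearance) with a color
years_list = df["Year"].unique().tolist()
year_to_color = dict(zip(years_list, colors_list))

# map looks each year up in the dictionary
df["year_color"] = df["Year"].map(year_to_color)
print(df)
```

Building the dictionary once and letting map do the lookups avoids scanning years_list for every row.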

If there are fewer colors than unique years, zip stops at the shorter list, so map produces NaN for the years left without a color:

    colors_list = ['xkcd:cloudy blue',
                   'xkcd:dark pastel green',
                   'xkcd:dust']
    years_list = df['Year'].unique().tolist()
    df['year_color'] = df['Year'].map(dict(zip(years_list, colors_list)))
    print(df)

       sensor_value  Year  Month              year_color
    0   5171.318942  2002      4        xkcd:cloudy blue
    1   5085.094086  2002      5        xkcd:cloudy blue
    3   5685.681944  2004      6  xkcd:dark pastel green
    4   6097.877688  2006      7               xkcd:dust
    5   6063.909946  2003      8                     NaN
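
If NaN is not acceptable when colors run out, one possible workaround (not part of the original answer) is to repeat the colors with itertools.cycle, at the cost of several years sharing a color. The sketch below assumes the question's sample years and uses illustrative names (truncated, cycled) that do not come from the thread.

```python
from itertools import cycle

import pandas as pd

# Years copied from the question's sample rows
years = pd.Series([2002, 2002, 2004, 2006, 2003], name="Year")

# Only three colors for four unique years
colors_list = ["xkcd:cloudy blue", "xkcd:dark pastel green", "xkcd:dust"]
years_list = years.unique().tolist()

# zip stops at the shorter input, so the last unique year (2003) gets no entry
truncated = dict(zip(years_list, colors_list))
short_result = years.map(truncated)

# Swapped-in technique: cycle() restarts the color list when it is exhausted,
# so every year gets a color but colors may repeat across years
cycled = dict(zip(years_list, cycle(colors_list)))
cycled_result = years.map(cycled)

print(short_result)
print(cycled_result)
```

The zip call still terminates because years_list is finite, even though cycle itself never ends.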

Posted by huangapple on June 16, 2023 at 14:22:35
Source: https://go.coder-hub.com/76487429.html