June 15, 2023 21:31:00


How to create two different legends from multiple plot calls

Question

I have to create a visualization that contains multiple line plots (trend lines, moving averages, etc.) and multiple scatter charts.

I have successfully created all the charts in a single plot; however, I want the legend of the scatter charts to be separate from that of the line plots.

I want the legend of the line plots to be at the center and that of the scatter charts to be at the upper left corner of the plot.

Code for the 4 scatter charts:

#Scatter plot
#Using matplotlib scatter plot
scatter1 = plt.scatter(data=df[df.ABC < 10], x='Date', y='ABC', c='cyan', s=5, label='less')
scatter2 = plt.scatter(data=df[(df.ABC >= 10) & (df.ABC < 40)], x='Date', y='ABC', c='dodgerblue', s=5, label='normal')
scatter3 = plt.scatter(data=df[(df.ABC >= 40) & (df.ABC < 60)], x='Date', y='ABC', c='orangered', s=5, label='good')
scatter4 = plt.scatter(data=df[df.ABC >= 60], x='Date', y='ABC', c='brown', s=5, label='more')  # >= so a value of exactly 60 is not dropped

Code for the line plots and a calculated metric (all have to be in the same legend box):

#Empty legend line specifying a calculated metric
plt.plot([], [], ' ', label=f'Points above Target Budget PR = {num}/{den} = {ans}%')

#plot using rolling average
#using seaborn.lineplot()
sns.lineplot(x='Date',
             y='moving avg',
             data=df,
             color='red',
             linewidth=1.5,
             label='Moving Average')

#plot a simple time series plot
#using seaborn.lineplot()
sns.lineplot(x='Date',
             y='Yield',
             color='darkgreen',
             linewidth=1.5,
             data=df,
             label='Yield')

I have tried the following method to make multiple legends:

legend1 = plt.legend((scatter1, scatter2, scatter3, scatter4), ['less', 'normal', 'good', 'more'], loc='upper left')
plt.gca().add_artist(legend1)
plt.legend(loc='center')

The code above creates a legend for the scatter charts in the top left corner. However, the same scatter entries are repeated in the legend that is supposed to contain only the line plots and the calculated metric, so the scatter legend appears twice in the chart.

I tried other changes to the code too, but the outcome is always one legend containing everything, two legends with the scatter entries repeated, or no legend at all.
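The duplication comes from the bare `plt.legend(loc='center')` call: with no handles passed, `legend()` collects every labelled artist on the Axes, including the four scatter collections. The sketch below reproduces this and shows one way around it, passing each legend its handles explicitly. It assumes synthetic data and plain `ax.plot` lines in place of the question's DataFrame and seaborn calls; seaborn's `lineplot` also draws `Line2D` objects, so `ax.get_lines()` should pick those up as well, but that is worth checking in your versions.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic stand-ins for the question's Date/ABC data (assumption)
x = np.arange(50)
y = np.linspace(0, 100, 50)

fig, ax = plt.subplots()
# Two of the question's scatter groups; each call returns a PathCollection
s_less = ax.scatter(x[y < 10], y[y < 10], c="cyan", s=5, label="less")
s_more = ax.scatter(x[y >= 10], y[y >= 10], c="brown", s=5, label="more")
# A legend-only text entry plus one line; both are Line2D objects
ax.plot([], [], " ", label="calculated metric")
ax.plot(x, y, color="darkgreen", label="Yield")

# Cause: with no handles given, legend() gathers every labelled artist
_, all_labels = ax.get_legend_handles_labels()
print(sorted(all_labels))

# Fix: give each legend its own handles explicitly
scatter_legend = ax.legend(handles=[s_less, s_more], loc="upper left")
ax.add_artist(scatter_legend)  # keep it when legend() is called a second time
line_legend = ax.legend(handles=list(ax.get_lines()), loc="center")
print([t.get_text() for t in line_legend.get_texts()])
```

In the question's own code, the first legend could take `[scatter1, scatter2, scatter3, scatter4]` (already returned by `plt.scatter`) and the second `plt.gca().get_lines()`.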

Answer 1

Score: 1

It is typically better to manage the data before doing any visualization:
1. Use `pd.cut` to bin the data into labeled categories.
2. Add the rolling mean and median to the dataframe as columns.
3. Convert the new columns to long form with `pandas.DataFrame.melt`.
- This is better because all of the data is in a single place, which makes it easier to visualize.
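The three data-preparation steps can be sketched on a tiny hypothetical DataFrame; the values and the rolling window of 3 are illustrative choices, not from the answer. With `right=False`, each bin is closed on the left, so 10 lands in `normal` and 60 in `more`, and after `melt` there is one row per (date, statistic) pair.

```python
import numpy as np
import pandas as pd

# Tiny hypothetical data standing in for the question's DataFrame
df = pd.DataFrame({
    "Date": pd.date_range("2023-01-01", periods=6, freq="D"),
    "ABC": [5.0, 10.0, 39.9, 40.0, 60.0, 75.0],
})

# Step 1: bin into labelled categories; right=False makes each bin [low, high)
bins = [-np.inf, 10, 40, 60, np.inf]
labels = ["less", "normal", "good", "more"]
df["cats"] = pd.cut(df["ABC"], bins=bins, labels=labels, right=False)

# Step 2: rolling statistics as new columns (window of 3 chosen for the tiny data)
df["MA"] = df["ABC"].rolling(3).mean()
df["Yield"] = df["ABC"].rolling(3).median()

# Step 3: melt the rolling columns into long form (all MA rows, then all Yield rows)
long_df = df.melt(id_vars=["Date", "ABC", "cats"], value_vars=["MA", "Yield"],
                  var_name="Labels", value_name="value")

print(df["cats"].tolist())
print(long_df.shape)
print(long_df.head(3))
```

The first `window - 1` rows of each rolling column are `NaN`; seaborn's `lineplot` skips them when drawing.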
Visually speaking, unless there are two different y-axes (left / right), a single legend makes for a cleaner visualization, as long as it is not in the middle of the plot or otherwise covering markers.
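For readers not using seaborn, the placement that `sns.move_legend` applies in the code below can be sketched with plain matplotlib, since it forwards these keyword arguments to `Axes.legend`. The data is synthetic (assumption); the point is the `bbox_to_anchor`/`loc` pair that puts one legend just outside the right edge of the Axes.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

# Synthetic data (assumption): one scatter series and one trend line
x = np.arange(30)
fig, ax = plt.subplots(figsize=(8, 4))
ax.scatter(x, np.sin(x / 5), s=5, label="observations")
ax.plot(x, np.cos(x / 5), color="red", label="trend")

# bbox_to_anchor=(1, 0.5) is the middle of the right edge, in axes coordinates;
# loc="center left" pins the legend's own left-middle point there, so the
# legend sits just outside the Axes and covers no markers.
legend = ax.legend(bbox_to_anchor=(1, 0.5), loc="center left", frameon=False)
```

When saving, `fig.savefig(path, bbox_inches="tight")` enlarges the saved area so a legend outside the Axes is not cut off.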

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

# create sample data
np.random.seed(2023)
df = pd.DataFrame({'ABC': np.random.uniform(low=0, high=100, size=(100,)),
                   'Date': pd.date_range('2019/07/01', periods=100, freq='SM')})

# create bins with labels for the data
bins = [-np.inf, 10, 40, 60, np.inf]
labels = ['less', 'normal', 'good', 'more']
df['cats'] = pd.cut(df.ABC, bins=bins, labels=labels, ordered=True, right=False)

# add the rolling mean and median as columns
df['MA'] = df.ABC.rolling(10).mean()
df['Yield'] = df.ABC.rolling(10).median()

# convert the rolling columns to a long form
df = df.melt(id_vars=['ABC', 'Date', 'cats'], var_name='Labels')

# plot
fig = plt.figure(figsize=(10, 6))
ax = sns.scatterplot(data=df, x='Date', y='ABC', hue='cats', palette=['cyan', 'dodgerblue', 'orangered', 'brown'])
sns.lineplot(data=df, x='Date', y='value', hue='Labels', palette=['red', 'darkgreen'], ax=ax)

sns.move_legend(ax, bbox_to_anchor=(1, 0.5), loc='center left', frameon=False)

`df`

           ABC       Date    cats Labels      value
0    32.198830 2019-07-15  normal     MA        NaN
1    89.042245 2019-07-31    more     MA        NaN
2    58.805226 2019-08-15    good     MA        NaN
3    12.659609 2019-08-31  normal     MA        NaN
4    14.134122 2019-09-15  normal     MA        NaN
...
195  60.740918 2023-06-30    more  Yield  61.721929
196  47.481600 2023-07-15    good  Yield  56.303730
197  70.959070 2023-07-31    more  Yield  61.721929
198  11.524271 2023-08-15  normal  Yield  56.303730
199  73.279407 2023-08-31    more  Yield  56.303730

Answer 2

Score: 1


[![Plot with separate legends for the scatter and line artists][1]][1]
  [1]: https://i.stack.imgur.com/GS7bq.png


The excellent answers previously posted are mostly specific to the OP's problem. I'd like to add a generic solution for using one legend for each Artist type present in a plot.

First, we need a function that extracts the Artists of the same type from the handles:

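from matplotlib.collections import PathCollection
from matplotlib.lines import Line2D

def split(handles_labels, plot_type):
    # map a plot kind to the Artist class it creates
    types = dict(plot=Line2D,               # ax.plot lines
                 scatter=PathCollection,)  # ax.scatter markers

    try: plot_type = types[plot_type]
    except KeyError: raise ValueError('Invalid plot type.')

    # keep only the (handle, label) pairs whose handle is exactly that class
    return zip(*((h, l) for h, l in zip(*handles_labels) if type(h) is plot_type))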

Then it's quite easy to produce the plot above:

import matplotlib.pyplot as plt

fig, ax = plt.subplots(layout='constrained')

ax.scatter(9, 4, label='A')
ax.plot((5, 7), (4, 8), label='B')
ax.scatter(3, 7, label='C')
ax.plot((4, 8), (6, 5), label='D')

handles_labels = ax.get_legend_handles_labels()

l0 = ax.legend(*split(handles_labels, 'scatter'), loc='upper left')
ax.add_artist(l0)
l1 = ax.legend(*split(handles_labels, 'plot'), loc='upper right')

plt.show()
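Applied to the question's setup, a type-based split also routes the empty "calculated metric" entry into the line legend, because `plt.plot([], [], ' ')` creates a `Line2D` just like the trend lines do. The sketch uses synthetic data and plain `ax.plot` lines in place of the question's DataFrame and seaborn calls (assumptions). `split_by_type` is a hypothetical variant of the `split` helper above: it filters with `isinstance` and returns lists, so a type with no artists gives an empty legend instead of falling back to `ax.legend()` with no arguments, which would collect every entry.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.collections import PathCollection
from matplotlib.lines import Line2D


def split_by_type(ax, artist_type):
    """Return (handles, labels) for the Axes' legend entries of one artist type.

    Lists are returned (possibly empty), so an empty group gives an empty
    legend instead of ax.legend() with no arguments, which collects everything.
    """
    handles, labels = ax.get_legend_handles_labels()
    pairs = [(h, l) for h, l in zip(handles, labels) if isinstance(h, artist_type)]
    return [h for h, _ in pairs], [l for _, l in pairs]


# Synthetic data shaped like the question's (assumption)
rng = np.random.default_rng(0)
x = np.arange(100)
y = rng.uniform(0, 100, size=100)

fig, ax = plt.subplots()
groups = [("less", y < 10), ("normal", (y >= 10) & (y < 40)),
          ("good", (y >= 40) & (y < 60)), ("more", y >= 60)]
for label, mask in groups:
    ax.scatter(x[mask], y[mask], s=5, label=label)  # each call adds a PathCollection
ax.plot([], [], " ", label="calculated metric")      # Line2D with no visible line
ax.plot(x[9:], np.convolve(y, np.ones(10) / 10, mode="valid"),
        color="red", label="Moving Average")        # 10-point moving average

scatter_legend = ax.legend(*split_by_type(ax, PathCollection), loc="upper left")
ax.add_artist(scatter_legend)  # keep the first legend when legend() is called again
line_legend = ax.legend(*split_by_type(ax, Line2D), loc="center")
```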

Answer 3

Score: 0


One way to do this (and what I think is the easier way) is to simply complete the plotting and get the legend with all 6 entries. Then use `get_legend_handles_labels()` to get the handles and labels, split them up as you need, and create them as separate legends. Note that `add_artist` is needed to add the first legend back to the axes, because the second `legend()` call replaces it. The entire code is below; it's mostly what you have, with the empty legend line removed, and most of the added code is towards the bottom. Also, Title 1 and Title 2 are optional; use or remove them as it suits you.

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

df = pd.DataFrame({'ABC': np.random.uniform(low=0, high=100, size=(100,)),
                   'Date': pd.date_range('2019/07/01', periods=100, freq='SM')})
#Using matplotlib scatter plot
scatter1 = plt.scatter(data=df[df.ABC < 10], x='Date', y='ABC', c='cyan', s=5, label='less')
scatter2 = plt.scatter(data=df[(df.ABC >= 10) & (df.ABC < 40)], x='Date', y='ABC', c='dodgerblue', s=5, label='normal')
scatter3 = plt.scatter(data=df[(df.ABC >= 40) & (df.ABC < 60)], x='Date', y='ABC', c='orangered', s=5, label='good')
scatter4 = plt.scatter(data=df[df.ABC >= 60], x='Date', y='ABC', c='brown', s=5, label='more')

#Empty legend line specifying a calculated metric
#plt.plot([], [], ' ', label=f'Points above Target Budget PR = {num}/{den} = {ans}%')

#plot using rolling average
#using seaborn.lineplot()
sns.lineplot(x='Date',
             y=df.ABC.rolling(10).mean(),
             data=df,
             color='red',
             linewidth=1.5,
             label='Moving Average')

#plot a simple time series plot
#using seaborn.lineplot()
sns.lineplot(x='Date',
             y=df.ABC.rolling(10).median(),
             color='darkgreen',
             linewidth=1.5,
             data=df,
             label='Yield')

ax = plt.gca()
h, l = ax.get_legend_handles_labels()  # get the legend handles and labels

# split by label, not by position: the order of h/l can differ between matplotlib versions
line_labels = {'Moving Average', 'Yield'}
lines = [(hh, ll) for hh, ll in zip(h, l) if ll in line_labels]
points = [(hh, ll) for hh, ll in zip(h, l) if ll not in line_labels]

l1 = ax.legend(*zip(*lines), loc='center', title='Title 1')  # the seaborn lines
l2 = ax.legend(*zip(*points), loc='upper right', title='Title 2')  # the scatter points

ax.add_artist(l1)  # the 2nd legend() call replaces the 1st, so add it back
plt.show()
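The comment on `add_artist` can be checked directly: each `Axes.legend` call replaces the Axes' current legend, so without re-adding the first one only the last legend stays attached. A minimal sketch with toy data (`legends_after_two_calls` is a hypothetical helper) counts the `Legend` artists among the Axes' children in both cases.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
from matplotlib.legend import Legend


def legends_after_two_calls(readd_first):
    """Draw two legends on one Axes and count how many stay attached."""
    fig, ax = plt.subplots()
    points = ax.scatter([1, 2], [1, 2], label="points")
    line, = ax.plot([1, 2], [2, 1], label="line")

    first = ax.legend([line], ["line"], loc="center")
    ax.legend([points], ["points"], loc="upper right")  # replaces ax.legend_
    if readd_first:
        ax.add_artist(first)  # re-attach the replaced legend as a plain artist

    count = sum(isinstance(a, Legend) for a in ax.get_children())
    plt.close(fig)
    return count


print(legends_after_two_calls(False), legends_after_two_calls(True))  # expected: 1 2
```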
