Combine Binned barplot with lineplot
Question
I'd like to represent two datasets on the same plot, one as a line and one as a binned barplot. I can do each individually:
tobar = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
tobar["bins"] = pd.qcut(tobar.index, 20)
bp = sns.barplot(data=tobar, x="bins", y="value")
toline = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
lp = sns.lineplot(data=toline, x=toline.index, y="value")
But when I try to combine them, of course the x axis gets messed up:
fig, ax = plt.subplots()
ax2 = ax.twinx()
bp = sns.barplot(data=tobar, x="bins", y="value", ax=ax)
lp = sns.lineplot(data=toline, x=toline.index, y="value", ax=ax2)
bp.set(xlabel=None)
I also can't seem to get rid of the bin labels.
How can I show both pieces of information on one plot?
Answer 1
Score: 2
- This answer explains why it's better to plot the bars with matplotlib.axes.Axes.bar instead of sns.barplot or pandas.DataFrame.plot.bar.
  - In short, with Axes.bar the xtick locations correspond to the actual numeric value of the label, whereas the xticks for seaborn and pandas bar plots are 0-indexed and don't correspond to the numeric value.
- This answer shows how to add bar labels.
- ax2 = ax.twinx() can be used for the line plot if needed.
  - It works the same if the line plot uses different data.
- Tested in python 3.11, pandas 1.5.2, matplotlib 3.6.2, seaborn 0.12.1
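The tick-position point above can be checked with a small sketch. This is only an illustration, not part of the original answer: the four midpoints and bin means are made-up toy values, and the pandas bar plot stands in for the 0-indexed behavior that the answer also attributes to seaborn.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data: four bin midpoints and made-up bin means
pt_demo = pd.DataFrame({"mid": [25, 75, 125, 175],
                        "value": [1.0, -0.5, 2.0, 0.3]})

fig, (ax_cat, ax_num) = plt.subplots(1, 2, figsize=(10, 3))

# pandas draws the bars at positions 0..n-1 and uses "mid" only as tick text
pt_demo.plot.bar(x="mid", y="value", ax=ax_cat, legend=False)

# matplotlib centers each bar on the numeric value passed as x
ax_num.bar(pt_demo["mid"], pt_demo["value"], width=40)

cat_ticks = [float(t) for t in ax_cat.get_xticks()]
num_centers = [p.get_x() + p.get_width() / 2 for p in ax_num.patches]
print(cat_ticks)    # tick locations used by the pandas bar plot
print(num_centers)  # bar centers used by Axes.bar
```

A line drawn against row positions 0 to 999 only lines up with the bars in the second case, which is why the plots below use ax.bar.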
Imports and DataFrame
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import numpy as np
# test data
np.random.seed(2022)
df = pd.melt(pd.DataFrame(np.random.randn(1000).cumsum()))
# create the bins
df["bins"] = pd.qcut(df.index, 20)
# add a column for the mid point of the interval
df['mid'] = df.bins.apply(lambda row: row.mid.round().astype(int))
# pivot the dataframe to calculate the mean of each interval
pt = df.pivot_table(index='mid', values='value', aggfunc='mean').reset_index()
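As a cross-check of the preparation step above, the same per-bin means can be sketched with groupby instead of pivot_table. This is an alternative formulation, not the answer's own code; the names values, mid_per_bin, mid_per_row and pt_alt are introduced here for illustration only.

```python
import numpy as np
import pandas as pd

np.random.seed(2022)
values = pd.Series(np.random.randn(1000).cumsum(), name="value")

# 20 equal-count bins over the row positions 0..999
positions = np.arange(len(values))
bins = pd.Series(pd.qcut(positions, 20))

# One rounded midpoint per interval, then one midpoint per row via the codes
mid_per_bin = np.rint(np.asarray(bins.cat.categories.mid)).astype(int)
mid_per_row = mid_per_bin[bins.cat.codes.to_numpy()]

# Mean of "value" within each bin, keyed by the bin midpoint
pt_alt = values.groupby(mid_per_row).mean().rename_axis("mid").reset_index()
bin_sizes = values.groupby(mid_per_row).size()
print(pt_alt.head())
```

Each of the 20 bins holds 50 rows, so every bar summarizes the same number of points on the line.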
Plot 1
# create the figure
fig, ax = plt.subplots(figsize=(30, 7))
# add a horizontal line at y=0
ax.axhline(0, color='black')
# add the bar plot
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
# set the labels on the xticks - if desired
ax.set_xticks(ticks=pt.mid, labels=pt.mid)
# add the intervals as labels on the bars - if desired
ax.bar_label(ax.containers[0], labels=df.bins.unique(), weight='bold')
# add the line plot
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 2
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=4, alpha=0.5)
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
Plot 3
- The bar width (50) roughly matches the width of each interval (about 49.95)
fig, ax = plt.subplots(figsize=(30, 7))
ax.axhline(0, color='black')
ax.bar(data=pt, x='mid', height='value', width=50, alpha=0.5, ec='k')
ax.set_xticks(ticks=pt.mid, labels=df.bins.unique(), rotation=45)
ax.bar_label(ax.containers[0], weight='bold')
_ = sns.lineplot(data=df, x=df.index, y="value", ax=ax, color='tab:orange')
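For the case where the line comes from different data on a different scale, a minimal sketch with ax.twinx() might look like the following. It uses plain matplotlib for both layers, and all data here are made up: bar_x, bar_h and line_y are hypothetical stand-ins for the bin midpoints, bin means and the second dataset.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
bar_x = np.arange(25, 1000, 50)                 # hypothetical bin midpoints
bar_h = rng.normal(size=bar_x.size)             # hypothetical bin means
line_y = rng.normal(size=1000).cumsum() * 100   # different data, larger scale

fig, ax = plt.subplots(figsize=(12, 4))
ax.axhline(0, color="black")
ax.bar(bar_x, bar_h, width=50, alpha=0.5, ec="k")
ax.set_ylabel("bin mean")

# Second y-axis that shares the numeric x-axis with the bars
ax2 = ax.twinx()
ax2.plot(np.arange(line_y.size), line_y, color="tab:orange")
ax2.set_ylabel("line value")
```

Because both axes share one numeric x-axis, the bars and the line stay aligned even though each has its own y scale.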