x axis value ranges not sequential in seaborn barplot & pointplot as subplots
Question
My data frames are:
df['graph_df_uni_valid']
            group  MO_SNCE_REC_APP     Label  predictions
0  (-0.001, 25.0]            24324  0.042551     0.042118
1    (25.0, 45.0]            24261  0.035077     0.033748
2    (45.0, 64.0]            23000  0.033391     0.033354
3    (64.0, 83.0]            22960  0.028876     0.028351
4   (83.0, 118.0]            23725  0.028872     0.029056
5  (118.0, 174.0]            23354  0.021024     0.022121
6            miss                0  0.009165     0.008978
df['graph_df_uni_oot']
            group  MO_SNCE_REC_APP     Label  predictions
0  (-0.001, 25.0]            28942  0.033308     0.041806
1    (25.0, 44.0]            28545  0.027921     0.034701
2    (44.0, 64.0]            27934  0.026634     0.033682
3    (64.0, 83.0]            27446  0.021132     0.028101
4   (83.0, 119.0]            28108  0.022236     0.028721
5  (119.0, 171.0]            27812  0.015892     0.020897
6            miss                0  0.007614     0.009352
The issue is that the x-axis of the Test (and OOT) plot is not in sequential order, i.e. bin (11.0 – 102.0] should be last, not 2nd in the sequence.
My data is already in the correct sequence, so I used sort=False for pointplot (or lineplot) and order=df['graph_df_uni_valid'].sort_values(by='group').group for barplot. But I get the same unordered x-axis with or without these parameters.
Here is my code:
fig, ax = plt.subplots(nrows = 1, ncols = 2, figsize = (12,5), sharex = False, sharey = True, tight_layout = True)
fig.supxlabel(desc, ha = 'center', wrap = True)
fig.suptitle(f"{col} (Rank #{rank}, TotGain: {totgain}, Cum TotGain: {cumtotgain})", fontsize = 16)
ax1_line = ax[0].twinx()
ax2_line = ax[1].twinx()
ax2_line.get_shared_y_axes().join(ax1_line,ax2_line)
ax[0] = sns.barplot(data = df['graph_df_uni_valid'], ax = ax[0], x = 'group', y = col, color = 'blue', order=df['graph_df_uni_valid'].sort_values(by='group').group)
ax[0].set(xlabel = '', ylabel = 'Count')
ax[0].tick_params(axis = 'x', rotation = 60)
ax1_line = sns.pointplot(data = df['graph_df_uni_valid'], ax = ax1_line, x = 'group', y = target, sort= False, color = 'red', marker = '.')
ax1_line = sns.pointplot(data = df['graph_df_uni_valid'], ax = ax1_line, x = 'group', y = sc, sort= False, color = 'green', marker = '.')
ax1_line.set(xlabel = '', ylabel = 'Book Rate/Score')
ax[0].set_title('Test (202205 - 202208)')
ax[1] = sns.barplot(data = df['graph_df_uni_oot'], ax = ax[1], x = 'group', y = col, color = 'blue', order=df['graph_df_uni_oot'].sort_values(by='group').group)
ax[1].set(xlabel = '', ylabel = 'Count')
ax[1].tick_params(axis = 'x', rotation = 60)
ax2_line = sns.pointplot(data = df['graph_df_uni_oot'], ax = ax2_line, x = 'group', y = target, sort= False, color = 'red', marker = '.')
ax2_line = sns.pointplot(data = df['graph_df_uni_oot'], ax = ax2_line, x = 'group', y = sc, sort=False, color = 'green', marker = '.')
ax2_line.set(xlabel = '', ylabel = 'Book Rate/Score')
ax[1].set_title('OOT (202204)')
If I change the barplot parameter to order=df['graph_df_uni_valid'].index, I get the desired x-axis sequence but the bars disappear.
Versions
- matplotlib 3.4.0
- seaborn 0.10.0
2nd question: how do I add a legend showing that the red line is 'Book rate', the green line is 'Score' and the blue bars are volume?
Answer 1
Score: 2
- Aggregating the data with .groupby is not necessary.
  - While not shown in the OP, the shape of the sample indicates it was used.
  - sns.barplot and sns.pointplot both have the estimator parameter for setting the type of statistical function to use for aggregation. The default is 'mean'.
  - If there is aggregation, there will be error bars, which can be removed with the errorbar parameter (ci in older versions).
- Add a column with pd.cut, which creates categorically ordered bins (ordered=True by default).
  - Since they are ordered, the x-axis will be ordered.
- Legends:
  - Add labels for plots on ax1 and ax1y
  - Get the handles and labels
  - Delete the axes legend
  - Create a figure legend with the combined handles and labels
  - See "How to put the legend outside the plot" for other placement options for the figure legend, by changing loc and bbox_to_anchor.
- Tested in python 3.11.2, pandas 2.0.1, matplotlib 3.7.1, seaborn 0.12.2
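To make the pd.cut point concrete, here is a minimal sketch that bins some made-up values using the bin edges from the question's validation sample. The random data and the variable names are assumptions for illustration only; the bin edges are the only part taken from the question.

```python
import numpy as np
import pandas as pd

# made-up raw values; only the bin edges below come from the question
rng = np.random.default_rng(0)
months = pd.Series(rng.integers(0, 175, size=1000), name="MO_SNCE_REC_APP")
bins = [-0.001, 25, 45, 64, 83, 118, 174]

# pd.cut returns a categorical column whose categories follow the bin edges
group = pd.cut(months, bins=bins)

# ordered=True is the default, so the categorical carries an explicit order
is_ordered = group.cat.ordered

# the categories are listed from the lowest bin to the highest bin
bin_labels = [str(interval) for interval in group.cat.categories]
print(is_ordered)
print(bin_labels)
```

Because the order lives in the categorical dtype rather than in the text of the labels, a plotting function that respects category order does not need an explicit order argument.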
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# create the dataframe
df = sns.load_dataset('geyser')
# create the categorically ordered groups
df['group'] = pd.cut(df.duration, bins=np.arange(1.6, 5.2, 0.5), ordered=True)
# create the figure and axes
fig, (ax1, ax2) = plt.subplots(nrows=1, ncols=2, figsize=(12, 5), sharex=False, sharey=True, tight_layout=True)
ax1y = ax1.twinx()
ax2y = ax2.twinx()
# select the data for ax1
long = df[df.kind.eq('long')]
# plot
sns.barplot(data=long, x='group', y='duration', ax=ax1, color='tab:blue', label='Duration', errorbar=None)
sns.pointplot(data=long, x='group', y='waiting', ax=ax1y, color='tab:red', label='Waiting', errorbar=None)
ax1.set(title='Geyser: long wait time and duration')
# create the legends on ax1 and ax1y
ax1.legend()
ax1y.legend()
# get the legend handles and labels
h1, l1 = ax1.get_legend_handles_labels()
h1y, l1y = ax1y.get_legend_handles_labels()
# remove the axes legend
ax1.get_legend().remove()
ax1y.get_legend().remove()
# add a figure legend from the combined handles and labels
fig.legend(h1 + h1y, l1 + l1y, loc='lower center', ncols=2, bbox_to_anchor=(0.5, 0), frameon=False)
# select the data for ax2
short = df[df.kind.eq('short')]
# plot
sns.barplot(data=short, x='group', y='duration', ax=ax2, color='tab:blue', errorbar=None)
sns.pointplot(data=short, x='group', y='waiting', ax=ax2y, color='tab:red', errorbar=None)
_ = ax2.set(title='Geyser: short wait time and duration')
df.head()
   duration  waiting   kind       group
0     3.600       79   long  (3.1, 3.6]
1     1.800       54  short  (1.6, 2.1]
2     3.333       74   long  (3.1, 3.6]
3     2.283       62  short  (2.1, 2.6]
4     4.533       85   long  (4.1, 4.6]
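The legend steps can also be sketched with the labels asked for in the question ('Volume', 'Book rate', 'Score'). This version uses plain matplotlib calls instead of seaborn to stay small, and takes the first three rows of the question's validation sample as data; the variable names are illustrative assumptions, not the OP's code.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

# first three rows of the question's validation sample
groups = ["(-0.001, 25.0]", "(25.0, 45.0]", "(45.0, 64.0]"]
volume = [24324, 24261, 23000]
book_rate = [0.042551, 0.035077, 0.033391]
score = [0.042118, 0.033748, 0.033354]

fig, ax = plt.subplots(figsize=(6, 4))
ax_line = ax.twinx()  # second y-axis for the rates

# give every artist a label so it can be picked up for the legend
ax.bar(groups, volume, color="blue", label="Volume")
ax_line.plot(groups, book_rate, color="red", marker=".", label="Book rate")
ax_line.plot(groups, score, color="green", marker=".", label="Score")

# collect handles and labels from both axes and build one figure legend
bar_handles, bar_labels = ax.get_legend_handles_labels()
line_handles, line_labels = ax_line.get_legend_handles_labels()
legend_labels = bar_labels + line_labels
fig.legend(bar_handles + line_handles, legend_labels,
           loc="lower center", ncol=3, frameon=False)
```

Building a single figure legend avoids two overlapping axes legends, which is what happens when each twin axis draws its own.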
Answer 2
Score: 0
As my data was already in the correct sequence, I just had to use sort=False for pointplot (or lineplot) and no order parameter for barplot. With that, I get the x-axis in the correct order.
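A likely reason the order argument scrambled the axis, assuming the group column holds plain strings (the question does not show its dtype): sort_values then compares the labels character by character, so '(118.0, 174.0]' lands right after '(-0.001, 25.0]'. The sketch below reproduces that with the question's validation labels and shows one way to pin the row order explicitly with an ordered categorical.

```python
import pandas as pd

# group labels and counts copied from the question's validation sample,
# with the labels stored as plain strings (an assumption about the OP's data)
valid = pd.DataFrame({
    "group": ["(-0.001, 25.0]", "(25.0, 45.0]", "(45.0, 64.0]", "(64.0, 83.0]",
              "(83.0, 118.0]", "(118.0, 174.0]", "miss"],
    "MO_SNCE_REC_APP": [24324, 24261, 23000, 22960, 23725, 23354, 0],
})

# the order in which the rows already appear
row_order = valid["group"].tolist()

# sorting strings is lexicographic, so the "(118..." bin jumps to 2nd place
sorted_order = valid.sort_values(by="group")["group"].tolist()

# an ordered categorical built from the row order sorts the way the rows appear
valid["group"] = pd.Categorical(valid["group"], categories=row_order, ordered=True)
categorical_order = valid.sort_values(by="group")["group"].astype(str).tolist()

print(sorted_order)
print(categorical_order)
```

With the categorical in place, passing order=row_order (or no order at all) keeps the bins in their original sequence.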