Seaborn convert BarPlot to histogram-like chart
Question
I have a pandas DataFrame that looks like this, and I'm using it to graph the life of a character over a period of days. The days column is really "days since birth." For this example, the character was born on May 26th, 2023.
   days  health  months
0     0      30  May 23
1     1      30
2     2      20
3     3      20
4     4      10
5     5      10
6     6      10  Jun 23
7     7      10
8     8      10
9     9       0
This is the seaborn BarPlot.
I have significantly simplified the number of days the character is alive for the sake of reproducibility, but in my normal code, the number of days is in the hundreds, possibly thousands.
Here is a graph of my normal case.
As you can see, this graph is much more overloaded with bars, which seems to hurt performance noticeably, even with only a few hundred days.
So my question is this: can I convert the BarPlot to the seaborn equivalent of a histogram with the way my DataFrame is set up?
The ideal would look something like the image below (ignore my bad graphic design job). The red lines are only there to highlight each section of the histogram; I am not looking to add them.
Note, I also need to be able to keep the month labels in the same place as they are above, since there could be a section of time where the character's health stays the same for multiple months.
My code is minimal for the chart, but the size of the DataFrame seems to be causing the slow rendering time.
import matplotlib.pyplot as plt
import seaborn as sns

ax = sns.barplot(dataframe, x='days', y='health', color='blue')
ax.set_xticklabels(dataframe.months)
plt.xticks(rotation=45)
plt.show()
Here's a line for the example DataFrame, for easy reproducibility:
df = pd.DataFrame({'days': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'health': [30, 30, 20, 20, 10, 10, 10, 10, 10, 0], 'months': ["May 23", " ", " ", " ", " ", " ", "Jun 23", " ", " ", " "]})
Thank you in advance.
Answer 1
Score: 2
- It seems like it's supposed to be a continuous x-axis, in which case this should be a line plot.
- You don't want to plot strings on the x-axis. This results in a tick for every string because the labels are categorical.
- Combine 'months' and 'days', add a year, and create a datetime Dtype column to use as the x-axis.
- pandas will format the tick labels differently based on the range of the dates.
- Customize the exact look and location of the datetime axis by looking at the many questions on SO dealing with formatting datetime xtick labels.
- Tested in python 3.11.2, pandas 2.0.1, matplotlib 3.7.1
import pandas as pd
# sample dataframe
df = pd.DataFrame({'days': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'health': [30, 30, 20, 20, 10, 10, 10, 10, 10, 0], 'months': ["May 23", " ", " ", " ", " ", " ", "Jun 23", " ", " ", " "]})
# replace the strings with NA, so forward fill will work
df = df.replace(' ', pd.NA)
# fill the months column, not empties
df.months = df.months.ffill()
# convert to a datetime with a specific year
df['date'] = pd.to_datetime('2023 ' + df.months, format='%Y %b %d')
# update the date column by adding the days as an offset
df.date = df.apply(lambda v: v.date + pd.Timedelta(days=v.days), axis=1)
# plot
ax = df.plot(x='date', y='health', rot=0, figsize=(10, 5))
# if the xticks labels need to be repositioned horizontally get the ticks and labels
ticks, labels = list(zip(*[(v.get_position()[0], v.get_text()) for v in ax.get_xticklabels()]))
# reset the ticks and labels
ax.set_xticks(ticks, labels, ha='center')
df
   days  health  months       date
0     0      30  May 23 2023-05-23
1     1      30  May 23 2023-05-24
2     2      20  May 23 2023-05-25
3     3      20  May 23 2023-05-26
4     4      10  May 23 2023-05-27
5     5      10  May 23 2023-05-28
6     6      10  Jun 23 2023-06-29
7     7      10  Jun 23 2023-06-30
8     8      10  Jun 23 2023-07-01
9     9       0  Jun 23 2023-07-02
df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
---  ------  --------------  -----
 0   days    10 non-null     int64
 1   health  10 non-null     int64
 2   months  10 non-null     object
 3   date    10 non-null     datetime64[ns]
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 452.0+ bytes
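Note that the table above reads "May 23" as the 23rd of May. If the months column is instead read as month and year ("May 23" meaning May 2023, which matches the stated birth date of May 26th, 2023), the dates can be derived from the birth date plus the days offset. The following is a minimal sketch under that assumption; the `birth` variable, the step-line drawing, and the tick placement are illustrative choices, not part of the answer above.

```python
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "days": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "health": [30, 30, 20, 20, 10, 10, 10, 10, 10, 0],
})

# Assumption: the birth date stated in the question
birth = pd.Timestamp("2023-05-26")
df["date"] = birth + pd.to_timedelta(df["days"], unit="D")

fig, ax = plt.subplots(figsize=(10, 5))
# One step line and one filled polygon instead of one bar per day
ax.step(df["date"], df["health"], where="post", color="blue")
ax.fill_between(df["date"], df["health"], step="post", color="blue", alpha=0.3)

# Label only the birth day and the first day of each later month
mask = (df["days"] == 0) | df["date"].dt.is_month_start
ticks = df.loc[mask, "date"]
labels = ticks.dt.strftime("%b %y").tolist()
ax.set_xticks(ticks.to_numpy(), labels, rotation=45)
```

Call `plt.show()` afterwards to display the figure. With this reading, day 6 falls on June 1st, 2023, which is where the "Jun 23" label sits in the question's DataFrame.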
Answer 2
Score: 1
You can also define the width:
ax = sns.barplot(df, x='days', y='health', color='blue', width=1)
ax.set_xticklabels(df.months)
plt.xticks(rotation=45)
plt.show()
Output:
Another way with lineplot, as suggested by @TrentonMcKinney:
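ax = sns.lineplot(df, x='days', y='health', color='blue', drawstyle='steps-pre')
# set tick positions together with the labels; set_xticklabels alone would
# relabel whatever ticks the numeric axis picked automatically
ax.set_xticks(df['days'], df['months'])
ax.fill_betweenx(df['health'], df['days'], color='blue', step='post')
plt.xticks(rotation=45)
plt.show()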
Output:
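Setting `width=1` still draws one bar per day. Since health stays constant over long stretches, one possible way to reduce the number of drawn rectangles is to collapse consecutive days with equal health into a single wide bar. This is a minimal sketch of that idea, which neither answer proposes; `compress_runs` and `plot_runs` are hypothetical helper names, `plot_runs` assumes it receives a matplotlib `Axes`, and the run length assumes the days are consecutive integers.

```python
from itertools import groupby


def compress_runs(days, health):
    """Collapse consecutive days with equal health into (start_day, length, health) runs."""
    runs = []
    for value, group in groupby(zip(days, health), key=lambda pair: pair[1]):
        group = list(group)
        # length counts rows, so it equals the span in days only for consecutive days
        runs.append((group[0][0], len(group), value))
    return runs


def plot_runs(ax, runs):
    """Draw one bar per run on a matplotlib Axes (assumed), left-aligned at the start day."""
    starts = [start for start, _, _ in runs]
    widths = [length for _, length, _ in runs]
    heights = [value for _, _, value in runs]
    ax.bar(starts, heights, width=widths, align="edge", color="blue")


days = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
health = [30, 30, 20, 20, 10, 10, 10, 10, 10, 0]
runs = compress_runs(days, health)
print(runs)  # [(0, 2, 30), (2, 2, 20), (4, 5, 10), (9, 1, 0)]
```

For the sample data this draws four bars instead of ten; over hundreds of days the saving depends on how often the health value changes.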