Seaborn convert BarPlot to histogram-like chart

Question

I have a pandas DataFrame that looks like this, and I'm using it to graph the life of a character over a period of days. The days column is really "days since birth." For this example, the character was born on May 26th, 2023.

   days  health  months
0     0    30    May 23
1     1    30    
2     2    20    
3     3    20    
4     4    10    
5     5    10    
6     6    10    Jun 23
7     7    10    
8     8    10    
9     9     0    

This is the seaborn BarPlot.

I have significantly simplified the number of days the character is alive for the sake of reproducibility, but in my normal code, the number of days is in the hundreds, possibly thousands.

Here is a graph of my normal case.

As you can see, this graph is far more crowded with bars, which seems to hurt rendering performance noticeably, even with only a few hundred days.

So my question is this: can I convert the BarPlot to the seaborn equivalent of a histogram with the way my DataFrame is set up?

The ideal would look something like the image below (ignore my bad graphic design job). The red lines are only there to highlight each section of the histogram; I am not looking to add them.

Note, I also need to be able to keep the month labels in the same place as they are above, since there could be a section of time where the character's health stays the same for multiple months.

My code is minimal for the chart, but the size of the DataFrame seems to be causing the slow rendering time.

ax = sns.barplot(dataframe, x='days', y='health', color='blue')
ax.set_xticklabels(dataframe.months)

plt.xticks(rotation=45)
plt.show()

Here's a line for the example DataFrame, for easy reproducibility:

df = pd.DataFrame({'days': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'health': [30, 30, 20, 20, 10, 10, 10, 10, 10, 0], 'months': ["May 23", " ", " ", " ", " ", " ", "Jun 23", " ", " ", " "]})

Thank you in advance.

Answer 1

Score: 2

  • The x-axis appears to be continuous, in which case this should be a line plot.
  • Avoid plotting strings on the x-axis: the labels are treated as categorical, so every string gets its own tick.
    • Combine 'months' and 'days', add a year, and create a datetime dtype column to use as the x-axis.
  • pandas formats the tick labels differently depending on the range of the dates.
  • The exact look and location of the datetime tick labels can be customized; many questions on SO deal with formatting datetime xtick labels.
  • Tested in python 3.11.2, pandas 2.0.1, matplotlib 3.7.1

import pandas as pd

# sample dataframe
df = pd.DataFrame({'days': [0, 1, 2, 3, 4, 5, 6, 7, 8, 9], 'health': [30, 30, 20, 20, 10, 10, 10, 10, 10, 0], 'months': ["May 23", " ", " ", " ", " ", " ", "Jun 23", " ", " ", " "]})

# replace the blank strings with NA, so forward fill will work
df = df.replace(' ', pd.NA)

# forward fill the months column so no row is empty
df.months = df.months.ffill()

# convert to a datetime with a specific year
df['date'] = pd.to_datetime('2023 ' + df.months, format='%Y %b %d')

# update the date column by adding the days as an offset
df.date = df.apply(lambda v: v.date + pd.Timedelta(days=v.days), axis=1)

# plot
ax = df.plot(x='date', y='health', rot=0, figsize=(10, 5))

# if the xtick labels need to be repositioned horizontally, get the ticks and labels
ticks, labels = list(zip(*[(v.get_position()[0], v.get_text()) for v in ax.get_xticklabels()]))

# reset the ticks and labels
ax.set_xticks(ticks, labels, ha='center')

df

   days  health  months       date
0     0      30  May 23 2023-05-23
1     1      30  May 23 2023-05-24
2     2      20  May 23 2023-05-25
3     3      20  May 23 2023-05-26
4     4      10  May 23 2023-05-27
5     5      10  May 23 2023-05-28
6     6      10  Jun 23 2023-06-29
7     7      10  Jun 23 2023-06-30
8     8      10  Jun 23 2023-07-01
9     9       0  Jun 23 2023-07-02
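
In the table above, the date column jumps from 2023-05-28 to 2023-06-29, because each forward-filled label is parsed as a day of the month and the day offset is then added on top of it. If the labels are instead read as month and two-digit year (so "May 23" means May 2023), which agrees with the birth date of May 26th, 2023 given in the question, the dates could be derived from the birth date alone. A minimal sketch under that assumption; `birth` and `month_label` are names introduced here for illustration:

```python
import pandas as pd

# Example frame from the question (the 'months' column is not needed here)
df = pd.DataFrame({
    "days": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "health": [30, 30, 20, 20, 10, 10, 10, 10, 10, 0],
})

# Assumption: birth date taken from the question text (May 26th, 2023)
birth = pd.Timestamp("2023-05-26")

# "days since birth" becomes a timedelta that is added to the birth date
df["date"] = birth + pd.to_timedelta(df["days"], unit="D")

# Month labels for display only, in the question's "Mon YY" style
df["month_label"] = df["date"].dt.strftime("%b %y")
```

With this reading, day 6 falls on 2023-06-01, which is the row where the question's "Jun 23" label appears, and consecutive rows are exactly one day apart.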

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 10 entries, 0 to 9
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype         
---  ------  --------------  -----         
 0   days    10 non-null     int64         
 1   health  10 non-null     int64         
 2   months  10 non-null     object        
 3   date    10 non-null     datetime64[ns]
dtypes: datetime64[ns](1), int64(2), object(1)
memory usage: 452.0+ bytes
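
The tick customization mentioned in the bullet points can be sketched with `matplotlib.dates`. This is one possible setup, not the answer's own code: it assumes the dates are computed from the birth date stated in the question (May 26th, 2023) and that one label per month start, formatted like "Jun 23", is what is wanted. The step drawing style is also a choice made here to get a histogram-like outline.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook or GUI session
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "days": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "health": [30, 30, 20, 20, 10, 10, 10, 10, 10, 0],
})
# Assumption: birth date from the question text
df["date"] = pd.Timestamp("2023-05-26") + pd.to_timedelta(df["days"], unit="D")

fig, ax = plt.subplots(figsize=(10, 5))

# A single line artist; "steps-post" holds each health value until the next day
ax.plot(df["date"], df["health"], drawstyle="steps-post", color="blue")
# Shade under the step line with the matching step mode
ax.fill_between(df["date"], df["health"], step="post", color="blue", alpha=0.3)

# One major tick at the start of each month, labelled like "Jun 23"
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %y"))
ax.set_ylabel("health")
```

Because the ticks come from a locator rather than from one label per row, their number depends on the date range and not on the number of rows. Note that `MonthLocator` places ticks on month boundaries, so a label at the birth date itself would need an extra tick.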

Answer 2

Score: 1

You can also define the width:

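import matplotlib.pyplot as plt
import seaborn as sns

# width=1 removes the gaps between neighbouring bars
ax = sns.barplot(df, x='days', y='health', color='blue', width=1)
ax.set_xticklabels(df.months)

plt.xticks(rotation=45)
plt.show()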

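Another way, using a line plot as suggested by @TrentonMcKinney:

# draw the health values as a step line on a numeric x-axis
ax = sns.lineplot(df, x='days', y='health', color='blue', drawstyle='steps-pre')
# set tick positions and labels together, so the labels cannot drift from their days
ax.set_xticks(df['days'], df['months'])
# fill under the step line; step='pre' matches drawstyle='steps-pre'
ax.fill_between(df['days'], df['health'], color='blue', step='pre')

plt.xticks(rotation=45)
plt.show()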
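
Both variants above still work from one row per day. Since the stated goal is one histogram-like section per stretch of constant health, another option is to collapse consecutive days with equal health into runs and draw one bar per run. A minimal sketch, assuming the frame has exactly one row per consecutive day; `run_id`, `runs` and `labelled` are names introduced here for illustration and are not part of either answer:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook or GUI session
import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    "days": [0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
    "health": [30, 30, 20, 20, 10, 10, 10, 10, 10, 0],
    "months": ["May 23", " ", " ", " ", " ", " ", "Jun 23", " ", " ", " "],
})

# A new run starts whenever health differs from the previous row
run_id = df["health"].ne(df["health"].shift()).cumsum().rename("run")

# One row per run: first day, health level, and number of rows (days) covered
runs = (
    df.groupby(run_id)
      .agg(start=("days", "first"), health=("health", "first"), length=("days", "size"))
      .reset_index(drop=True)
)

fig, ax = plt.subplots(figsize=(10, 5))

# One rectangle per run instead of one per day;
# align="edge" makes each bar start at the first day of its run
ax.bar(runs["start"], runs["health"], width=runs["length"], align="edge", color="blue")

# Keep the month labels at the days where they appear in the original frame
# (passing labels to set_xticks needs matplotlib 3.5 or newer)
labelled = df[df["months"].str.strip() != ""]
ax.set_xticks(labelled["days"], labelled["months"])
ax.tick_params(axis="x", rotation=45)
```

For the example frame this draws four rectangles instead of ten. The number of artists then grows with the number of health changes rather than with the number of days, which is where most of the rendering cost of the per-day bar plot is likely to come from.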

huangapple
  • Posted on May 26, 2023, 00:37:27
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76334511.html