DatetimeIndex formatting when plotting with pandas

Question

I'm trying to plot data from pandas that has been grouped by month and year. However, I'm struggling to format the resulting plot's x-axis. It seems I can only change the right-most xtick label, and not the others. Also, when changing the format, the date gets reset. When converting the DatetimeIndex to strings and replacing the xticklabels, only the last one is updated. Somehow pandas does some shenanigans in matplotlib that I don't understand.

I like that only every third month is labeled, but each label should also include the year. How can I achieve this?

With the base formatting I get:
[image: plot with the default x-axis labels]

And with the formatter replacement I get this weird thing:
[image: plot after replacing the formatter]

Code to generate these plots (Python 3.9.13, pandas 1.4.4 and matplotlib 3.5.2):

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np

# generate DataFrame
df = pd.DataFrame(data = {
    "value": np.random.rand(1000),
    "day": np.random.randint(1, 28+1, 1000),
    "month": np.random.randint(1, 12+1, 1000),
    "year": np.random.randint(2005, 2010+1, 1000)
})
df["date"] = pd.to_datetime(df[['year', 'month', 'day']])

# group the data by month (and also year)
monthly = df.groupby(pd.Grouper(key="date", freq="M")).sum()

# plot the data
ax = monthly["2008-02":"2009-03"].value.plot.line()

# optional different formatter I've tried
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
date_form = DateFormatter("%m-%Y")
ax.xaxis.set_major_formatter(date_form)

Answer 1

Score: 1

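Try this. It plots with matplotlib directly (ax.plot) rather than through the pandas plot accessor, then sets a MonthLocator and a DateFormatter on the x-axis: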

from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import matplotlib.dates as mdates

# generate DataFrame
df = pd.DataFrame(data={
    "value": np.random.rand(1000),
    "day": np.random.randint(1, 28+1, 1000),
    "month": np.random.randint(1, 12+1, 1000),
    "year": np.random.randint(2005, 2010+1, 1000)
})
df["date"] = pd.to_datetime(df[['year', 'month', 'day']])

# group the data by month (and also year)
monthly = df.groupby(pd.Grouper(key="date", freq="M")).sum()

# plot the data
fig, ax = plt.subplots()
ax.plot(monthly.index, monthly["value"])

# format x-axis ticks
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))  # Show every 3rd month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))  # Format as abbreviated month and year

# set minor locator for months
ax.xaxis.set_minor_locator(mdates.MonthLocator())

# rotate the x-axis tick labels for better readability (optional)
plt.xticks(rotation=45)

# display the plot
plt.show()
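
The code above plots the whole series, while the question asks for the range from 2008-02 to 2009-03. A minimal sketch of the same approach restricted to that range is given below. It is an illustration rather than part of the original answer: the seeded random generator, the use of `pd.offsets.MonthEnd()` in place of the "M" alias, and the Agg backend (chosen only so the sketch runs without a display) are all assumptions.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# generate a DataFrame like the one in the question (seeded for repeatability)
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "value": rng.random(1000),
    "day": rng.integers(1, 28 + 1, 1000),
    "month": rng.integers(1, 12 + 1, 1000),
    "year": rng.integers(2005, 2010 + 1, 1000),
})
df["date"] = pd.to_datetime(df[["year", "month", "day"]])

# sum the values per calendar month; MonthEnd() stands in for the "M" alias
monthly = df.groupby(pd.Grouper(key="date", freq=pd.offsets.MonthEnd()))["value"].sum()

# select the range while the index is still a DatetimeIndex
sel = monthly.loc["2008-02":"2009-03"]

# plot with matplotlib directly, so the axis uses matplotlib date units
fig, ax = plt.subplots()
ax.plot(sel.index, sel.to_numpy())

# label every 3rd month as abbreviated month plus year, with a minor tick per month
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
ax.xaxis.set_minor_locator(mdates.MonthLocator())
plt.xticks(rotation=45)

# render once so the tick labels are populated, then collect them
fig.canvas.draw()
labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

In an interactive session, `plt.show()` would replace the final two lines; `labels` is collected here only to make the result easy to inspect.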

[image: resulting plot]

Answer 2

Score: 1

  • The issue is that the values in the dataframe index are of type pandas._libs.tslibs.timestamps.Timestamp.
    • This can be resolved by converting the index values to datetime.date, which is done by selecting only the .date component; the time component is irrelevant here.
    • pandas.DataFrame.plot can still be used to plot the dataframe.
  • Some operations, such as pd.Grouper and selecting an index range with monthly["2008-02":"2009-03"], require the Timestamp type to work, so don't change the type until all other formatting and manipulation is complete.
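
One way to see the effect described in the points above is to compare the x-axis limits of the two kinds of plot. The sketch below is an illustration, not part of the original answer: the 14-month series and its values are made up, `pd.offsets.MonthEnd()` is used in place of the "M" alias, and the Agg backend is selected only so it runs without a display. With the regular-frequency DatetimeIndex, pandas appears to place the data on its own period-based axis (small ordinal numbers), whereas with datetime.date values the axis is in matplotlib date numbers, which is what the matplotlib.dates locators and formatters expect.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# made-up monthly series standing in for the grouped data (month-end dates)
idx = pd.date_range("2008-02-29", periods=14, freq=pd.offsets.MonthEnd())
monthly = pd.Series(np.arange(14.0), index=idx, name="value")

# case 1: regular-frequency DatetimeIndex plotted through pandas
fig_ts, ax_ts = plt.subplots()
monthly.plot(ax=ax_ts)
ts_xlim = ax_ts.get_xlim()

# case 2: the same data with the index converted to datetime.date objects
as_dates = monthly.copy()
as_dates.index = as_dates.index.date
fig_date, ax_date = plt.subplots()
as_dates.plot(ax=ax_date)
date_xlim = ax_date.get_xlim()

# the two axes use very different numeric units for the same dates
print("DatetimeIndex x-limits:", ts_xlim)
print("datetime.date x-limits:", date_xlim)
```

If the first pair of limits is in the hundreds and the second in the thousands or more, a DateFormatter applied to the first axis is interpreting period ordinals as day counts, which would explain the reset dates seen in the question.
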
import matplotlib.dates as mdates

# sum the value data by month (and also year)
monthly = df.groupby(pd.Grouper(key="date", freq="M")).value.sum()

# extract the date component
monthly.index = monthly.index.date

# plot a line
ax = monthly.plot(rot=45)

# format x-axis ticks
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))  # Show every 3rd month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))  # Format as abbreviated month and year

# set minor locator for months
ax.xaxis.set_minor_locator(mdates.MonthLocator())

[image: plot of the full date range]

# sum the value data by month (and also year)
monthly = df.groupby(pd.Grouper(key="date", freq="M")).value.sum()

# selected range
sel = monthly.loc["2008-02-28":"2009-03-31"]

# extract the date component
sel.index = sel.index.date

# plot a line
ax = sel.plot(rot=45)

# format x-axis ticks
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))  # Show every 3rd month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y'))  # Format as abbreviated month and year

# set minor locator for months
ax.xaxis.set_minor_locator(mdates.MonthLocator())

[image: plot of the selected range]
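
A possible alternative, not mentioned in the original answer, is the `x_compat=True` option of the pandas plotting methods, which is documented as suppressing the pandas tick resolution adjustment for regular time series. The sketch below assumes that this leaves the axis in matplotlib date units, so the index can stay a DatetimeIndex. The series values are made up, `pd.offsets.MonthEnd()` replaces the "M" alias, and the Agg backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend so this runs headless
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# made-up monthly values for 2008-02 through 2009-03 (month-end dates)
idx = pd.date_range("2008-02-29", periods=14, freq=pd.offsets.MonthEnd())
sel = pd.Series(np.random.rand(14), index=idx, name="value")

# x_compat=True asks pandas not to apply its own time-series tick handling
fig, ax = plt.subplots()
sel.plot(ax=ax, x_compat=True, rot=45)

# the matplotlib.dates locators and formatters can then be applied directly
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))  # every 3rd month
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))  # e.g. abbreviated month and year
ax.xaxis.set_minor_locator(mdates.MonthLocator())            # minor tick every month

# render once so the tick labels are populated, then collect them
fig.canvas.draw()
labels = [tick.get_text() for tick in ax.get_xticklabels()]
```

Compared with converting the index to datetime.date, this keeps Timestamp-based slicing available after plotting, at the cost of relying on a pandas-specific plotting option.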

huangapple
  • Posted on June 19, 2023, 20:09:26
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76506485.html