DatetimeIndex formatting when plotting with pandas
Question
I'm trying to plot pandas data that has been grouped by month and year, but I'm struggling to format the resulting plot's x-axis. It seems I can only change the right-most xtick label and not the others. Also, when I change the format, the dates get reset. When I convert the DatetimeIndex to strings and replace the xticklabels, only the last one is updated. Somehow pandas does some shenanigans in matplotlib that I don't understand.
I like that only every third month is labeled, but each label should also contain the year. How can I achieve this?
With the base formatting I get
And with the formatter replacement I get this weird thing:
Code to generate these plots (Python 3.9.13, pandas 1.4.4, and matplotlib 3.5.2):
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
# generate DataFrame
df = pd.DataFrame(data={
    "value": np.random.rand(1000),
    "day": np.random.randint(1, 28+1, 1000),
    "month": np.random.randint(1, 12+1, 1000),
    "year": np.random.randint(2005, 2010+1, 1000)
})
df["date"] = pd.to_datetime(df[['year', 'month', 'day']])
# group the data by month (and also year)
monthly = df.groupby(pd.Grouper(key="date", freq="M")).sum()
# plot the data
ax = monthly["2008-02":"2009-03"].value.plot.line()
# optional different formatter I've tried
import matplotlib.dates as mdates
from matplotlib.dates import DateFormatter
from pandas.plotting import register_matplotlib_converters
register_matplotlib_converters()
date_form = DateFormatter("%m-%Y")
ax.xaxis.set_major_formatter(date_form)
Answer 1
Score: 1
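Try this. Plotting with matplotlib's ax.plot directly, rather than through the pandas plot method, lets the matplotlib.dates locators and formatters work as expected: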
from matplotlib import pyplot as plt
import pandas as pd
import numpy as np
import matplotlib.dates as mdates
# generate DataFrame
df = pd.DataFrame(data={
    "value": np.random.rand(1000),
    "day": np.random.randint(1, 28+1, 1000),
    "month": np.random.randint(1, 12+1, 1000),
    "year": np.random.randint(2005, 2010+1, 1000)
})
df["date"] = pd.to_datetime(df[['year', 'month', 'day']])
# group the data by month (and also year)
monthly = df.groupby(pd.Grouper(key="date", freq="M")).sum()
# plot the data
fig, ax = plt.subplots()
ax.plot(monthly.index, monthly["value"])
# format x-axis ticks
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3)) # Show every 3rd month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y')) # Format as abbreviated month and year
# set minor locator for months
ax.xaxis.set_minor_locator(mdates.MonthLocator())
# rotate the x-axis tick labels for better readability (optional)
plt.xticks(rotation=45)
# display the plot
plt.show()
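If you would rather keep calling the pandas plot method, one possible alternative (not part of the answer above) is the `x_compat=True` option, which asks pandas to skip its own time-series tick handling so that matplotlib locators and formatters can be applied afterwards. The following is only a minimal sketch: the synthetic month-end series and the headless Agg backend are assumptions made so that the snippet runs on its own.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.dates as mdates
import numpy as np
import pandas as pd

# synthetic month-end series standing in for `monthly` (assumption)
idx = pd.date_range("2008-02-29", periods=14, freq=pd.offsets.MonthEnd())
monthly = pd.Series(np.random.rand(14), index=idx, name="value")

# x_compat=True keeps the x-axis in matplotlib date units
ax = monthly.plot(x_compat=True, rot=45)

# label every 3rd month with abbreviated month and year
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3))
ax.xaxis.set_major_formatter(mdates.DateFormatter("%b %Y"))
# unlabeled minor tick for every month
ax.xaxis.set_minor_locator(mdates.MonthLocator())

# draw once so the tick label texts are populated, then inspect them
ax.figure.canvas.draw()
labels = [t.get_text() for t in ax.get_xticklabels()]
print(labels)
```

Which months receive a label depends on the axis limits, so the printed labels may differ from plot to plot; each one should contain both the month and the year.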
Answer 2
Score: 1
- The issue is that the values in the dataframe index are of type pandas._libs.tslibs.timestamps.Timestamp.
  - The issue can be resolved by changing the type to datetime.date, which can be done by selecting only the .date component, since the .time component is irrelevant.
  - pandas.DataFrame.plot can still be used to plot the dataframe.
- Some methods, such as pd.Grouper, and selecting an index range with monthly["2008-02":"2009-03"], require the Timestamp type to work, so don't change the type until other formatting and manipulations are complete.
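The type change described in the list above can be checked without plotting anything. This is a small sketch on a made-up three-element series (the data and the variable names `s` and `sel` are assumptions for illustration): it slices by partial date string while the index is still a DatetimeIndex, then swaps in plain datetime.date objects.

```python
import datetime

import pandas as pd

# three month-end timestamps: 2008-02-29, 2008-03-31, 2008-04-30
idx = pd.date_range("2008-02-29", periods=3, freq=pd.offsets.MonthEnd())
s = pd.Series([1.0, 2.0, 3.0], index=idx)

# before conversion: a DatetimeIndex whose elements are pandas Timestamps
print(type(s.index).__name__, type(s.index[0]).__name__)

# partial-string slicing relies on the DatetimeIndex, so do it first
sel = s.loc["2008-03":"2008-04"].copy()

# after conversion: a generic Index holding datetime.date objects
sel.index = sel.index.date
print(type(sel.index).__name__, type(sel.index[0]).__name__)
```

Doing the slice before the conversion follows the ordering recommended in the second bullet.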
import matplotlib.dates as mdates
# `df` is the DataFrame built in the question
# sum the value data by month (and also year)
monthly = df.groupby(pd.Grouper(key="date", freq="M")).value.sum()
# extract the date component
monthly.index = monthly.index.date
# plot a line
ax = monthly.plot(rot=45)
# format x-axis ticks
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3)) # Show every 3rd month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y')) # Format as abbreviated month and year
# set minor locator for months
ax.xaxis.set_minor_locator(mdates.MonthLocator())
To plot only a selected range, slice while the index still holds Timestamp values, then convert:
# sum the value data by month (and also year)
monthly = df.groupby(pd.Grouper(key="date", freq="M")).value.sum()
# selected range
sel = monthly.loc["2008-02-28":"2009-03-31"]
# extract the date component
sel.index = sel.index.date
# plot a line
ax = sel.plot(rot=45)
# format x-axis ticks
ax.xaxis.set_major_locator(mdates.MonthLocator(interval=3)) # Show every 3rd month
ax.xaxis.set_major_formatter(mdates.DateFormatter('%b %Y')) # Format as abbreviated month and year
# set minor locator for months
ax.xaxis.set_minor_locator(mdates.MonthLocator())
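One way to see why the conversion makes a difference is to compare the x-axis limits of the two variants. As far as I understand the pandas behavior (this is my reading, not something stated in the answers), a DatetimeIndex with a regular frequency is plotted on a period-based axis, whereas an index of datetime.date objects ends up in matplotlib's own date units, which is what the matplotlib.dates formatters expect. A minimal sketch under those assumptions, with synthetic data and a headless backend:

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; drop this line for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# synthetic month-end series standing in for `monthly` (assumption)
idx = pd.date_range("2008-02-29", periods=14, freq=pd.offsets.MonthEnd())
monthly = pd.Series(np.random.rand(14), index=idx)

fig, (ax_ts, ax_date) = plt.subplots(2, 1)

# Timestamp index with a regular frequency: pandas applies its time-series handling
monthly.plot(ax=ax_ts)

# same data, but with plain datetime.date objects in the index
as_dates = monthly.copy()
as_dates.index = as_dates.index.date
as_dates.plot(ax=ax_date)

# the two axes use different numeric units for the same dates
xlim_ts = ax_ts.get_xlim()
xlim_date = ax_date.get_xlim()
print("Timestamp index:", xlim_ts)
print("date index:     ", xlim_date)
```

If the two printed ranges differ by orders of magnitude, a matplotlib DateFormatter applied to the first axis will interpret its numbers as the wrong dates, which would match the reset dates described in the question.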