Line plot for time series of grouped data in pandas
Question
I have monthly data for customers belonging to separate categories. I would like to display a line plot of the time series grouped by customer category.
Here is a snapshot containing data ("volume") over 4 consecutive months for 3 customers belonging to 2 categories:
import pandas as pd

df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104],
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})
How can I draw a line plot (time period on the x-axis) displaying the average monthly volume for each of these 2 groups?
Answer 1
Score: 2
What about using groupby.mean and seaborn.lineplot:
import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates

# optional: to ensure having a categorical palette
df['category'] = df['category'].astype('category')

# average volume per category and month, with 'month' parsed as a datetime
tmp = (df.groupby(['category', pd.to_datetime(df['month'], format='%Y%m')])
         ['volume'].mean().reset_index(name='average volume')
      )

ax = sns.lineplot(data=tmp, x='month', y='average volume', hue='category')

# change labels
ax.tick_params(axis='x', rotation=45)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))
Output: [figure: average volume per month, one line per category]
Changing the plot width:
import matplotlib.pyplot as plt
# ...
fig, ax = plt.subplots(figsize=(15, 8))
sns.lineplot(data=tmp, x='month', y='average volume', hue='category', ax=ax)
# ...
Output: [figure: the same line plot, drawn on the wider figure]
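To see what the plotting call receives, it can help to inspect the intermediate long-format frame on its own. The following is a minimal sketch of the aggregation step only, without the plot. It assumes the sample data from the question, skips the optional categorical cast, and converts the month codes to strings before parsing as a precaution.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'month': [200101, 200102, 200103, 200104] * 3,
    'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
})

# Parse YYYYMM codes as dates, then average the volume per category and month
tmp = (
    df.assign(month=pd.to_datetime(df['month'].astype(str), format='%Y%m'))
      .groupby(['category', 'month'])['volume']
      .mean()
      .reset_index(name='average volume')
)

# One row per (category, month) pair: 2 categories x 4 months
print(tmp)
```

Each row of `tmp` is one point on the plot: `month` gives the x position, `average volume` the y position, and `category` selects the line.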
Answer 2
Score: 1
pivot_table gives a lot of control over the shape of the aggregate in very little code.
Your requested time series:
import pandas as pd

# the string 'mean' behaves like np.mean here and avoids the FutureWarning
# that recent pandas versions emit when a NumPy function is passed
TS = pd.pivot_table(data=df,
                    values=['volume'],
                    index=['month'],
                    columns=['category'],
                    aggfunc='mean')
Output:
         volume
category      1   2
month
200101        1   7
200102        2   8
200103        3   9
200104        4  10
Indeed, converting to datetime first is always preferable, as other respondents suggested (I upvoted mozway for this). Run the conversion before building the pivot table so that the index holds real dates:
df['month'] = pd.to_datetime(df['month'], format='%Y%m')
Plot is as expected; add any ornaments you like.
TS.plot(figsize=(24, 8),
        ylabel='Monthly average volume')
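As a variation on the same idea, here is a minimal sketch that does the datetime conversion first and passes scalars instead of one-element lists, which (in my understanding of pivot_table) yields flat column labels rather than a two-level header. The data is the sample from the question; converting the codes to strings before parsing is a precaution of mine, not part of the answer.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
    'month': [200101, 200102, 200103, 200104] * 3,
    'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
    'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2],
})

# Convert YYYYMM codes to dates before pivoting so the index is a DatetimeIndex
df['month'] = pd.to_datetime(df['month'].astype(str), format='%Y%m')

# Scalar arguments give one column per category, without a 'volume' level on top
TS = df.pivot_table(values='volume', index='month',
                    columns='category', aggfunc='mean')

print(TS)
```

Calling `TS.plot(ylabel='Monthly average volume')` on this frame should then draw one line per category against a date axis.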
Answer 3
Score: 0
This code creates a line plot with the months on the x-axis, the average volume on the y-axis, and one line per customer category. The legend shows which category each line belongs to.
import pandas as pd
import matplotlib.pyplot as plt

# Your DataFrame
df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104],
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

# Convert the 'month' column (YYYYMM codes) to datetime for proper sorting
df['month'] = pd.to_datetime(df['month'].astype(str), format='%Y%m')

# Average the volume per month and category, then pivot so each category becomes a column
grouped_data = df.groupby(['month', 'category'])['volume'].mean().unstack('category')

# Plot one line per category
plt.figure(figsize=(10, 6))
for category in grouped_data.columns:
    plt.plot(grouped_data.index, grouped_data[category], label=f'Category {category}')
plt.xlabel('Month')
plt.ylabel('Average volume')
plt.title('Time Series of Volume Grouped by Customer Category')
plt.legend()
plt.show()
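The month codes in the question have the form YYYYMM, so the format string passed to pd.to_datetime matters. The short sketch below parses one sample code with two different formats to show how a two-digit-year pattern silently produces a different date. The variable names are only for illustration.

```python
import pandas as pd

code = '200102'  # intended meaning: February 2001

# '%y%m%d' reads the code as year 20, month 01, day 02
as_yymmdd = pd.to_datetime(code, format='%y%m%d')

# '%Y%m' reads it as year 2001, month 02 (the day defaults to the 1st)
as_yyyymm = pd.to_datetime(code, format='%Y%m')

print(as_yymmdd.date(), as_yyyymm.date())
```

Neither call raises an error, which is why a mismatched format is easy to miss until the x-axis shows days in January 2020 instead of months in 2001.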