Line plot for time series of grouped data in pandas

Question

I have monthly data for customers belonging to separate categories. I would like to display a line plot of the time series grouped by customer category.

Here is a snapshot containing data ("volume") over 4 consecutive months for 3 customers belonging to 2 categories:

import pandas as pd

df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104],
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

How can I draw a line plot (time period on the x-axis) displaying the average monthly volume for each of these 2 groups?

Answer 1

Score: 2

What about using groupby.mean and seaborn.lineplot?

import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates

# optional: to ensure having a categorical palette
df['category'] = df['category'].astype('category')

tmp = (df.groupby(['category', pd.to_datetime(df['month'], format='%Y%m')])
       ['volume'].mean().reset_index(name='average volume')
      )

ax = sns.lineplot(data=tmp, x='month', y='average volume', hue='category')

# change labels
ax.tick_params(axis='x', rotation=45)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))

Output:

[Image: Line plot for time series of grouped data in pandas]

Changing the plot width:

import matplotlib.pyplot as plt

# ...

fig, ax = plt.subplots(figsize=(15, 8))
sns.lineplot(data=tmp, x='month', y='average volume', hue='category', ax=ax)

# ...

Output:

[Image: Line plot for time series of grouped data in pandas]
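
To make the aggregation step easier to inspect, here is a minimal pandas-only sketch that rebuilds the intermediate frame (called tmp above) from the question's data. The .astype(str) call before parsing is an extra precaution added here, not part of the original answer; the plotting calls are left out so the block runs without seaborn.

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104] * 3,
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

# Parse the YYYYMM integers into month-start timestamps
# (casting to str first is an assumption made here for safety)
month = pd.to_datetime(df['month'].astype(str), format='%Y%m')

# One row per (category, month) holding the mean volume of that group
tmp = (df.groupby(['category', month])['volume']
         .mean()
         .reset_index(name='average volume'))

print(tmp)
```

The result is in long format (one row per category and month), which is the shape seaborn.lineplot expects when a hue column is used.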

Answer 2

Score: 1

pivot_table offers great control in very few words over the desired aggregate shape.

Your requested time series:

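import pandas as pd

# Passing the string 'mean' avoids the warning that recent pandas
# versions emit when a NumPy callable such as np.mean is given
TS = pd.pivot_table(data    = df,
                    values  = ['volume'],
                    index   = ['month'],
                    columns = ['category'],
                    aggfunc = 'mean')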

Output:

         volume    
category      1   2
month              
200101        1   7
200102        2   8
200103        3   9
200104        4  10

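Indeed, converting the month to datetime first is always preferable, as suggested by other respondents (upvoted mozway for this). Run the conversion before building the pivot table so that the index of TS holds real dates: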

df['month'] = pd.to_datetime(df['month'], format='%Y%m')

Plot is as expected; add any ornaments you like.

TS.plot(figsize=(24,8), 
        ylabel = 'Monthly average volume')
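
Putting the steps of this answer in the order they need to run (convert, pivot, plot), a minimal end-to-end sketch could look like the following. The Agg backend line, the .astype(str) cast, the lowercase name ts and the marker style are choices made for this sketch, not part of the original answer. Passing values as a plain string rather than a list gives flat column labels, so the legend shows only the category numbers.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption); remove in a notebook
import pandas as pd

# Sample data from the question
df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104] * 3,
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

# 1. Convert YYYYMM integers to datetime before aggregating
df['month'] = pd.to_datetime(df['month'].astype(str), format='%Y%m')

# 2. Months as rows, categories as columns, mean volume as values
ts = df.pivot_table(values='volume', index='month',
                    columns='category', aggfunc='mean')

# 3. One line per category, with dates on the x-axis
ax = ts.plot(figsize=(12, 6), marker='o', ylabel='Monthly average volume')
```

Because the index is now datetime, pandas handles the date tick labels itself; ax is a regular matplotlib Axes, so titles or other decorations can be added afterwards.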

Answer 3

Score: 0

This code will create a line plot with the months on the x-axis, the volume on the y-axis, and each line corresponding to a customer category. The legend will show the customer category associated with each line.

import pandas as pd
import matplotlib.pyplot as plt

# Your DataFrame
df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104],
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

# Convert the 'month' column (YYYYMM integers) to datetime for proper sorting
df['month'] = pd.to_datetime(df['month'], format='%Y%m')

# Average the volume per month and category, then pivot so each category becomes a column
grouped_data = df.groupby(['month', 'category'])['volume'].mean().unstack('category')

# Plotting the data: one line per category
plt.figure(figsize=(10, 6))
for category in grouped_data.columns:
    plt.plot(grouped_data.index, grouped_data[category], label=f'Category {category}')

plt.xlabel('Month')
plt.ylabel('Average volume')
plt.title('Time Series of Volume Grouped by Customer Category')
plt.legend()
plt.show()

huangapple
  • Posted on July 23, 2023, 22:19:17
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76748716.html