Line plot for time series of grouped data in pandas

Question

I have monthly data for customers belonging to separate categories. I would like to display a line plot of the time series grouped by customer category.

Here is a snapshot containing data ("volume") over 4 consecutive months for 3 customers belonging to 2 categories:

    import pandas as pd

    df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                       'month': [200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104],
                       'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                       'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

How can I draw a line plot (time period on the x-axis) displaying the average monthly volume for each of these 2 groups?

Answer 1

Score: 2

What about using groupby.mean and seaborn.lineplot:

    import pandas as pd
    import seaborn as sns
    import matplotlib.dates as mdates

    # optional: make 'category' categorical so seaborn picks a categorical palette
    df['category'] = df['category'].astype('category')

    # average volume per category and month; 'month' is parsed from YYYYMM integers
    # observed=True keeps only the category values that occur in the data
    tmp = (df.groupby(['category', pd.to_datetime(df['month'], format='%Y%m')],
                      observed=True)
             ['volume'].mean().reset_index(name='average volume')
          )

    ax = sns.lineplot(data=tmp, x='month', y='average volume', hue='category')

    # rotate the tick labels and format them as year-month
    ax.tick_params(axis='x', rotation=45)
    ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))

Output: [image: line plot of the average volume per month, one line per category]

Changing the plot width:

    import matplotlib.pyplot as plt
    # ...
    fig, ax = plt.subplots(figsize=(15, 8))
    sns.lineplot(data=tmp, x='month', y='average volume', hue='category', ax=ax)
    # ...

Output: [image: the same line plot on a wider figure]
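As a quick sanity check on the aggregation step, here is a minimal sketch that rebuilds the sample data and computes the per-category monthly means with pandas alone. The helper name `average_by_category` is hypothetical (it is not part of the answer), and casting `month` to string before parsing is a defensive assumption rather than a requirement.

```python
import pandas as pd

df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104] * 3,
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})


def average_by_category(frame):
    """Hypothetical helper: mean volume per (category, month)."""
    # Cast to str first so the YYYYMM integers are parsed as text (defensive assumption)
    month = pd.to_datetime(frame['month'].astype(str), format='%Y%m')
    return (frame.groupby(['category', month])['volume']
                 .mean()
                 .reset_index(name='average volume'))


tmp = average_by_category(df)
print(tmp)
```

The resulting frame has 8 rows (2 categories × 4 months). Category 2 averages customers 2 and 3, so its value for January 2001 is (5 + 9) / 2 = 7.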

Answer 2

Score: 1

pivot_table offers great control, in very few words, over the shape of the aggregated result.

Your requested time series:

    import pandas as pd

    # mean volume per month (rows) and category (columns)
    TS = pd.pivot_table(data=df,
                        values=['volume'],
                        index=['month'],
                        columns=['category'],
                        aggfunc='mean')

Output:

             volume
    category      1   2
    month
    200101        1   7
    200102        2   8
    200103        3   9
    200104        4  10

Indeed, converting to datetime before pivoting is always preferable, as suggested by the other respondents. Upvoted mozway for this:

    df['month'] = pd.to_datetime(df['month'], format='%Y%m')

The plot is as expected; add any ornaments you like.

    TS.plot(figsize=(24, 8),
            ylabel='Monthly average volume')
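One detail worth noting, offered as a sketch rather than as the answer's own code: passing `values=['volume']` as a list gives the result two-level column labels (`volume` on top, the category below), which tends to clutter the legend. Assuming flat columns are wanted, a scalar `values='volume'` combined with the datetime conversion done up front might look like this. The names `data` and `ts` and the `.copy()` call are my own choices.

```python
import pandas as pd

df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104] * 3,
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

# Work on a copy and convert the YYYYMM integers to datetime before pivoting
data = df.copy()
data['month'] = pd.to_datetime(data['month'].astype(str), format='%Y%m')

# A scalar `values` keeps the columns flat: one column per category
ts = pd.pivot_table(data=data, values='volume', index='month',
                    columns='category', aggfunc='mean')
print(ts)
```

Calling `ts.plot(figsize=(24, 8), ylabel='Monthly average volume')` on this frame then draws one line per category with real dates on the x-axis.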

Answer 3

Score: 0

This code will create a line plot with the months on the x-axis, the average volume on the y-axis, and one line per customer category. The legend will show the customer category associated with each line.

    import pandas as pd
    import matplotlib.pyplot as plt

    # Your DataFrame
    df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                       'month': [200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104],
                       'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                       'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

    # Convert the 'month' column (YYYYMM integers) to datetime for proper sorting
    df['month'] = pd.to_datetime(df['month'], format='%Y%m')

    # Average the volume per month and category, then pivot to have 'category' as columns
    grouped_data = df.groupby(['month', 'category'])['volume'].mean().unstack('category')

    # Plotting the data
    plt.figure(figsize=(10, 6))
    for category in grouped_data.columns:
        plt.plot(grouped_data.index, grouped_data[category], label=f'Category {category}')
    plt.xlabel('Month')
    plt.ylabel('Average volume')
    plt.title('Time Series of Volume Grouped by Customer Category')
    plt.legend()
    plt.show()
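The format string given to `pd.to_datetime` deserves a quick check, since integers like 200101 can be read in more than one way. A minimal sketch, assuming the values are year-month codes (YYYYMM) as in the question; the variable names are illustrative only:

```python
import pandas as pd

# The month codes from the question, cast to text before parsing
months = pd.Series([200101, 200102, 200103, 200104]).astype(str)

# '%Y%m' reads 200101 as year 2001, month 01
as_year_month = pd.to_datetime(months, format='%Y%m')

# '%y%m%d' reads 200101 as two-digit year 20 (2020), month 01, day 01
as_short_date = pd.to_datetime(months, format='%y%m%d')

print(as_year_month.dt.strftime('%Y-%m-%d').tolist())
print(as_short_date.dt.strftime('%Y-%m-%d').tolist())
```

With the second format the four points land on four consecutive days in January 2020 instead of four consecutive months in 2001, so the x-axis would be misleading.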

huangapple
  • Posted on 2023-07-23 22:19:17
  • Please keep the link to this article when reposting: https://go.coder-hub.com/76748716.html