2023-07-23 22:19:17

Line plot for time series of grouped data in pandas

Question

I have monthly data for customers belonging to separate categories. I would like to display a line plot of the time series grouped by customer category.

Here is a snapshot containing data ("volume") over 4 consecutive months for 3 customers belonging to 2 categories:

df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104],
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

How can I draw a line plot (time period on the x-axis) displaying the average monthly volume for each of these 2 groups?

Answer 1

Score: 2

What about using groupby.mean and seaborn.lineplot?

import pandas as pd
import seaborn as sns
import matplotlib.dates as mdates
# optional: to ensure having a categorical palette
df['category'] = df['category'].astype('category')
# average volume per category and month, with month parsed as a datetime
tmp = (df.groupby(['category', pd.to_datetime(df['month'], format='%Y%m')])
       ['volume'].mean().reset_index(name='average volume')
      )
ax = sns.lineplot(data=tmp, x='month', y='average volume', hue='category')
# change labels: rotate the ticks and show the dates as YYYY-MM
ax.tick_params(axis='x', rotation=45)
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m'))

Output: [figure: line plot of average volume by month, one line per category]

Changing the plot width:

import matplotlib.pyplot as plt
# ...
fig, ax = plt.subplots(figsize=(15, 8))
sns.lineplot(data=tmp, x='month', y='average volume', hue='category', ax=ax)
# ...

Output: [figure: the same line plot, drawn wider]
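
Before plotting, it can help to look at the aggregated table itself. The following is a minimal sketch that rebuilds the question's DataFrame and the intermediate `tmp` table using pandas only. The `month` variable and the `astype(str)` call are illustrative choices made here to keep the parsing explicit; they are not part of the answer above.

```python
import pandas as pd

# Data from the question: 3 customers, 4 months, 2 categories
df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104] * 3,
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

# Parse YYYYMM integers as dates (first day of each month)
month = pd.to_datetime(df['month'].astype(str), format='%Y%m')

# One row per (category, month) holding the mean volume
tmp = (df.groupby(['category', month])['volume']
         .mean()
         .reset_index(name='average volume'))

print(tmp)
```

Category 2 contains customers 2 and 3, so its January value is the mean of 5 and 9. Each row of `tmp` becomes one point on the corresponding line.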

Answer 2

Score: 1

pivot_table offers great control over the shape of the aggregate in very little code.

Your requested time series:

import pandas as pd
TS = pd.pivot_table(data    = df,
                    values  = ['volume'],
                    index   = ['month'],
                    columns = ['category'],
                    aggfunc = 'mean')  # same result as np.mean; recent pandas versions may warn about np.mean

Output:

         volume    
category      1   2
month              
200101        1   7
200102        2   8
200103        3   9
200104        4  10

Indeed, converting to datetime beforehand is always preferable, as suggested by other respondents. Upvoted mozway for this:

df['month'] = pd.to_datetime(df['month'], format='%Y%m')

The plot is as expected; add any ornaments you like.

TS.plot(figsize=(24,8), 
        ylabel = 'Monthly average volume')
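
One variation, sketched here as an assumption rather than as part of the answer: passing `values`, `index` and `columns` as plain strings instead of one-element lists yields flat column labels (just the category values) rather than a two-level header, which tends to give a cleaner legend. The sketch also converts `month` before pivoting; the name `ts` and the `astype(str)` call are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104] * 3,
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})

# Convert first, so the pivot index is a DatetimeIndex
df['month'] = pd.to_datetime(df['month'].astype(str), format='%Y%m')

# Scalar arguments give a single-level column index named 'category'
ts = pd.pivot_table(df, values='volume', index='month',
                    columns='category', aggfunc='mean')

print(ts)
print(ts.columns.nlevels)  # number of header levels
```

Calling `ts.plot(ylabel='Monthly average volume')` on this table should then draw one line per category against the dates.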

Answer 3

Score: 0

This code creates a line plot with the months on the x-axis, the average volume on the y-axis, and one line per customer category. The legend shows the customer category associated with each line.

import pandas as pd
import matplotlib.pyplot as plt
# Your DataFrame
df = pd.DataFrame({'cust_id': [1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3],
                   'month': [200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104, 200101, 200102, 200103, 200104],
                   'volume': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12],
                   'category': [1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2]})
# Convert the 'month' column (YYYYMM integers) to datetime for proper sorting
df['month'] = pd.to_datetime(df['month'], format='%Y%m')
# Average the volume per month and category, then pivot so that each category becomes a column
grouped_data = df.groupby(['month', 'category'])['volume'].mean().unstack('category')
# Plot one line per category
plt.figure(figsize=(10, 6))
for category in grouped_data.columns:
    plt.plot(grouped_data.index, grouped_data[category], label=f'Category {category}')
plt.xlabel('Month')
plt.ylabel('Average volume')
plt.title('Time Series of Average Volume Grouped by Customer Category')
plt.legend()
plt.show()
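
The conversion step only works if the format string matches the layout of the data. Here is a minimal sketch, assuming the integers encode year and month as YYYYMM as in the question: a mismatched format such as '%y%m%d' still parses without raising, but reads the same digits as a two-digit year, month and day. The variable names and the `astype(str)` call are illustrative.

```python
import pandas as pd

months = pd.Series([200101, 200102, 200103, 200104])

# '%Y%m': four-digit year + month -> first day of each month in 2001
as_year_month = pd.to_datetime(months.astype(str), format='%Y%m')

# '%y%m%d': two-digit year + month + day -> consecutive days in January 2020
as_short_date = pd.to_datetime(months.astype(str), format='%y%m%d')

print(as_year_month.dt.strftime('%Y-%m-%d').tolist())
print(as_short_date.dt.strftime('%Y-%m-%d').tolist())
```

Printing the parsed values like this is a cheap way to confirm that the x-axis will span four months rather than four days.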
