How to plot a line and a box plot in the same graph when the x-axis is a date

Question

I've been trying to make a plot from two dataframes that contain discharge values from a monthly time series. The first dataframe (df1) contains the columns date, year, month and discharge. The second dataframe (df2) contains the same columns, but with several different discharge values for each month. I want to plot both dataframes in the same figure. df1 should be a line plot with date on the x-axis and discharge on the y-axis. df2 should be a box plot with date on the x-axis and the discharge values grouped by month on the y-axis.

Here is the code I have tested:

import pandas as pd
import matplotlib.pyplot as plt
from datetime import datetime

df1 = pd.DataFrame({
    'date': ['2023-04-01', '2023-03-01', '2023-02-01', '2023-01-01', '2022-12-01'],
    'year': [2023,2023,2023,2023,2022],
    'month': [4,3,2,1,12],
    'discharge': [10, 20, 30, 15, 25]
})

# Define the start and end dates
start_date = datetime(2023, 5, 1)
end_date = datetime(2023, 9, 1)

# Generate the date range
dates = pd.date_range(start=start_date, end=end_date, freq='MS')

# Define the values for the DataFrame
df2 = pd.DataFrame({'date': dates.repeat(10),
                    'year': [d.year for d in dates] * 10,
                    'month': [d.month for d in dates] * 10,
                    'discharge': [i+1 for i in range(len(dates))] * 10})

# Plot 

fig, ax = plt.subplots()
ax.plot(df1['date'], df1['discharge'], label='Discharge');

# plot the box plot
df2.boxplot(column='discharge', by='month', positions=[df2['date'][2]], widths=10, ax=ax)

I got this error:

ValueError: List of boxplot statistics and positions values must have same the length
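
matplotlib raises this error whenever the number of entries in `positions` differs from the number of boxes being drawn. A minimal sketch that reproduces the check with plain matplotlib; the `groups` values are made up for illustration and merely stand in for the five months in df2:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt

# Five made-up groups, standing in for the five months in df2.
groups = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6], [5, 6, 7]]

fig, ax = plt.subplots()

# Five boxes but only one position: matplotlib raises ValueError.
try:
    ax.boxplot(groups, positions=[10])
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)

# One position per box: the call succeeds.
result = ax.boxplot(groups, positions=[10, 20, 30, 40, 50], widths=4)
n_boxes = len(result["boxes"])
plt.close(fig)
```

Here `mismatch_error` holds the text of the exception from the first call, and `n_boxes` counts the boxes drawn by the second.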

Answer 1

Score: 1

The error occurs because `positions` contains just one value, `df2['date'][2]`, while `by='month'` produces five boxes. But even if you passed all the unique dates using `unique()`, it would not work. There are a couple of issues. One is that you need to use datetime values for both the line and the box plots. I am assuming you want the dates to increase along the x-axis and both plots to share the same y-axis.
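
On the datetime point: the `date` column in df1 holds strings, which matplotlib treats as category labels rather than points in time. A minimal sketch of the conversion using pandas only; the names `raw`, `parsed` and `gap_days` are illustrative and do not appear in the original code:

```python
import pandas as pd

# The date strings from df1, in the order they appear in the question.
raw = pd.Series(['2023-04-01', '2023-03-01', '2023-02-01', '2023-01-01', '2022-12-01'])
parsed = pd.to_datetime(raw)

# Only the converted series has a datetime dtype.
raw_is_datetime = pd.api.types.is_datetime64_any_dtype(raw)
parsed_is_datetime = pd.api.types.is_datetime64_any_dtype(parsed)

# Real datetimes support date arithmetic: 2023-04-01 minus 2023-03-01.
gap_days = (parsed.iloc[0] - parsed.iloc[1]).days
```

Once the column is a real datetime, differences between dates come out as day counts, which is what the rest of this answer relies on.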

To do this, first convert each point (for both plots) to an integer: the number of days between its date and the earliest date in either plot. Then plot both the line and the boxes against these offsets on an integer axis. After plotting, change the x-tick labels back to date format. Below is the code, with as many comments as I could provide. Hope this is what you are looking for...

import datetime

import matplotlib.pyplot as plt
import pandas as pd

df1 = pd.DataFrame({'date': ['2023-04-01', '2023-03-01', '2023-02-01', '2023-01-01', '2022-12-01'], 'year': [2023, 2023, 2023, 2023, 2022], 'month': [4, 3, 2, 1, 12], 'discharge': [10, 20, 30, 15, 25]})
df1['date'] = pd.to_datetime(df1['date'])

# Define the start and end dates
start_date = datetime.datetime(2023, 5, 1)
end_date = datetime.datetime(2023, 9, 1)

# Generate the date range
dates = pd.date_range(start=start_date, end=end_date, freq='MS')

# Define the values for the DataFrame
df2 = pd.DataFrame({'date': dates.repeat(10), 'year': [d.year for d in dates] * 10, 'month': [d.month for d in dates] * 10, 'discharge': [i + 1 for i in range(len(dates))] * 10})

# Get the earliest date in BOTH plots combined
begin = pd.concat([df1.date, pd.Series(df2.date.unique())]).min()

# Add columns linepos and boxpos: the offset in days from the earliest date
df1['linepos'] = (df1['date'] - begin).dt.days
df2['boxpos'] = (df2['date'] - begin).dt.days

# Plot both - note that boxpos and linepos, not dates, are used for the x-axis
ax = df2[['discharge', 'boxpos']].boxplot(by='boxpos', widths=4, positions=df2.boxpos.unique(), figsize=(20, 7))
ax.plot(df1['linepos'], df1['discharge'], label='Discharge')

# Set x-lim to include both the line and the boxes
ax.set_xlim([min(df2.boxpos.min(), df1.linepos.min()) - 10, max(df2.boxpos.max(), df1.linepos.max()) + 10])

# To change the x-axis ticks, get the list of all x-entries and sort it
locs = sorted(list(df2.boxpos.unique()) + list(df1.linepos.unique()))
ax.set_xticks(locs)

# To add labels, get the unique dates, sort them and convert them to the format you like
labels = pd.concat([df1.date, pd.Series(df2.date.unique())]).sort_values().dt.strftime('%Y-%m-%d').tolist()
ax.set_xticklabels(labels, rotation=45)

# Set x and y labels
ax.set_xlabel('Dates')
ax.set_ylabel('Discharge')

plt.show()
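
An alternative to relabelling an integer axis, not taken from the answer above, is to keep a real date axis and give the boxes positions in matplotlib's date units via `matplotlib.dates.date2num`. A minimal sketch under that assumption, calling `Axes.boxplot` directly instead of `DataFrame.boxplot`; the box width of 10 (days) and the figure size are arbitrary choices:

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.dates as mdates
import matplotlib.pyplot as plt
import pandas as pd

# Same toy data as in the question, with real datetimes in df1.
df1 = pd.DataFrame({
    'date': pd.to_datetime(['2023-04-01', '2023-03-01', '2023-02-01',
                            '2023-01-01', '2022-12-01']),
    'discharge': [10, 20, 30, 15, 25],
})
dates = pd.date_range(start='2023-05-01', end='2023-09-01', freq='MS')
df2 = pd.DataFrame({
    'date': dates.repeat(10),
    'discharge': [i + 1 for i in range(len(dates))] * 10,
})

fig, ax = plt.subplots(figsize=(12, 5))

# Line plot on a true datetime x-axis.
ax.plot(df1['date'], df1['discharge'], label='Discharge')

# One box per distinct date, positioned in matplotlib date units (days).
box_dates = pd.DatetimeIndex(df2['date'].unique()).sort_values()
data = [df2.loc[df2['date'] == d, 'discharge'].to_numpy() for d in box_dates]
positions = mdates.date2num(box_dates)
bp = ax.boxplot(data, positions=positions, widths=10, manage_ticks=False)

# Let date locators and formatters label the axis instead of fixed box ticks.
ax.xaxis.set_major_locator(mdates.MonthLocator())
ax.xaxis.set_major_formatter(mdates.DateFormatter('%Y-%m-%d'))
fig.autofmt_xdate(rotation=45)
ax.set_xlabel('Dates')
ax.set_ylabel('Discharge')

x_min, x_max = ax.get_xlim()
plt.close(fig)
```

Passing `manage_ticks=False` stops `boxplot` from replacing the axis ticks with one fixed tick per box, so the date locator and formatter stay in control of the labels. For interactive use, drop the `Agg` line and call `plt.show()` before closing the figure.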

huangapple
  • Posted on May 15, 2023, 00:13:46
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76248489.html