How to plot daily data against a 24 hour axis (00:00 - 23:59:59)
Question
I have a dataset with date_time, date, and time columns, plus a VALUE1 column that holds the measurement value at each time point. Each ID has multiple measurements over a day. In addition, each ID has 6 different 24-hour measurement runs, identified by the INSPECTION column.
```python
import random
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as ticker

random.seed(0)
df = pd.DataFrame({'DATE_TIME': pd.date_range('2022-11-01', '2022-11-06 23:00:00', freq='20min'),
                   'ID': [random.randrange(1, 3) for n in range(430)]})
df['VALUE1'] = [random.uniform(110, 160) for n in range(430)]
df['VALUE2'] = [random.uniform(50, 80) for n in range(430)]
df['INSPECTION'] = df['DATE_TIME'].dt.day
# df['INSPECTION'] = df['INSPECTION'].replace(6, 1)
# df['INSPECTION'] = df['INSPECTION'].replace(3, 1)
df['MODE'] = np.select([df['INSPECTION'] == 1, df['INSPECTION'].isin([2, 3])], ['A', 'B'], 'C')
df['TIME'] = df['DATE_TIME'].dt.time
df['TIME'] = df['TIME'].astype('str')
# minutes between consecutive readings, as a float
# (.astype('timedelta64[m]') raises a TypeError in pandas >= 2.0)
df['TIMEINTERVAL'] = df['DATE_TIME'].diff().dt.total_seconds().div(60).fillna(0)

def to_day_period(s):
    # label each 'HH:MM:SS' string as Daytime or Nighttime
    bins = ['0', '06:00:00', '13:00:00', '18:00:00', '23:00:00', '24:00:00']
    labels = ['Nighttime', 'Daytime', 'Daytime', 'Nighttime', 'Nighttime']
    return pd.cut(
        pd.to_timedelta(s),
        bins=list(map(pd.Timedelta, bins)),
        labels=labels, right=False, ordered=False
    )

df['TIME_OF_DAY'] = to_day_period(df['TIME'])
df_monthly = df

# ++++++++++++++++++++++++++++++++ sns plot ++++++++++++++++++++++++++++++++++++++++++++++++++++++++
df_id = df[df.ID == 1]
sns.set_style('darkgrid')
sns.set(rc={'figure.figsize': (14, 8)})
# print(df_id.INSPECTION.unique())
ax = sns.lineplot(data=df_id, x='TIME', y='VALUE1',
                  hue='INSPECTION', palette='viridis',
                  legend='full', lw=3)
ax.xaxis.set_major_locator(ticker.MultipleLocator(10))
plt.legend(bbox_to_anchor=(1, 1))
plt.ylabel('VALUE1')
plt.xlabel('TIME')
plt.show()
```
How can I show a single 24-hour day cycle on the x-axis without the times repeating? Specifically, the x-axis starts at 00:40:00 and then shows 00:00:00 again further along. Is there a way to deal with this too? I want the x-axis to show times from 00:00:00 to 23:59:00 only once, without repetition.
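The repetition most likely comes from passing the `TIME` strings as `x`: seaborn treats them as categories and, for unordered string data, lays them out in order of first appearance (the order `Series.unique()` returns), not in clock order. In the answer's `df.head()` output, the first ID 1 reading is at 00:40:00, so 00:40:00 takes the first slot and 00:00:00 only gets a slot when it first turns up later in the data. A minimal sketch with a few hypothetical `TIME` values, assuming the ordering behaves like `unique()`:

```python
import pandas as pd

# hypothetical TIME strings for one ID, in row order: the first reading is at
# 00:40 on day 1, and 00:00:00 first shows up on a later day
times = pd.Series(['00:40:00', '01:00:00', '23:40:00', '00:00:00', '00:20:00'])

# order in which string categories are laid out on the axis (first appearance)
order_of_appearance = list(times.unique())
print(order_of_appearance)  # 00:40:00 comes first, 00:00:00 fourth

# numeric seconds since midnight put the same values on a true 0-86400 scale
seconds = pd.to_timedelta(times).dt.total_seconds()
print(times.loc[seconds.sort_values().index].tolist())
```

Converting to a numeric position (seconds since midnight) removes the dependence on row order, which is the idea the answer below builds on.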
Answer 1
Score: 1
- Create a column representing the total number of seconds elapsed in a given day. It will be used as the x-axis, and it ensures every point for a given `'INSPECTION'` is placed on the same 24-hour scale.
  - Given a specific day, subtract that day's midnight from the current datetime, and use the `.total_seconds()` method.
  - `df.DATE_TIME.apply(lambda row: (row - row.replace(hour=0, minute=0, second=0, microsecond=0)).total_seconds())`
- Set ticks to be every hour.
  - `ax.xaxis.set_major_locator(tkr.MultipleLocator(3600))`
- Create a list of every hour, which will be used as the labels.
  - The trailing `['']` is for the last tick, at `'00:00'` of the next day.
  - `hours = [dtime(i).strftime('%H:%M') for i in range(24)] + ['']`
- The per-ID figures created in the loop could also be drawn as subplots with `fig, (ax1, ax2) = plt.subplots(2, 1)`, but that's a cosmetic change that isn't relevant to the question.
  - See How to plot in multiple subplots for additional details about plotting in subplots.
- A seaborn legend should be moved with `sns.move_legend`, not `plt.legend`, as per Move seaborn plot legend to a different position.
- It is more consistent to stick with the object-oriented interface, using `ax` (the conventional name for a `matplotlib.axes.Axes` object), than to alternate between `ax` and `plt`.
- Tested in `python 3.11.2`, `pandas 2.0.0`, `matplotlib 3.7.1`, `seaborn 0.12.2`
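The seconds-since-midnight column can also be computed without `apply`: `Series.dt.normalize()` sets each timestamp to midnight of its own day, so subtracting it leaves only the time of day. This is a swapped-in alternative, not the answer's code. A sketch on a few hypothetical timestamps that cross midnight, comparing both forms:

```python
import pandas as pd

# hypothetical stand-in for the DATE_TIME column, crossing midnight
dt = pd.Series(pd.date_range('2022-11-01 22:40', periods=6, freq='20min'))

# the answer's approach: subtract each row's own midnight, one row at a time
per_row = dt.apply(lambda row: (row - row.replace(hour=0, minute=0, second=0, microsecond=0)).total_seconds())

# vectorized alternative: normalize() gives midnight of each timestamp's day
vectorized = (dt - dt.dt.normalize()).dt.total_seconds()

print(pd.DataFrame({'DATE_TIME': dt, 'per_row': per_row, 'vectorized': vectorized}))
```

Both columns reset to 0 at the day boundary, which is what lets every INSPECTION day overlay on the same axis. Either expression can be assigned to `df['total_seconds']`; the vectorized form avoids a Python-level call per row, which may matter for larger frames.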
```python
import seaborn as sns
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr
from datetime import time as dtime

# given the existing dataframe with the DATE_TIME column as a datetime Dtype
# add a column for total seconds
df['total_seconds'] = df.DATE_TIME.apply(lambda row: (row - row.replace(hour=0, minute=0, second=0, microsecond=0)).total_seconds())

# iterate through each ID
for id_ in sorted(df.ID.unique()):
    # select the data for the given id_
    data = df[df.ID.eq(id_)]
    # create a figure
    fig = plt.figure(figsize=(10, 6))
    # plot the data
    ax = sns.lineplot(data=data, x='total_seconds', y='VALUE1', hue='INSPECTION', palette='viridis', legend='full')
    # set the title and labels
    ax.set(title=f'ID: {id_}', xlabel='TIME', ylabel='VALUE1')
    # move the legend
    sns.move_legend(ax, bbox_to_anchor=(1.0, 0.5), loc='center left', frameon=False)
    # constrain the x-axis limits to the number of seconds in a day
    ax.set_xlim(0, 24 * 3600)
    # create labels for every hour in the day, and add an extra spot for the last tick position
    hours = [dtime(i).strftime('%H:%M') for i in range(24)] + ['']
    # create xticks at every hour
    ax.xaxis.set_major_locator(tkr.MultipleLocator(3600))
    # set the ticks and corresponding labels; cut off extra starting and ending ticks to match labels
    ax.set_xticks(ticks=ax.get_xticks()[1:-1], labels=hours, rotation=90)
    # hide the top and right spines
    ax.spines[['top', 'right']].set_visible(False)

# display the figures when running as a script
plt.show()
```
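To see why the labels need the extra `''` and the ticks need the `[1:-1]` slice, the tick positions can be inspected without drawing anything. A sketch, assuming `MultipleLocator.tick_values` proposes the same positions that `get_xticks()` returns once the limits are set to `(0, 24 * 3600)`; in the matplotlib versions the answer was tested with, that appears to include one tick below 0 and one above 86400:

```python
from datetime import time as dtime
import matplotlib.ticker as tkr

SECONDS_PER_DAY = 24 * 3600

# the same labels as the answer: 24 hourly labels plus a blank one for the next day's 00:00
hours = [dtime(i).strftime('%H:%M') for i in range(24)] + ['']

# positions MultipleLocator proposes for the limits set by ax.set_xlim(0, 24 * 3600)
locs = tkr.MultipleLocator(3600).tick_values(0, SECONDS_PER_DAY)

# drop the extra tick below 0 and the one above 86400, as get_xticks()[1:-1] does
inner = locs[1:-1]

for pos, label in zip(inner, hours):
    print(f'{pos:>7.0f} s -> {label!r}')
```

The slice leaves 25 positions (0 through 86400), which pair one-to-one with the 25 labels.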
`df.head()`

```text
   DATE_TIME            ID  VALUE1      VALUE2     INSPECTION  MODE  TIME      TIMEINTERVAL     total_seconds  TIME_OF_DAY
0  2022-11-01 00:00:00  2   145.003985  57.488269  1           A     00:00:00  NaT              0.0            Nighttime
1  2022-11-01 00:20:00  2   142.449613  75.888882  1           A     00:20:00  0 days 00:20:00  1200.0         Nighttime
2  2022-11-01 00:40:00  1   119.748681  70.052981  1           A     00:40:00  0 days 00:20:00  2400.0         Nighttime
3  2022-11-01 01:00:00  2   149.170848  69.793085  1           A     01:00:00  0 days 00:20:00  3600.0         Nighttime
4  2022-11-01 01:20:00  2   148.873049  56.777515  1           A     01:20:00  0 days 00:20:00  4800.0         Nighttime
```
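If the fixed label list feels fragile, because it depends on how many ticks the locator returns, a `matplotlib.ticker.FuncFormatter` can derive each label from its tick position instead. This is a swapped-in technique, not the answer's; `seconds_to_hhmm` and `format_hour_axis` are hypothetical helper names, and the plotted data is made up:

```python
import matplotlib.pyplot as plt
import matplotlib.ticker as tkr

SECONDS_PER_DAY = 24 * 3600

def seconds_to_hhmm(x, pos=None):
    """Hypothetical helper: format seconds since midnight as 'HH:MM'."""
    total = int(round(x))
    if total < 0 or total >= SECONDS_PER_DAY:
        return ''  # blank for ticks outside one day, e.g. the next day's 00:00
    return f'{total // 3600:02d}:{total % 3600 // 60:02d}'

def format_hour_axis(ax):
    """Hypothetical helper: one tick per hour, labelled 00:00 ... 23:00."""
    ax.set_xlim(0, SECONDS_PER_DAY)
    ax.xaxis.set_major_locator(tkr.MultipleLocator(3600))
    ax.xaxis.set_major_formatter(tkr.FuncFormatter(seconds_to_hhmm))
    ax.tick_params(axis='x', labelrotation=90)

# demo on hypothetical data (x in seconds since midnight)
fig, ax = plt.subplots(figsize=(10, 6))
ax.plot([0, 2400, 43200, 85200], [120, 140, 130, 150])
format_hour_axis(ax)
```

In the answer's loop, `format_hour_axis(ax)` would stand in for the `set_xlim`, `hours`, `set_major_locator`, and `set_xticks` lines; the labels then follow the ticks even if the limits or tick spacing change.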