How to find overlapping time start and end points?
Question
For each ID, I would like to find the earliest measurement time before 12:00:00 and the latest measurement time after 12:00:00, so that I can choose the maximum overlapping start and end times. Here is the sample data:
import numpy as np
import pandas as pd
import random
df = pd.DataFrame({'DATE_TIME': pd.date_range('2022-11-01', '2022-11-06 23:00:00', freq='20min'),
                   'ID': [random.randrange(1, 20) for n in range(430)]})
df['VALUE1'] = [random.randrange(110, 140) for n in range(430)]
df['VALUE2'] = [random.randrange(50, 60) for n in range(430)]
df['VALUE3'] = [random.randrange(80, 100) for n in range(430)]
df['VALUE4'] = [random.randrange(30, 50) for n in range(430)]
df['MODEL'] = [random.randrange(1, 3) for n in range(430)]
df['SOLD'] = [random.randrange(0, 2) for n in range(430)]
df['INSPECTION'] = df['DATE_TIME'].dt.day
df['MODE'] = np.select([df['INSPECTION'] == 1, df['INSPECTION'].isin([2, 3])], ['A', 'B'], 'C')
df['TIME'] = df['DATE_TIME'].dt.time
# df['TIME'] = pd.to_timedelta(df['TIME'])
df['TIME'] = df['TIME'].astype('str')
# Create DAY Night columns only-------------------------------------------------------------------------
def cycle_day_period(dataframe: pd.Series, midnight='00:00:00', start_of_morning='06:00:00',
                     start_of_afternoon='13:00:00',
                     start_of_evening='18:00:00', end_of_evening='23:00:00', start_of_night='24:00:00'):
    # Bin edges as 'HH:MM:SS' strings; '24:00:00' closes the last bin at the end of the day
    bins = [midnight, start_of_morning, start_of_afternoon, start_of_evening, end_of_evening, start_of_night]
    labels = ['Night', 'Morning', 'Morning', 'Night', 'Night']
    # Repeated labels are only accepted by pd.cut together with ordered=False
    return pd.cut(
        pd.to_timedelta(dataframe),
        bins=list(map(pd.Timedelta, bins)),
        labels=labels, right=False, ordered=False
    )
df['CYCLE_PART'] = cycle_day_period(df['TIME'], '00:00:00', '06:00:00', '13:00:00', '18:00:00', '23:00:00', '24:00:00')
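The `cycle_day_period` helper only works because `pd.cut` accepts repeated labels when `ordered=False` (supported since pandas 1.1, to my knowledge). A minimal sketch on a few hand-picked times, using the same bin edges as above:

```python
import pandas as pd

# A few hand-picked clock times as 'HH:MM:SS' strings, like df['TIME'] above
times = pd.Series(['00:20:00', '07:40:00', '14:00:00', '19:20:00', '23:40:00'])

# Same bin edges as cycle_day_period's defaults; '24:00:00' parses as one day
edges = ['00:00:00', '06:00:00', '13:00:00', '18:00:00', '23:00:00', '24:00:00']

parts = pd.cut(
    pd.to_timedelta(times),
    bins=[pd.Timedelta(e) for e in edges],
    labels=['Night', 'Morning', 'Morning', 'Night', 'Night'],
    right=False,    # left-closed bins: [00:00, 06:00), [06:00, 13:00), ...
    ordered=False,  # required because the labels repeat
)
print(parts.tolist())  # ['Night', 'Morning', 'Morning', 'Night', 'Night']
```

Note that 14:00 lands in the 13:00 to 18:00 bin, which the helper also labels `'Morning'`.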
I expect to find T_start and T_end as shown in the picture (for a 24-hour measurement within the same day). Please refer to the drawing, since my wording of the problem might be confusing.
Answer 1
Score: 2
What you want is unclear, but assuming you want the min and max times that are present in all groups, first use groupby.agg to get the min/max per group. Then aggregate again, this time taking the max of the minima and the min of the maxima:
df.groupby('ID')['TIME'].agg(['min', 'max']).agg({'min': 'max', 'max': 'min'})
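To see the two-step aggregation on data small enough to check by hand, here is a sketch with three hypothetical IDs (the values are made up for illustration, not taken from the random sample data):

```python
import pandas as pd

# Hypothetical measurements; zero-padded 'HH:MM:SS' strings sort correctly as text
df = pd.DataFrame({
    'ID':   [1, 1, 1, 2, 2, 3, 3, 3],
    'TIME': ['05:00:00', '11:20:00', '21:00:00',
             '07:00:00', '20:00:00',
             '06:20:00', '12:40:00', '22:40:00'],
})

# Step 1: earliest and latest time per ID
per_id = df.groupby('ID')['TIME'].agg(['min', 'max'])

# Step 2: the window covered by every ID starts at the latest start
# (max of the minima) and ends at the earliest end (min of the maxima)
window = per_id.agg({'min': 'max', 'max': 'min'})
print(window)
```

Here the starts are 05:00, 07:00 and 06:20, so the window opens at 07:00:00; the ends are 21:00, 20:00 and 22:40, so it closes at 20:00:00.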
If you really need to filter the values before/after 12:00:00:
(df.groupby('ID')['TIME']
   .agg(min=lambda x: x[x.lt('12:00:00')].min(),
        max=lambda x: x[x.gt('12:00:00')].max())
   .agg({'min': 'max', 'max': 'min'})
)
Output:
min    07:00:00
max    19:40:00
dtype: object
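The string comparisons with `lt`/`gt` are only valid because every time is a zero-padded `'HH:MM:SS'` string. A variant of the filtered approach, hinted at by the commented-out `pd.to_timedelta` line in the question, compares real durations instead. This is a sketch with made-up values, not the answerer's code; `noon` and `td` are names introduced here for illustration:

```python
import pandas as pd

# Hypothetical measurements (made-up values for illustration)
df = pd.DataFrame({
    'ID':   [1, 1, 1, 2, 2, 2, 3, 3],
    'TIME': ['05:00:00', '11:20:00', '21:00:00',
             '07:00:00', '12:00:00', '20:00:00',
             '06:20:00', '22:40:00'],
})

# Convert to Timedelta so comparisons do not depend on string formatting
td = pd.to_timedelta(df['TIME'])
noon = pd.Timedelta('12:00:00')

per_id = td.groupby(df['ID']).agg(
    min=lambda x: x[x < noon].min(),  # earliest measurement strictly before noon
    max=lambda x: x[x > noon].max(),  # latest measurement strictly after noon
)
window = per_id.agg({'min': 'max', 'max': 'min'})
print(window)
```

A measurement at exactly 12:00:00 (ID 2 here) is excluded from both sides, matching the strict `lt`/`gt` in the answer.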
Intermediate:
df.groupby('ID')['TIME'].agg(['min', 'max'])
         min       max
ID
1   00:40:00  20:00:00
2   02:20:00  23:40:00
3   00:20:00  23:40:00
4   01:20:00  23:20:00
5   00:00:00  22:40:00
6   02:00:00  21:40:00
7   00:20:00  23:20:00
8   00:40:00  19:40:00  # min of maxima: 19:40:00
9   00:40:00  22:40:00
10  00:20:00  23:20:00
11  00:00:00  22:00:00
12  02:20:00  23:40:00
13  01:00:00  22:40:00
14  00:00:00  23:00:00
15  00:00:00  23:00:00
16  01:00:00  23:40:00
17  00:00:00  22:40:00
18  00:00:00  22:00:00
19  07:00:00  23:00:00  # max of minima: 07:00:00
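The `#` annotations mark which ID limits the shared window. A small sketch that finds those IDs by comparing each column against its bound (ties return every matching ID); the data borrows the bounds of rows 1, 8 and 19 from the table above, and `start_ids`/`end_ids` are illustrative names:

```python
import pandas as pd

# Two measurements per ID, reproducing the min/max of rows 1, 8 and 19
df = pd.DataFrame({
    'ID':   [1, 1, 8, 8, 19, 19],
    'TIME': ['00:40:00', '20:00:00', '00:40:00', '19:40:00',
             '07:00:00', '23:00:00'],
})
per_id = df.groupby('ID')['TIME'].agg(['min', 'max'])

# Window start = max of the minima; keep every ID that reaches it
start = per_id['min'].max()
start_ids = per_id[per_id['min'] == start].index.tolist()

# Window end = min of the maxima; keep every ID that reaches it
end = per_id['max'].min()
end_ids = per_id[per_id['max'] == end].index.tolist()

print(start, start_ids)  # 07:00:00 [19]
print(end, end_ids)      # 19:40:00 [8]
```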