Question

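For each ID, I would like to find the earliest measurement time before 12:00:00 and the latest measurement time after 12:00:00, so that I can choose the start and end times with maximum overlap. Here is the sample data: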

import numpy as np
import pandas as pd
import random

df = pd.DataFrame({'DATE_TIME': pd.date_range('2022-11-01', '2022-11-06 23:00:00', freq='20min'),
                   'ID': [random.randrange(1, 20) for n in range(430)]})

df['VALUE1'] = [random.randrange(110, 140) for n in range(430)]
df['VALUE2'] = [random.randrange(50, 60) for n in range(430)]
df['VALUE3'] = [random.randrange(80, 100) for n in range(430)]
df['VALUE4'] = [random.randrange(30, 50) for n in range(430)]

df['MODEL'] = [random.randrange(1, 3) for n in range(430)]

df['SOLD'] = [random.randrange(0, 2) for n in range(430)]

df['INSPECTION'] = df['DATE_TIME'].dt.day

df['MODE'] = np.select([df['INSPECTION'] == 1, df['INSPECTION'].isin([2, 3])], ['A', 'B'], 'C')

df['TIME'] = df['DATE_TIME'].dt.time
# df['TIME'] = pd.to_timedelta(df['TIME'])
df['TIME'] = df['TIME'].astype('str')


# Create the day/night column ---------------------------------------------------------------------
def cycle_day_period(dataframe: pd.DataFrame, midnight='00:00:00', start_of_morning='06:00:00',
                     start_of_afternoon='13:00:00',
                     start_of_evening='18:00:00', end_of_evening='23:00:00', start_of_night='24:00:00'):
    bins = [midnight, start_of_morning, start_of_afternoon, start_of_evening, end_of_evening, start_of_night]
    labels = ['Night', 'Morning', 'Morning', 'Night', 'Night']

    return pd.cut(
        pd.to_timedelta(dataframe),
        bins=list(map(pd.Timedelta, bins)),
        labels=labels, right=False, ordered=False
    )


df['CYCLE_PART'] = cycle_day_period(df['TIME'], '00:00:00', '06:00:00', '13:00:00', '18:00:00', '23:00:00', '24:00:00')

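I expect to find T_start and T_end as in the picture (for a 24 h measurement on the same day). Please refer to the drawing, since my wording of the problem might be confusing.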

Answer 1

Score: 2

What you want is unclear, but assuming you want the time window that is covered by all groups, first use groupby.agg to get the min/max TIME per ID. Then aggregate again, this time taking the max of the minima (the latest start) and the min of the maxima (the earliest end):

df.groupby('ID')['TIME'].agg(['min', 'max']).agg({'min': 'max', 'max': 'min'})

If you really need to restrict the minimum to values before 12:00:00 and the maximum to values after 12:00:00:

(df.groupby('ID')['TIME']
   .agg(min=lambda x: x[x.lt('12:00:00')].min(),
        max=lambda x: x[x.gt('12:00:00')].max())
   .agg({'min': 'max', 'max': 'min'})
)

Output:

min    07:00:00
max    19:40:00
dtype: object

Intermediate:

df.groupby('ID')['TIME'].agg(['min', 'max'])
         min       max
ID
1   00:40:00  20:00:00
2   02:20:00  23:40:00
3   00:20:00  23:40:00
4   01:20:00  23:20:00
5   00:00:00  22:40:00
6   02:00:00  21:40:00
7   00:20:00  23:20:00
8   00:40:00  19:40:00  # min of maxima: 19:40:00
9   00:40:00  22:40:00
10  00:20:00  23:20:00
11  00:00:00  22:00:00
12  02:20:00  23:40:00
13  01:00:00  22:40:00
14  00:00:00  23:00:00
15  00:00:00  23:00:00
16  01:00:00  23:40:00
17  00:00:00  22:40:00
18  00:00:00  22:00:00
19  07:00:00  23:00:00  # max of minima: 07:00:00
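
Because the sample data is random, the numbers above change on every run. The following is a minimal sketch on a small hand-made frame, so the two-step aggregation can be checked by hand. The `toy` frame, its values, and the names `per_id`, `window`, `before` and `after` are hypothetical and not part of the question.

```python
import pandas as pd

# Hypothetical toy data: three IDs with a few zero-padded HH:MM:SS strings each.
toy = pd.DataFrame({
    "ID":   [1, 1, 1, 2, 2, 2, 3, 3],
    "TIME": ["06:00:00", "11:20:00", "20:00:00",
             "08:40:00", "13:00:00", "22:20:00",
             "07:20:00", "18:40:00"],
})

# Step 1: earliest and latest time per ID.
per_id = toy.groupby("ID")["TIME"].agg(["min", "max"])

# Step 2: the shared window starts at the latest per-ID minimum
# and ends at the earliest per-ID maximum.
window = per_id.agg({"min": "max", "max": "min"})
t_start, t_end = window["min"], window["max"]
print(t_start, t_end)  # 08:40:00 18:40:00

# Variant: only consider times before noon for the start
# and times after noon for the end.
noon = "12:00:00"
per_id_noon = toy.groupby("ID")["TIME"].agg(
    before=lambda s: s[s.lt(noon)].min(),
    after=lambda s: s[s.gt(noon)].max(),
)
t_start_noon = per_id_noon["before"].max()
t_end_noon = per_id_noon["after"].min()
print(t_start_noon, t_end_noon)  # 08:40:00 18:40:00
```

Comparing the times as strings only works here because every value is a zero-padded HH:MM:SS string, so lexical order matches chronological order. In the variant, an ID with no measurement on one side of noon would yield a missing value for that side, which may need separate handling.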
