June 19, 2023, 21:14:01

Can I use a boolean mask to find if a DateTime value falls between two other DateTime values in a different dataframe

Question

I want to filter my datapoints until only those where the participant was asleep are left. I have a dataframe with DateTime values and the values I am researching, and a different dataframe that records when the participant started sleeping and when they stopped. Instead of writing out every start and stop time in a boolean mask, I would like to know whether this can be done by iterating over either the big dataframe or the dataframe with the start and stop times, or in any other way that is better than manually entering 175 nights.

The start/stop dataframe looks like this. I have one for every participant:

df_sleep1:

date       start            stop
5/30/2023  5/29/2023 22:15  5/30/2023 7:22
5/31/2023  5/30/2023 23:19  5/31/2023 6:46
6/1/2023   6/1/2023 0:02    6/1/2023 8:31

The dataframe with all the data looks like this, where I want to add an "asleep" column:

df:

   DateTime             HeartRate  Participant  Asleep
0  2023-05-29 23:44:00  76.0       1
1  2023-05-30 06:44:00  76.0       1
2  2023-05-30 20:45:00  84.0       1
3  2023-05-31 04:45:00  84.0       2
4  2023-06-01 20:46:00  81.0       2

What I have tried:

dt = df['DateTime'].to_numpy()
start1 = df_sleep1['Start'].to_numpy()[:, None]
end1 = df_sleep1['Stop'].to_numpy()[:, None]

mask1 = ((start1 <= dt) & (dt <= end1) & (df['Participant'] == 1))
df['Sleep'] = mask1.any(axis=0)

def sleepFunction(row):
    if (df_sleep1['Start'] <= dt) & (dt <= df_sleep1['Stop']) & (df['Participant'] == 1):
        return True
    else:
        return False
df['sleepState'] = df.apply(lambda row: sleepFunction(row), axis = 1)

Both attempts raise similar errors about the shapes of the dataframes/arrays not matching, and neither is how I want to do this anyway.

Answer 1

Score: 0

If the intervals are non-overlapping, an efficient method is to use merge_asof: for each participant, match every row to the most recent sleep interval that started at or before its DateTime, then check that the DateTime is not after that interval's stop.

import pandas as pd

# attach the Participant ID to each sleep DataFrame and stack them
all_sleep = pd.concat({1: df_sleep1}, names=['Participant']).reset_index(level=0)
# make sure the time columns have datetime dtype
all_sleep[['start', 'stop']] = all_sleep[['start', 'stop']].apply(pd.to_datetime)
df['DateTime'] = pd.to_datetime(df['DateTime'])
# per participant, match each row to the latest interval with start <= DateTime,
# then keep the original index so the result aligns back onto df
df['Asleep'] = (
    pd.merge_asof(df.sort_values(by='DateTime').reset_index(),
                  all_sleep.sort_values(by='start'),
                  left_on='DateTime', right_on='start',
                  by='Participant')
      .assign(Asleep=lambda d: d['DateTime'].le(d['stop']))
      .set_index('index')['Asleep']
)

Output:

             DateTime  HeartRate  Participant  Asleep
0 2023-05-29 23:44:00       76.0            1    True
1 2023-05-30 06:44:00       76.0            1    True
2 2023-05-30 20:45:00       84.0            1   False
3 2023-05-31 04:45:00       84.0            2   False
4 2023-06-01 20:46:00       81.0            2   False
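
If the intervals might overlap, or you would rather stay with the boolean mask from the question, a broadcast comparison per participant is one possible alternative. The following is only a sketch on the sample data above. It assumes the sleep tables use the column names `start` and `stop`, and `sleep_by_participant` is a hypothetical dict (not in the original post) that maps each participant ID to that participant's sleep dataframe. Participants without a table are marked as not asleep.

```python
import numpy as np
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "DateTime": pd.to_datetime([
        "2023-05-29 23:44:00", "2023-05-30 06:44:00", "2023-05-30 20:45:00",
        "2023-05-31 04:45:00", "2023-06-01 20:46:00",
    ]),
    "HeartRate": [76.0, 76.0, 84.0, 84.0, 81.0],
    "Participant": [1, 1, 1, 2, 2],
})
df_sleep1 = pd.DataFrame({
    "start": pd.to_datetime(["2023-05-29 22:15", "2023-05-30 23:19", "2023-06-01 00:02"]),
    "stop": pd.to_datetime(["2023-05-30 07:22", "2023-05-31 06:46", "2023-06-01 08:31"]),
})

# Hypothetical mapping: participant ID -> that participant's sleep table.
sleep_by_participant = {1: df_sleep1}

asleep = np.zeros(len(df), dtype=bool)
for pid, sleep in sleep_by_participant.items():
    # Select only this participant's rows so the shapes line up.
    rows = (df["Participant"] == pid).to_numpy()
    dt = df.loc[rows, "DateTime"].to_numpy()        # shape (m,)
    start = sleep["start"].to_numpy()[:, None]      # shape (n, 1)
    stop = sleep["stop"].to_numpy()[:, None]        # shape (n, 1)
    # Broadcasting gives an (n, m) grid: interval i contains timestamp j.
    # any(axis=0) collapses it to "inside at least one interval".
    asleep[rows] = ((start <= dt) & (dt <= stop)).any(axis=0)

df["Asleep"] = asleep
print(df)
```

The grid holds one boolean per (interval, timestamp) pair, so memory grows with the product of the two lengths. For a very large dataframe the merge_asof approach is likely the better fit.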
