Check whether an event occurred in each 30-second interval
Question
I have a data set with an event ID and the timestamp at which each event happened, for example 9/2/2019 17:06. I want to build a Markov chain model with two states, noevent and event. To avoid building a continuous-time Markov chain, I want to split the period into 30-second intervals and check whether an event happened in each interval. Could someone help me do this in R? Thank you!
So far I have only parsed the timestamps and calculated the time between consecutive events, as well as how many no-event intervals fall between them.
data$timestamp <- as.POSIXct(data$timestamp, format = "%m/%d/%Y %H:%M:%S")
# Seconds elapsed since the previous event (NA for the first row)
data$diff <- c(NA, as.numeric(diff(data$timestamp), units = "secs"))
# Approximate number of 30-second intervals between consecutive events
data$NUm <- round(data$diff / 30)
Answer 1
Score: 0
tidyverse solution
Use lubridate::floor_date() to round each timestamp down to the start of its 30-second interval and tidyr::complete() to fill in intervals with no events:
library(dplyr)
library(tidyr)
library(lubridate)
data %>%
  mutate(timestamp = floor_date(timestamp, "30 seconds")) %>%
  complete(timestamp = full_seq(timestamp, 30)) %>%
  mutate(
    event = ifelse(!is.na(id), "yes", "no"),
    .keep = "unused"
  )
# A tibble: 8 × 2
  timestamp           event
  <dttm>              <chr>
1 2023-02-19 10:01:00 yes
2 2023-02-19 10:01:30 no
3 2023-02-19 10:02:00 yes
4 2023-02-19 10:02:30 no
5 2023-02-19 10:03:00 no
6 2023-02-19 10:03:30 no
7 2023-02-19 10:04:00 no
8 2023-02-19 10:04:30 yes
Base R solution
Similar logic as above, using base functions:
times <- as.POSIXlt(data$timestamp)
times$sec <- ifelse(times$sec < 30, 0, 30)
intervals <- seq(min(times), max(times), by = 30)
data.frame(
  intervals,
  event = ifelse(intervals %in% as.POSIXct(times), "yes", "no")
)
            intervals event
1 2023-02-19 10:01:00   yes
2 2023-02-19 10:01:30    no
3 2023-02-19 10:02:00   yes
4 2023-02-19 10:02:30    no
5 2023-02-19 10:03:00    no
6 2023-02-19 10:03:30    no
7 2023-02-19 10:04:00    no
8 2023-02-19 10:04:30   yes
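The seconds trick above relies on 30 dividing a minute evenly. As a minimal sketch for other bin widths, assuming the timestamps are POSIXct, one could instead subtract the remainder of the epoch seconds modulo the width. The names `width`, `bin_start`, and `result` are illustrative and not part of the original answer, and the time zone is fixed to UTC here only so the example does not depend on the local setting:

```r
# Example data (same values as the answer's sample data, with tz set to UTC).
data <- data.frame(
  id = 1:3,
  timestamp = as.POSIXct(
    c("2023-02-19 10:01:23", "2023-02-19 10:02:01", "2023-02-19 10:04:45"),
    tz = "UTC"
  )
)

width <- 30  # bin width in seconds (assumed value; change as needed)

# Round each timestamp down to the start of its bin by removing the
# remainder of the epoch seconds modulo the bin width.
bin_start <- data$timestamp - as.numeric(data$timestamp) %% width

# Every bin from the first to the last observed one.
intervals <- seq(min(bin_start), max(bin_start), by = width)

# A bin is marked "yes" if at least one event falls into it.
result <- data.frame(
  intervals,
  event = ifelse(intervals %in% bin_start, "yes", "no")
)
print(result)
```

Because membership is tested with %in%, several events landing in the same bin still produce a single "yes" row for that bin.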
Example data
In the future, it’s best if you include example data in your question. See How to make a great R reproducible example. For these solutions, I used:
data <- data.frame(
  id = 1:3,
  timestamp = as.POSIXct(c(
    "2023-02-19 10:01:23",
    "2023-02-19 10:02:01",
    "2023-02-19 10:04:45"
  ))
)
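Since the question's goal is a two-state Markov chain, one possible next step is to count transitions between consecutive intervals and normalise each row. The sketch below assumes the yes/no vector from either solution is ordered in time; the variable names are illustrative, and row-normalised counts are only one simple way to estimate transition probabilities:

```r
# State sequence from the output above, one entry per 30-second interval.
event <- c("yes", "no", "yes", "no", "no", "no", "no", "yes")
states <- factor(event, levels = c("no", "yes"))

# Pair each interval's state with the state of the next interval.
from <- states[-length(states)]
to <- states[-1]

# Count transitions, then divide each row by its total to get estimated
# transition probabilities (rows = current state, columns = next state).
counts <- table(from = from, to = to)
trans_prob <- prop.table(counts, margin = 1)
print(trans_prob)
```

If a state never appears as a starting state in the data, its row total is zero and the corresponding probabilities come out as NaN, so real data may need a check for that case.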