在每个分组中添加两行。日期的开始和结束。

huangapple go评论65阅读模式
英文:

Add two rows per Group. Start and end of date

问题

我的数据集包含了护士在医院房间内的扫描时间。我的目标是确定房间无人看护的时间有多少分钟。

首先,我需要为每个房间每天创建2行新记录。这些新记录应该表示一天的开始和结束(00:00:00 和 23:59:59)。创建这两个记录的逻辑是,例如,如果2023-04-24的最后一次扫描是在下午3点,房间1,我们想知道从2023-04-24 23:59:59到2023-04-24 3:00:00经过了多少分钟。

我需要的结果如下:

    Room              ScanDT NewRecord
1  room1 2023-04-24 00:00:00       New
2  room1 2023-04-24 10:08:38       Old
3  room1 2023-04-24 10:09:36       Old
4  room1 2023-04-24 11:54:35       Old
5  room1 2023-04-24 23:59:59       New
6  room1 2023-05-24 00:00:00       New
7  room1 2023-05-24 13:51:10       Old
8  room1 2023-05-24 23:59:59       New
9  room2 2023-04-24 00:00:00       New
10 room2 2023-04-24 18:51:10       Old
11 room2 2023-04-24 20:51:10       Old
12 room2 2023-04-24 23:23:23       New

样本数据如下:

df <- data.frame(Room = c('room1', 'room1', 'room1', 'room1',
                          'room2', 'room2'),
                 ScanDT = as.POSIXct(c('2023-04-24 10:08:38', '2023-04-24 10:09:36', '2023-04-24 11:54:35', '2023-05-24 13:51:10',
                                       '2023-04-24 18:51:10', '2023-04-24 20:51:10')))
英文:

My dataset contains scan times by nurses in hospital rooms. My goal is to determine how many minutes the the room was unattended.

The first thing I need to do is to create 2 new rows per room for each day. The new records should represent the start and end of the day (00:00:00 and 23:59:59). The logic for creating these two records is that for example if the last scan for 2023-04-24 was at 3pm for Room1, we would like to know how many minutes had passed from 2023-04-24 23:59:59 to 2023-04-24 3:00:00.

   Room              ScanDT
1 room1 2023-04-24 10:08:38
2 room1 2023-04-24 10:09:36
3 room1 2023-04-24 11:54:35
4 room1 2023-05-24 13:51:10
5 room2 2023-04-24 18:51:10
6 room2 2023-04-24 20:51:10

What I need:

    Room              ScanDT NewRecord
1  room1 2023-04-24 00:00:00       New
2  room1 2023-04-24 10:08:38       Old
3  room1 2023-04-24 10:09:36       Old
4  room1 2023-04-24 11:54:35       Old
5  room1 2023-04-24 23:59:59       New
6  room1 2023-05-24 00:00:00       New
7  room1 2023-05-24 13:51:10       Old
8  room1 2023-05-24 23:59:59       New
9  room2 2023-04-24 00:00:00       New
10 room2 2023-04-24 18:51:10       Old
11 room2 2023-04-24 20:51:10       Old
12 room2 2023-04-24 23:23:23       New

Sample data

df &lt;- data.frame(Room = c(&#39;room1&#39;, &#39;room1&#39;, &#39;room1&#39;, &#39;room1&#39;,
                          &#39;room2&#39;, &#39;room2&#39;),
                 ScanDT = as.POSIXct(c(&#39;2023-04-24 10:08:38&#39;, &#39;2023-04-24 10:09:36&#39;, &#39;2023-04-24 11:54:35&#39;, &#39;2023-05-24 13:51:10&#39;,
                                       &#39;2023-04-24 18:51:10&#39;, &#39;2023-04-24 20:51:10&#39;)))

答案1

得分: 1

以下是使用 data.table 包的代码尝试:

library(data.table)
setDT(df)
## 添加一个扫描ID以展示如何保留它
df[, ScanID := .I]

df[, ScanDate := as.POSIXct(trunc(ScanDT, units="days"))]
dfout <- df[, .(
            ScanID=c(NA,ScanID,NA),
            ScanDT = c(ScanDate, ScanDT, ScanDate + as.difftime(1, units="days") - 1)), 
            by=.(Room, ScanDate)]
dfout
##      Room   ScanDate ScanID              ScanDT
##    <char>     <POSc>  <int>              <POSc>
## 1:  room1 2023-04-24     NA 2023-04-24 00:00:00
## 2:  room1 2023-04-24      1 2023-04-24 10:08:38
## 3:  room1 2023-04-24      2 2023-04-24 10:09:36
## 4:  room1 2023-04-24      3 2023-04-24 11:54:35
## 5:  room1 2023-04-24     NA 2023-04-24 23:59:59
## 6:  room1 2023-05-24     NA 2023-05-24 00:00:00
## 7:  room1 2023-05-24      4 2023-05-24 13:51:10
## 8:  room1 2023-05-24     NA 2023-05-24 23:59:59
## 9:  room2 2023-04-24     NA 2023-04-24 00:00:00
##10:  room2 2023-04-24      5 2023-04-24 18:51:10
##11:  room2 2023-04-24      6 2023-04-24 20:51:10
##12:  room2 2023-04-24     NA 2023-04-24 23:59:59

如有需要,可以进一步计算差异:

dfout[, ScanDTdiff := c(NA, `units<-`(diff(ScanDT), "mins")), by=.(Room, ScanDate)]

##      Room   ScanDate ScanID              ScanDT        ScanDTdiff
##    <char>     <POSc>  <int>              <POSc>        <difftime>
## 1:  room1 2023-04-24     NA 2023-04-24 00:00:00           NA mins
## 2:  room1 2023-04-24      1 2023-04-24 10:08:38  608.6333333 mins
## 3:  room1 2023-04-24      2 2023-04-24 10:09:36    0.9666667 mins
## 4:  room1 2023-04-24      3 2023-04-24 11:54:35  104.9833333 mins
## 5:  room1 2023-04-24     NA 2023-04-24 23:59:59  725.4000000 mins
## 6:  room1 2023-05-24     NA 2023-05-24 00:00:00           NA mins
## 7:  room1 2023-05-24      4 2023-05-24 13:51:10  831.1666667 mins
## 8:  room1 2023-05-24     NA 2023-05-24 23:59:59  608.8166667 mins
## 9:  room2 2023-04-24     NA 2023-04-24 00:00:00           NA mins
##10:  room2 2023-04-24      5 2023-04-24 18:51:10 1131.1666667 mins
##11:  room2 2023-04-24      6 2023-04-24 20:51:10  120.0000000 mins
##12:  room2 2023-04-24     NA 2023-04-24 23:59:59  188.8166667 mins
英文:

Here's an attempt using the data.table package:

library(data.table)
setDT(df)
## add a scan id to show how to keep it
df[, ScanID := .I]

df[, ScanDate := as.POSIXct(trunc(ScanDT, units=&quot;days&quot;))]
dfout &lt;- df[, .(
            ScanID=c(NA,ScanID,NA),
            ScanDT = c(ScanDate, ScanDT, ScanDate + as.difftime(1, units=&quot;days&quot;) - 1)), 
            by=.(Room, ScanDate)]
dfout
##      Room   ScanDate ScanID              ScanDT
##    &lt;char&gt;     &lt;POSc&gt;  &lt;int&gt;              &lt;POSc&gt;
## 1:  room1 2023-04-24     NA 2023-04-24 00:00:00
## 2:  room1 2023-04-24      1 2023-04-24 10:08:38
## 3:  room1 2023-04-24      2 2023-04-24 10:09:36
## 4:  room1 2023-04-24      3 2023-04-24 11:54:35
## 5:  room1 2023-04-24     NA 2023-04-24 23:59:59
## 6:  room1 2023-05-24     NA 2023-05-24 00:00:00
## 7:  room1 2023-05-24      4 2023-05-24 13:51:10
## 8:  room1 2023-05-24     NA 2023-05-24 23:59:59
## 9:  room2 2023-04-24     NA 2023-04-24 00:00:00
##10:  room2 2023-04-24      5 2023-04-24 18:51:10
##11:  room2 2023-04-24      6 2023-04-24 20:51:10
##12:  room2 2023-04-24     NA 2023-04-24 23:59:59

Further calculations of differences if required:

dfout[, ScanDTdiff := c(NA, `units&lt;-`(diff(ScanDT), &quot;mins&quot;)), by=.(Room, ScanDate)]

##      Room   ScanDate ScanID              ScanDT        ScanDTdiff
##    &lt;char&gt;     &lt;POSc&gt;  &lt;int&gt;              &lt;POSc&gt;        &lt;difftime&gt;
## 1:  room1 2023-04-24     NA 2023-04-24 00:00:00           NA mins
## 2:  room1 2023-04-24      1 2023-04-24 10:08:38  608.6333333 mins
## 3:  room1 2023-04-24      2 2023-04-24 10:09:36    0.9666667 mins
## 4:  room1 2023-04-24      3 2023-04-24 11:54:35  104.9833333 mins
## 5:  room1 2023-04-24     NA 2023-04-24 23:59:59  725.4000000 mins
## 6:  room1 2023-05-24     NA 2023-05-24 00:00:00           NA mins
## 7:  room1 2023-05-24      4 2023-05-24 13:51:10  831.1666667 mins
## 8:  room1 2023-05-24     NA 2023-05-24 23:59:59  608.8166667 mins
## 9:  room2 2023-04-24     NA 2023-04-24 00:00:00           NA mins
##10:  room2 2023-04-24      5 2023-04-24 18:51:10 1131.1666667 mins
##11:  room2 2023-04-24      6 2023-04-24 20:51:10  120.0000000 mins
##12:  room2 2023-04-24     NA 2023-04-24 23:59:59  188.8166667 mins

答案2

得分: 1

这样怎么样?

df <- data.frame(
    Room = c(
        'room1', 'room1', 'room1', 'room1', 'room2', 'room2'),
    ScanDT = as.POSIXct(c(
        '2023-04-24 10:08:38', '2023-04-24 10:09:36', '2023-04-24 11:54:35',
        '2023-05-24 13:51:10', '2023-04-24 18:51:10', '2023-04-24 20:51:10')))
    
df$day = as.Date(df$ScanDT)

res = do.call(rbind, by(df, df[,c("Room", "day")], function(x) {
    x$NewRecord = "Old"
    rbind(
        data.frame(
            Room=x$Room[1],
            ScanDT=as.POSIXct(paste(x$day[1], "00:00:00")),
            day=x$day[1],
            NewRecord = "New"),
        x,
        data.frame(
            Room=x$Room[1],
            ScanDT=as.POSIXct(paste(x$day[1], "23:59:59")),
            day=x$day[1],
            NewRecord = "New")
    )
}))

Output:

         Room              ScanDT        day NewRecord
    1   room1 2023-04-24 00:00:00 2023-04-24       New
    2   room1 2023-04-24 10:08:38 2023-04-24       Old
    3   room1 2023-04-24 10:09:36 2023-04-24       Old
    4   room1 2023-04-24 11:54:35 2023-04-24       Old
    5   room1 2023-04-24 23:59:59 2023-04-24       New
    12  room2 2023-04-24 00:00:00 2023-04-24       New
    51  room2 2023-04-24 18:51:10 2023-04-24       Old
    11  room2 2023-04-24 23:59:59 2023-04-24       New
    13  room2 2023-04-25 00:00:00 2023-04-25       New
    6   room2 2023-04-24 20:51:10 2023-04-25       Old
    111 room2 2023-04-25 23:59:59 2023-04-25       New
    14  room1 2023-05-24 00:00:00 2023-05-24       New
    41  room1 2023-05-24 13:51:10 2023-05-24       Old
    112 room1 2023-05-24 23:59:59 2023-05-24       New

如果需要保留排序顺序,则需要进行更多的工作。
英文:

How about this?

df &lt;- data.frame(
Room = c(
&#39;room1&#39;, &#39;room1&#39;, &#39;room1&#39;, &#39;room1&#39;, &#39;room2&#39;, &#39;room2&#39;),
ScanDT = as.POSIXct(c(
&#39;2023-04-24 10:08:38&#39;, &#39;2023-04-24 10:09:36&#39;, &#39;2023-04-24 11:54:35&#39;,
&#39;2023-05-24 13:51:10&#39;, &#39;2023-04-24 18:51:10&#39;, &#39;2023-04-24 20:51:10&#39;)))
df$day = as.Date(df$ScanDT)
res = do.call(rbind, by(df, df[,c(&quot;Room&quot;, &quot;day&quot;)], function(x) {
x$NewRecord = &quot;Old&quot;
rbind(
data.frame(
Room=x$Room[1],
ScanDT=as.POSIXct(paste(x$day[1], &quot;00:00:00&quot;)),
day=x$day[1],
NewRecord = &quot;New&quot;),
x,
data.frame(
Room=x$Room[1],
ScanDT=as.POSIXct(paste(x$day[1], &quot;23:59:59&quot;)),
day=x$day[1],
NewRecord = &quot;New&quot;)
)
}))

Output:

     Room              ScanDT        day NewRecord
1   room1 2023-04-24 00:00:00 2023-04-24       New
2   room1 2023-04-24 10:08:38 2023-04-24       Old
3   room1 2023-04-24 10:09:36 2023-04-24       Old
4   room1 2023-04-24 11:54:35 2023-04-24       Old
5   room1 2023-04-24 23:59:59 2023-04-24       New
12  room2 2023-04-24 00:00:00 2023-04-24       New
51  room2 2023-04-24 18:51:10 2023-04-24       Old
11  room2 2023-04-24 23:59:59 2023-04-24       New
13  room2 2023-04-25 00:00:00 2023-04-25       New
6   room2 2023-04-24 20:51:10 2023-04-25       Old
111 room2 2023-04-25 23:59:59 2023-04-25       New
14  room1 2023-05-24 00:00:00 2023-05-24       New
41  room1 2023-05-24 13:51:10 2023-05-24       Old
112 room1 2023-05-24 23:59:59 2023-05-24       New

If the sort order needs to be preserved, then there is a little more work to do.

huangapple
  • 本文由 发表于 2023年5月25日 04:57:34
  • 转载请务必保留本文链接:https://go.coder-hub.com/76327356.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定