Why are my time differences not coming out as expected in R?

Question

I'm using a dataset from the R package track2KBA, which contains tracking data for a seabird species. I want to measure the time difference between consecutive relocations, grouped by individual bird.

But when I run my script, I don't get the differences I'd expect. For instance, the first difference should be 6 seconds. Instead, this is what I get:

   track_id date_gmt   time_gmt longitude latitude lon_colony lat_colony datetime            difference
      <int> <chr>      <chr>        <dbl>    <dbl>      <dbl>      <dbl> <dttm>              <drtn>    
 1    69303 2012-07-21 11:01:54     -5.73    -16.0      -5.73      -16.0 2012-07-21 11:01:54  NA secs  
 2    69302 2012-07-21 11:02:00     -5.73    -16.0      -5.73      -16.0 2012-07-21 11:02:00  NA secs  
 3    69303 2012-07-21 11:03:33     -5.73    -16.0      -5.73      -16.0 2012-07-21 11:03:33  99 secs  
 4    69302 2012-07-21 11:03:42     -5.73    -16.0      -5.73      -16.0 2012-07-21 11:03:42 102 secs  
 5    69303 2012-07-21 11:05:13     -5.73    -16.0      -5.73      -16.0 2012-07-21 11:05:13 100 secs  
 6    69302 2012-07-21 11:05:26     -5.73    -16.0      -5.73      -16.0 2012-07-21 11:05:26 104 secs  

Here's my code:

library(track2KBA)
library(tidyverse)
library(lubridate)

boobies$datetime <- paste(boobies$date_gmt, boobies$time_gmt)

boobies <- boobies %>%
  mutate(datetime = lubridate::ymd_hms(datetime)) %>%
  group_by(track_id) %>%
  arrange(datetime) %>%
  mutate(difference = datetime - lag(datetime))

And some sample data which comes from the package:

# The `.internal.selfref = <pointer: (nil)>` entry from the dput() output
# is left out here because it is not valid R syntax.
boobies <- structure(list(
  track_id = c(69303L, 69302L, 69303L, 69302L, 69303L, 69302L),
  date_gmt = c("2012-07-21", "2012-07-21", "2012-07-21",
               "2012-07-21", "2012-07-21", "2012-07-21"),
  time_gmt = c("11:01:54", "11:02:00", "11:03:33",
               "11:03:42", "11:05:13", "11:05:26"),
  longitude = c(-5.72769, -5.72639, -5.72769, -5.72635, -5.72769, -5.72639),
  latitude = c(-16.00749, -16.00713, -16.00749, -16.00723, -16.00749, -16.0071),
  lon_colony = c(-5.73, -5.73, -5.73, -5.73, -5.73, -5.73),
  lat_colony = c(-16.01, -16.01, -16.01, -16.01, -16.01, -16.01),
  datetime = c("2012-07-21 11:01:54", "2012-07-21 11:02:00", "2012-07-21 11:03:33",
               "2012-07-21 11:03:42", "2012-07-21 11:05:13", "2012-07-21 11:05:26")
), row.names = c(NA, 6L), class = c("data.table", "data.frame"))

Answer 1

Score: 1

There's an issue with your data. The result you got (with two NAs at the start of the difference column) appears to be the correct one, because the first two rows are the first data points of two different track_ids (which I presume correspond to individual birds). A track's first point has no earlier point to refer back to, so both are NA. The 6 seconds you expected is the gap between track 69303 at 11:01:54 and track 69302 at 11:02:00, that is, between two different birds.
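
To see where the NAs come from, here is a minimal sketch using only the three fixes of track 69303 from the sample data. The names `fixes` and `gaps` are made up for the illustration, and the timestamps are parsed as UTC on the assumption that the `_gmt` columns are in GMT. `lag()` shifts the vector down by one position, so the first element has nothing before it:

```r
library(dplyr)

# The three fixes of track 69303 from the sample data, parsed as UTC
# (assumption: the *_gmt columns are in GMT)
fixes <- as.POSIXct(
  c("2012-07-21 11:01:54", "2012-07-21 11:03:33", "2012-07-21 11:05:13"),
  tz = "UTC"
)

# lag() puts NA in the first position, so the first difference is NA as well
gaps <- difftime(fixes, lag(fixes), units = "secs")

# Drop the difftime class to look at the plain number of seconds
as.numeric(gaps)
```

The same thing happens once per group when the data is grouped by track_id, which is why the grouped result starts with one NA for each bird.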

Anyway, here are two ways of doing it, ungrouped and grouped:

library(tidyverse)

# not grouped by track_id (this gets the 6 second difference you were looking for)
mutate(boobies, difference = difftime(datetime, lag(datetime), units = "secs"))

# Output
  track_id date_gmt   time_gmt longitude latitude lon_colony lat_colony
     <int> <date>     <chr>        <dbl>    <dbl>      <dbl>      <dbl>
1    69303 2012-07-21 11:01:54     -5.73    -16.0      -5.73      -16.0
2    69302 2012-07-21 11:02:00     -5.73    -16.0      -5.73      -16.0
3    69303 2012-07-21 11:03:33     -5.73    -16.0      -5.73      -16.0
4    69302 2012-07-21 11:03:42     -5.73    -16.0      -5.73      -16.0
5    69303 2012-07-21 11:05:13     -5.73    -16.0      -5.73      -16.0
6    69302 2012-07-21 11:05:26     -5.73    -16.0      -5.73      -16.0
  datetime            difference
  <dttm>              <drtn>    
1 2012-07-21 11:01:54 NA secs   
2 2012-07-21 11:02:00  6 secs   
3 2012-07-21 11:03:33 93 secs   
4 2012-07-21 11:03:42  9 secs   
5 2012-07-21 11:05:13 91 secs   
6 2012-07-21 11:05:26 13 secs   

# grouped by track_id
mutate(boobies, difference = difftime(datetime, lag(datetime), units = "secs"), .by = track_id)

# Output:
# A tibble: 6 × 9
  track_id date_gmt   time_gmt longitude latitude lon_colony lat_colony
     <int> <date>     <chr>        <dbl>    <dbl>      <dbl>      <dbl>
1    69303 2012-07-21 11:01:54     -5.73    -16.0      -5.73      -16.0
2    69302 2012-07-21 11:02:00     -5.73    -16.0      -5.73      -16.0
3    69303 2012-07-21 11:03:33     -5.73    -16.0      -5.73      -16.0
4    69302 2012-07-21 11:03:42     -5.73    -16.0      -5.73      -16.0
5    69303 2012-07-21 11:05:13     -5.73    -16.0      -5.73      -16.0
6    69302 2012-07-21 11:05:26     -5.73    -16.0      -5.73      -16.0
  datetime            difference
  <dttm>              <drtn>    
1 2012-07-21 11:01:54  NA secs  
2 2012-07-21 11:02:00  NA secs  
3 2012-07-21 11:03:33  99 secs  
4 2012-07-21 11:03:42 102 secs  
5 2012-07-21 11:05:13 100 secs  
6 2012-07-21 11:05:26 104 secs  
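
If you would rather not depend on the row order or on a recent dplyr (the `.by` argument needs dplyr 1.1.0 or later), something along these lines should give the same per-bird differences in base R. Treat it as a sketch: `tracks` and `diff_secs` are names invented for the example, only the columns needed for the calculation are copied from the sample data, and the timestamps are parsed as UTC on the assumption that `date_gmt` and `time_gmt` are in GMT.

```r
# Only the columns needed for the calculation, copied from the sample data
tracks <- data.frame(
  track_id = c(69303L, 69302L, 69303L, 69302L, 69303L, 69302L),
  date_gmt = rep("2012-07-21", 6),
  time_gmt = c("11:01:54", "11:02:00", "11:03:33",
               "11:03:42", "11:05:13", "11:05:26"),
  stringsAsFactors = FALSE
)

# Parse as UTC so the result does not depend on the local time zone
tracks$datetime <- as.POSIXct(
  paste(tracks$date_gmt, tracks$time_gmt),
  format = "%Y-%m-%d %H:%M:%S", tz = "UTC"
)

# Sort by bird first, then by time within each bird
tracks <- tracks[order(tracks$track_id, tracks$datetime), ]

# Seconds since the previous fix of the same bird; NA for each bird's first fix
tracks$diff_secs <- ave(
  as.numeric(tracks$datetime), tracks$track_id,
  FUN = function(x) c(NA, diff(x))
)

tracks[, c("track_id", "datetime", "diff_secs")]
```

Sorting by track_id first keeps each bird's fixes together, which makes it easier to check by eye that every difference is between two fixes of the same bird.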

huangapple
  • Posted on July 24, 2023, 19:00:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76753816.html