Why are my time differences not coming out as expected in R?
Question
I'm using a dataset from the R package track2KBA, which contains tracking data on a seabird species. I want to measure the time difference between successive relocations, grouped by individual bird.
But when I run my script, I don't get the differences I'd expect. For instance, the first difference should be 6 seconds. This is the output I get:
track_id date_gmt time_gmt longitude latitude lon_colony lat_colony datetime difference
<int> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <dttm> <drtn>
1 69303 2012-07-21 11:01:54 -5.73 -16.0 -5.73 -16.0 2012-07-21 11:01:54 NA secs
2 69302 2012-07-21 11:02:00 -5.73 -16.0 -5.73 -16.0 2012-07-21 11:02:00 NA secs
3 69303 2012-07-21 11:03:33 -5.73 -16.0 -5.73 -16.0 2012-07-21 11:03:33 99 secs
4 69302 2012-07-21 11:03:42 -5.73 -16.0 -5.73 -16.0 2012-07-21 11:03:42 102 secs
5 69303 2012-07-21 11:05:13 -5.73 -16.0 -5.73 -16.0 2012-07-21 11:05:13 100 secs
6 69302 2012-07-21 11:05:26 -5.73 -16.0 -5.73 -16.0 2012-07-21 11:05:26 104 secs
Here's my code:
library(track2KBA)
library(tidyverse)
library(lubridate)
boobies$datetime <- paste(boobies$date_gmt, boobies$time_gmt)
boobies <- boobies %>%
  mutate(datetime = lubridate::ymd_hms(datetime)) %>%
  group_by(track_id) %>%
  arrange(datetime) %>%
  mutate(difference = datetime - lag(datetime))
And some sample data which comes from the package:
boobies <- structure(list(track_id = c(69303L, 69302L, 69303L, 69302L, 69303L,
69302L), date_gmt = c("2012-07-21", "2012-07-21", "2012-07-21",
"2012-07-21", "2012-07-21", "2012-07-21"), time_gmt = c("11:01:54",
"11:02:00", "11:03:33", "11:03:42", "11:05:13", "11:05:26"),
longitude = c(-5.72769, -5.72639, -5.72769, -5.72635, -5.72769,
-5.72639), latitude = c(-16.00749, -16.00713, -16.00749,
-16.00723, -16.00749, -16.0071), lon_colony = c(-5.73, -5.73,
-5.73, -5.73, -5.73, -5.73), lat_colony = c(-16.01, -16.01,
-16.01, -16.01, -16.01, -16.01), datetime = c("2012-07-21 11:01:54",
"2012-07-21 11:02:00", "2012-07-21 11:03:33", "2012-07-21 11:03:42",
"2012-07-21 11:05:13", "2012-07-21 11:05:26")), row.names = c(NA, 6L), class = c("data.table", "data.frame"))
Answer 1
Score: 1
The issue is with how your data is ordered, not with your code. The output you got (with two NAs at the start of the difference column) seems to be the correct one: the rows alternate between two track_ids (which I presume correspond to individual birds), so the first two rows are the first data point for each of them. The first point of each track has no earlier point to refer back to, hence both are NA. The 6 seconds you expected is the gap between row 1 and row 2, which belong to different birds.
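To make the interleaving concrete, here is a minimal base R sketch (no dplyr needed) that uses only the six timestamps from the sample data. The variable names `ungrouped` and `grouped` are my own, and it assumes the timestamps are in UTC, as the `_gmt` column names suggest.

```r
# Timestamps from the sample data, alternating between two birds
track_id <- c(69303L, 69302L, 69303L, 69302L, 69303L, 69302L)
datetime <- as.POSIXct(
  c("2012-07-21 11:01:54", "2012-07-21 11:02:00", "2012-07-21 11:03:33",
    "2012-07-21 11:03:42", "2012-07-21 11:05:13", "2012-07-21 11:05:26"),
  tz = "UTC"  # assumption: times are GMT/UTC
)

# Seconds since the previous row, ignoring which bird the row belongs to
ungrouped <- c(NA, diff(as.numeric(datetime)))

# Seconds since the previous row of the same bird
grouped <- ave(as.numeric(datetime), track_id,
               FUN = function(x) c(NA, diff(x)))

print(data.frame(track_id, datetime, ungrouped, grouped))
```

The `ungrouped` column starts with NA followed by 6, while `grouped` starts with two NAs, one per bird.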
Anyway, here are two ways of doing it, ungrouped and grouped:
library(tidyverse)
# not grouped by track_id (this gets the 6 second difference you were looking for)
mutate(boobies, difference = difftime(datetime, lag(datetime), units = "secs"))
# Output
track_id date_gmt time_gmt longitude latitude lon_colony lat_colony
<int> <date> <chr> <dbl> <dbl> <dbl> <dbl>
1 69303 2012-07-21 11:01:54 -5.73 -16.0 -5.73 -16.0
2 69302 2012-07-21 11:02:00 -5.73 -16.0 -5.73 -16.0
3 69303 2012-07-21 11:03:33 -5.73 -16.0 -5.73 -16.0
4 69302 2012-07-21 11:03:42 -5.73 -16.0 -5.73 -16.0
5 69303 2012-07-21 11:05:13 -5.73 -16.0 -5.73 -16.0
6 69302 2012-07-21 11:05:26 -5.73 -16.0 -5.73 -16.0
datetime difference
<dttm> <drtn>
1 2012-07-21 11:01:54 NA secs
2 2012-07-21 11:02:00 6 secs
3 2012-07-21 11:03:33 93 secs
4 2012-07-21 11:03:42 9 secs
5 2012-07-21 11:05:13 91 secs
6 2012-07-21 11:05:26 13 secs
# grouped by track_id
mutate(boobies, difference = difftime(datetime, lag(datetime), units = "secs"), .by = track_id)
# Output:
# A tibble: 6 × 9
track_id date_gmt time_gmt longitude latitude lon_colony lat_colony
<int> <date> <chr> <dbl> <dbl> <dbl> <dbl>
1 69303 2012-07-21 11:01:54 -5.73 -16.0 -5.73 -16.0
2 69302 2012-07-21 11:02:00 -5.73 -16.0 -5.73 -16.0
3 69303 2012-07-21 11:03:33 -5.73 -16.0 -5.73 -16.0
4 69302 2012-07-21 11:03:42 -5.73 -16.0 -5.73 -16.0
5 69303 2012-07-21 11:05:13 -5.73 -16.0 -5.73 -16.0
6 69302 2012-07-21 11:05:26 -5.73 -16.0 -5.73 -16.0
datetime difference
<dttm> <drtn>
1 2012-07-21 11:01:54 NA secs
2 2012-07-21 11:02:00 NA secs
3 2012-07-21 11:03:33 99 secs
4 2012-07-21 11:03:42 102 secs
5 2012-07-21 11:05:13 100 secs
6 2012-07-21 11:05:26 104 secs
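Note that the `.by` argument is only available in dplyr 1.1.0 and later. On an older version, the same grouped result should be obtainable with `group_by()`, as in the sketch below. Here `boobies_small` is a hypothetical cut-down copy of the sample data (two columns only), not an object from track2KBA, and the UTC time zone is an assumption.

```r
library(dplyr)

# Hypothetical reduced copy of the sample data (assumption: times are UTC)
boobies_small <- tibble(
  track_id = c(69303L, 69302L, 69303L, 69302L, 69303L, 69302L),
  datetime = as.POSIXct(
    c("2012-07-21 11:01:54", "2012-07-21 11:02:00", "2012-07-21 11:03:33",
      "2012-07-21 11:03:42", "2012-07-21 11:05:13", "2012-07-21 11:05:26"),
    tz = "UTC"
  )
)

# lag() is evaluated within each track_id, so the first row of every
# track gets NA and later rows get the gap to that bird's previous fix
result <- boobies_small %>%
  group_by(track_id) %>%
  mutate(difference = difftime(datetime, lag(datetime), units = "secs")) %>%
  ungroup()

print(result)
```

This is effectively what the code in the question already does, which is why its output matches the grouped result above.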