How to obtain specific data based on a condition in a list in R?
Question
I have a dataset of animals that were radio-tracked for a year. However, the tracking events were uneven: sometimes animals were tracked 3 times a week, and sometimes only once a month.
I have provided a dummy dataset of relevant columns.
In this picture, I have the IDs of 3 different animals. If you take animal ID 0-10, you can see that under the value column there is a 57. This means there is a gap of 57 days between consecutive tracking days.
The code is as follows:
date = c("2015-05-01","2015-05-04","2015-05-05","2015-07-01","2015-07-02","2015-07-05","2015-07-06",
"2015-05-01","2015-05-04","2015-05-05","2015-05-27","2015-05-28","2015-06-05","2015-06-06",
"2015-05-01","2015-05-02","2015-05-03","2015-05-04","2015-05-05","2015-05-06","2015-05-07")
ID = c("0-10","0-10","0-10","0-10","0-10","0-10","0-10",
"0-2","0-2","0-2","0-2","0-2","0-2","0-2",
"0-8","0-8","0-8","0-8","0-8","0-8","0-8")
data2015v2 = data.frame(date,ID)
data2015v2$date = as.Date(data2015v2$date)
delta.days.2015 = with(data2015v2, tapply(date, ID, FUN = function (x) as.integer(diff(x))))
I want to know which animals have gaps longer than 14 days, without having to go over each list one by one. I think I need to use a loop, but I don't know how to set up one. Any help is appreciated.
Answer 1
Score: 1
With dplyr you could group_by() animal ID and summarize() the data to include just the maximum gap (in days):
library(dplyr)
library(lubridate)
data2015v2 %>%
  mutate(date = ymd(date)) %>%
  group_by(ID) %>%
  summarize(max_gap = max(date - lag(date), na.rm = TRUE))
#> # A tibble: 3 × 2
#>   ID    max_gap
#>   <chr> <drtn>
#> 1 0-10  57 days
#> 2 0-2   22 days
#> 3 0-8    1 days
To have the resulting data frame only include IDs where there was a gap in monitoring that exceeded 14 days, you could filter() on the max_gap column:
data2015v2 %>%
  mutate(date = ymd(date)) %>%
  group_by(ID) %>%
  summarize(max_gap = max(date - lag(date), na.rm = TRUE)) %>%
  filter(max_gap > 14)
#> # A tibble: 2 × 2
#>   ID    max_gap
#>   <chr> <drtn>
#> 1 0-10  57 days
#> 2 0-2   22 days
Created on 2023-05-11 with reprex v2.0.2
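For anyone who would rather stay in base R, the list of gaps built with tapply() in the question can be summarized without an explicit loop. The following is only a sketch under that assumption: it rebuilds the example data so it runs on its own, and the names max_gap and ids_over_14 are made up for illustration rather than taken from the original post.

```r
# Rebuild the example data from the question
date <- as.Date(c(
  "2015-05-01", "2015-05-04", "2015-05-05", "2015-07-01", "2015-07-02", "2015-07-05", "2015-07-06",
  "2015-05-01", "2015-05-04", "2015-05-05", "2015-05-27", "2015-05-28", "2015-06-05", "2015-06-06",
  "2015-05-01", "2015-05-02", "2015-05-03", "2015-05-04", "2015-05-05", "2015-05-06", "2015-05-07"
))
ID <- rep(c("0-10", "0-2", "0-8"), each = 7)
data2015v2 <- data.frame(date, ID)

# Per-animal vector of gaps (in days) between consecutive tracking dates,
# computed the same way as in the question
delta.days.2015 <- with(
  data2015v2,
  tapply(date, ID, FUN = function(x) as.integer(diff(x)))
)

# Largest gap for each animal (hypothetical helper name)
max_gap <- sapply(delta.days.2015, max)

# IDs of animals with at least one gap longer than 14 days
ids_over_14 <- names(max_gap)[max_gap > 14]
ids_over_14
```

sapply() applies max() to each animal's vector of gaps, so the comparison against 14 happens once per animal instead of once per list element inspected by hand. If every individual gap over 14 days is of interest, and not only the largest one, lapply(delta.days.2015, function(g) g[g > 14]) would keep them per animal.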