How to obtain specific data based on a condition in a list in R?
Question
I have a dataset of animals that were radio-tracked for a year. However, tracking was uneven: sometimes an animal was tracked three times a week, and sometimes only once a month.
I have provided a dummy dataset with the relevant columns.
In this picture, I have the IDs of 3 different animals. Take animal ID 0-10: under the value column there is a 57, which means there is a gap of 57 days between two consecutive tracking days.
The code is as follows:
    # Dummy data: 7 tracking dates for each of 3 animals
    date = c("2015-05-01","2015-05-04","2015-05-05","2015-07-01","2015-07-02","2015-07-05","2015-07-06",
             "2015-05-01","2015-05-04","2015-05-05","2015-05-27","2015-05-28","2015-06-05","2015-06-06",
             "2015-05-01","2015-05-02","2015-05-03","2015-05-04","2015-05-05","2015-05-06","2015-05-07")
    ID = c("0-10","0-10","0-10","0-10","0-10","0-10","0-10",
           "0-2","0-2","0-2","0-2","0-2","0-2","0-2",
           "0-8","0-8","0-8","0-8","0-8","0-8","0-8")
    data2015v2 = data.frame(date, ID)
    data2015v2$date = as.Date(data2015v2$date)
    # Gaps in days between consecutive tracking dates: one integer vector per animal ID
    delta.days.2015 = with(data2015v2, tapply(date, ID, FUN = function(x) as.integer(diff(x))))
I want to know which animals have gaps longer than 14 days without having to go through each list element one by one. I think I need to use a loop, but I don't know how to set one up. Any help is appreciated.
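The loop the question asks about can be written in base R directly on `delta.days.2015`. Below is a minimal sketch that rebuilds the dummy data (the `ID` vector is written with `rep()`, which yields the same values as the original `c(...)`), loops over the animal IDs, and keeps those with any gap above 14 days. The names `long_gap_ids` and `has_long_gap` are illustrative and not from the original post. The last two lines show that a vectorised `vapply()` gives the same result without an explicit loop.

```r
# Dummy data from the question: 7 tracking dates for each of 3 animals
date <- c("2015-05-01","2015-05-04","2015-05-05","2015-07-01","2015-07-02","2015-07-05","2015-07-06",
          "2015-05-01","2015-05-04","2015-05-05","2015-05-27","2015-05-28","2015-06-05","2015-06-06",
          "2015-05-01","2015-05-02","2015-05-03","2015-05-04","2015-05-05","2015-05-06","2015-05-07")
ID <- rep(c("0-10", "0-2", "0-8"), each = 7)
data2015v2 <- data.frame(date, ID)
data2015v2$date <- as.Date(data2015v2$date)

# One integer vector of day gaps per animal, as in the question
delta.days.2015 <- with(data2015v2, tapply(date, ID, FUN = function(x) as.integer(diff(x))))

# Explicit loop: collect every ID whose gap vector contains a value above 14
long_gap_ids <- character(0)
for (id in names(delta.days.2015)) {
  if (any(delta.days.2015[[id]] > 14)) {
    long_gap_ids <- c(long_gap_ids, id)
  }
}
print(long_gap_ids)

# Loop-free equivalent: one TRUE/FALSE per animal, then keep the matching names
has_long_gap <- vapply(delta.days.2015, function(x) any(x > 14), logical(1))
print(names(delta.days.2015)[has_long_gap])
```

For the dummy data both versions return `"0-10"` and `"0-2"`. The `vapply()` form is usually preferred in R because it states the expected result type (`logical(1)`) and avoids growing a vector inside a loop.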
Answer 1
Score: 1
With dplyr, you could group_by() animal ID and summarize() each group down to its maximum gap (in days):
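library(dplyr)
library(lubridate)
data2015v2 %>%
  mutate(date = ymd(date)) %>%  # parse "YYYY-MM-DD" strings; unnecessary if date is already a Date
  group_by(ID) %>%
  summarize(max_gap = max(date - lag(date), na.rm = TRUE))  # lag() assumes rows are in date order within each ID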
#> # A tibble: 3 × 2
#>   ID    max_gap
#>   <chr> <drtn> 
#> 1 0-10  57 days
#> 2 0-2   22 days
#> 3 0-8    1 days
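If you would rather not load dplyr and lubridate, the same per-animal maximum can be sketched in base R with `aggregate()`. This is an alternative to the answer's approach, not part of it. It assumes `date` has already been converted with `as.Date()`, as in the question, and it sorts each animal's dates before calling `diff()` so that row order does not matter. The column name `max_gap` mirrors the dplyr output but holds plain integers rather than a difftime.

```r
# Dummy data from the question, with dates already converted to Date
data2015v2 <- data.frame(
  date = as.Date(c("2015-05-01","2015-05-04","2015-05-05","2015-07-01","2015-07-02","2015-07-05","2015-07-06",
                   "2015-05-01","2015-05-04","2015-05-05","2015-05-27","2015-05-28","2015-06-05","2015-06-06",
                   "2015-05-01","2015-05-02","2015-05-03","2015-05-04","2015-05-05","2015-05-06","2015-05-07")),
  ID = rep(c("0-10", "0-2", "0-8"), each = 7)
)

# Largest gap (in whole days) between consecutive tracking dates, per animal
max_gaps <- aggregate(date ~ ID, data = data2015v2,
                      FUN = function(x) max(as.integer(diff(sort(x)))))

# aggregate() names the result column after the response; rename it
names(max_gaps)[names(max_gaps) == "date"] <- "max_gap"
print(max_gaps)
```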
To have the resulting data frame only include IDs where there was a gap in monitoring that exceeded 14 days, you could filter on the max_gap column:
data2015v2 %>%
  mutate(date = ymd(date)) %>%
  group_by(ID) %>%
  summarize(max_gap = max(date - lag(date), na.rm = TRUE)) %>%
  filter(max_gap > 14)
#> # A tibble: 2 × 2
#>   ID    max_gap
#>   <chr> <drtn> 
#> 1 0-10  57 days
#> 2 0-2   22 days
Created on 2023-05-11 with reprex v2.0.2
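The answer reports only each animal's largest gap. If you also need to know when the long gaps happened, a minimal dplyr sketch (an extension, not part of the original answer) keeps one row per gap instead of summarizing. It adds `arrange()` because `lag()` compares each row with the row above it, so the dates must be in order within each animal. The column names `prev_date` and `gap_days` are illustrative.

```r
library(dplyr)

# Dummy data from the question, with dates already converted to Date
data2015v2 <- data.frame(
  date = as.Date(c("2015-05-01","2015-05-04","2015-05-05","2015-07-01","2015-07-02","2015-07-05","2015-07-06",
                   "2015-05-01","2015-05-04","2015-05-05","2015-05-27","2015-05-28","2015-06-05","2015-06-06",
                   "2015-05-01","2015-05-02","2015-05-03","2015-05-04","2015-05-05","2015-05-06","2015-05-07")),
  ID = rep(c("0-10", "0-2", "0-8"), each = 7)
)

long_gaps <- data2015v2 %>%
  group_by(ID) %>%
  arrange(date, .by_group = TRUE) %>%            # sort dates within each animal
  mutate(prev_date = lag(date),                  # previous tracking date (NA on each animal's first row)
         gap_days = as.integer(date - prev_date)) %>%
  ungroup() %>%
  filter(gap_days > 14)                          # rows where gap_days is NA are dropped as well
print(long_gaps)
```

For the dummy data this keeps two rows: the 57-day gap for 0-10 (from 2015-05-05 to 2015-07-01) and the 22-day gap for 0-2 (from 2015-05-05 to 2015-05-27).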