英文:
Populate Zeros into df Where Organisms Were not Counted - R
问题
我有一个数据框,如下所示,其中包含每个站点和日期的一种物种(3种独特的焦点物种)的原始计数。然而,数据框不包括在未遇到生物体时访问的站点。我有一个第二个数据框,其中包含每个站点访问的所有日期。我需要在数据框中为访问了站点但未观察到每种3种物种中的任何一种生物的情况下填充零值。
例如,站点“admin_pond”被访问了4次,但当我将数据为一种物种展开时,它显示只有3次访问,因为4次中有3次遇到了生物体。我需要相应地将第4次访问填充为零。
# 针对遇到生物体的站点的数据
mid_clean_up <- 结构(列表(日期=结构(c(19116、19116、19116、19117、
19117、19117、19123、19123、19123、19124、19124、19130、19130、
19130、19131、19131、19132、19138、19138、19139、19139、19146、
19146、19147、19147、19150、19150、19151、19151、19157、19157、
19158、19158、19166、19170、19170、19171、19171、19184、19184、
19185、19185、19191、19191、19192、19192、19206、19244、19244、
19245、19265),类=“日期”),站点= c(“wood_lab_pond”,
“wood_lab_pond”,“wood_lab_pond”,“phelps_pond”,“phelps_pond”,
“phelps_pond”,“admin_pond”,“admin_pond”,“admin_pond”,“rv_pond”,
“rv_pond”,“admin_pond”,“admin_pond”,“admin_pond”,“admin_pond”,
“admin_pond”,“admin_pond”,“wood_lab_pond”,“wood_lab_pond”,
“wood_lab_pond”,“wood_lab_pond”,“phelps_pond”,“phelps_pond”,
“phelps_pond”,“phelps_pond”,“rv_pond”,“rv_pond”,“rv_pond”,
“rv_pond”,“tuttle_pond”,“tuttle_pond”,“tuttle_pond”,“tuttle_pond”,
“tryon_weber”,“vorisek_pond”,“vorisek_pond”,“vorisek_pond”,
“vorisek_pond”,“rv_pond”,“rv_pond”,“rv_pond”,“rv_pond”,“tuttle_pond”,
“tuttle_pond”,“tuttle_pond”,“tuttle_pond”,“tryon_weber”,“tuttle_pond”,
“tuttle_pond”,“rv_pond”,“tuttle_pond”,“tuttle_pond”),species_capture = c(“pseudacris_crucifer”,
“rana_catesbeiana”,“rana_clamitans”,“pseudacris_crucifer”,
“rana_catesbeiana”,“rana_clamitans”,“pseudacris_crucifer”,
“rana_catesbeiana”,“rana_clamitans”,“pseudacris_crucifer”,
“rana_catesbeiana”,“pseudacris_crucifer”,“rana_catesbeiana”,
“rana_clamitans”,“pseudacris_crucifer”,“rana_catesbeiana”,
“pseudacris_crucifer”,“rana_catesbeiana”,“rana_clamitans”,
“rana_catesbeiana”,“rana_clamitans”,“rana_catesbeiana”,“rana_clamitans”,
“rana_catesbeiana”,“rana_clamitans”,“rana_catesbeiana”,“rana_clamitans”,
“rana_catesbeiana”,“rana_clamitans”,“rana_catesbeiana”,“rana_clamitans”,
“rana_catesbeiana”,“rana_clamitans”,“rana_clamitans”,“rana_catesbeiana”,
“rana_clamitans”,“rana_catesbeiana”,“rana_clamitans”,“rana_catesbeiana”,
“rana_clamitans”,“rana_catesbeiana”,“rana_clamitans”,“rana_catesbeiana”,
“rana_clamitans”,“rana_clamitans”,“rana_catesbeiana”,“rana_clamitans”,
“rana_catesbeiana”,“rana_catesbeiana”,“rana_clamitans”),n = c(2L,2L,1L,4L,4L,7L,4L,9L,5L,
16L,1L,2L,15L,3L,3L,20L,1L,4L,22L,3L,3L,3L,10L,
6L,16L,7L,2L,5L,1L,15L,19L,22L,3L,1L,14L,8L,13L,
1L,13L,7L,29L,3L,39L,3L,31L,2L,2L,29L,1L,11L,20L,
2L)),类= c(“tbl_df”,“tbl”,“data.frame”),row.names = c(NA,
-52L))
# 针对一种物种访问的数据透视表
bull_frog_visits <- mid_clean_up %>%
选择(站点,日期,species_capture,n) %>%
筛选(species_capture ==“rana_catesbeiana”) %>%
选择(!species_capture) %>%
按站点分组() %>%
变异(n_visit = match(date,unique(date)),
n_visit = paste0(“visit_”,n_visit,sep =“”)) %>%
选择(!date) %>%
取消分组() %>%
透视更宽(names_from = c(“n_visit”
<details>
<summary>英文:</summary>
I have a df, as shown below, that has raw counts of a species (3 unique focal species) per site and date. However the df does not include sites when no organisms were encountered. I have a second df that has all the dates for each site visit. I need to populate zeros across the data frame for when sites were visited but no organism for each of the 3 species were not observed.
For example, site "admin_pond" was visited 4 times but when I pivot the data wider for one species it shows it only had 3 visits because 3 out of the 4 times organisms were encountered. I need to populate that 4th visit as a zero accordlingly.
[![one species][1]][1]
[![true visits][2]][2]
data for sites where organisms were encountered
mid_clean_up <- structure(list(date = structure(c(19116, 19116, 19116, 19117,
19117, 19117, 19123, 19123, 19123, 19124, 19124, 19130, 19130,
19130, 19131, 19131, 19132, 19138, 19138, 19139, 19139, 19146,
19146, 19147, 19147, 19150, 19150, 19151, 19151, 19157, 19157,
19158, 19158, 19166, 19170, 19170, 19171, 19171, 19184, 19184,
19185, 19185, 19191, 19191, 19192, 19192, 19206, 19244, 19244,
19245, 19265, 19265), class = "Date"), site = c("wood_lab_pond",
"wood_lab_pond", "wood_lab_pond", "phelps_pond", "phelps_pond",
"phelps_pond", "admin_pond", "admin_pond", "admin_pond", "rv_pond",
"rv_pond", "admin_pond", "admin_pond", "admin_pond", "admin_pond",
"admin_pond", "admin_pond", "wood_lab_pond", "wood_lab_pond",
"wood_lab_pond", "wood_lab_pond", "phelps_pond", "phelps_pond",
"phelps_pond", "phelps_pond", "rv_pond", "rv_pond", "rv_pond",
"rv_pond", "tuttle_pond", "tuttle_pond", "tuttle_pond", "tuttle_pond",
"tryon_weber", "vorisek_pond", "vorisek_pond", "vorisek_pond",
"vorisek_pond", "rv_pond", "rv_pond", "rv_pond", "rv_pond", "tuttle_pond",
"tuttle_pond", "tuttle_pond", "tuttle_pond", "tryon_weber", "tuttle_pond",
"tuttle_pond", "rv_pond", "tuttle_pond", "tuttle_pond"), species_capture = c("pseudacris_crucifer",
"rana_catesbeiana", "rana_clamitans", "pseudacris_crucifer",
"rana_catesbeiana", "rana_clamitans", "pseudacris_crucifer",
"rana_catesbeiana", "rana_clamitans", "pseudacris_crucifer",
"rana_catesbeiana", "pseudacris_crucifer", "rana_catesbeiana",
"rana_clamitans", "pseudacris_crucifer", "rana_catesbeiana",
"pseudacris_crucifer", "rana_catesbeiana", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_catesbeiana", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_catesbeiana", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_catesbeiana", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_clamitans", "rana_catesbeiana",
"rana_clamitans", "rana_catesbeiana", "rana_clamitans", "rana_catesbeiana",
"rana_clamitans", "rana_catesbeiana", "rana_clamitans", "rana_catesbeiana",
"rana_clamitans", "rana_catesbeiana", "rana_clamitans", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_catesbeiana", "rana_catesbeiana",
"rana_clamitans"), n = c(2L, 2L, 1L, 4L, 4L, 7L, 4L, 9L, 5L,
16L, 1L, 2L, 15L, 3L, 3L, 20L, 1L, 4L, 22L, 3L, 3L, 3L, 10L,
6L, 16L, 7L, 2L, 5L, 1L, 15L, 19L, 22L, 3L, 1L, 14L, 8L, 13L,
1L, 13L, 7L, 29L, 3L, 39L, 3L, 31L, 2L, 2L, 29L, 1L, 11L, 20L,
2L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-52L))
pivot table for visits for one species
bull_frog_visits <- mid_clean_up %>%
select(site, date, species_capture, n) %>%
filter(species_capture == "rana_catesbeiana") %>%
select(!species_capture) %>%
group_by(site) %>%
mutate(n_visit = match(date, unique(date)),
n_visit = paste0("visit_", n_visit, sep = "")) %>%
select(!date) %>%
ungroup() %>%
pivot_wider(names_from = c("n_visit"), values_from = c("n"))%>%
#add_row(site = "phelps_pond", capture_type = "recapture") %>%
group_by(site) %>%
mutate(across(contains("visit"),
~ifelse(is.na(.) &
!is.na(lag(.)), 0, .)))
All sites Visited data
raw_visits <- structure(list(site = c("wood_lab_pond", "phelps_pond", "admin_pond",
"rv_pond", "admin_pond", "admin_pond", "admin_pond", "wood_lab_pond",
"wood_lab_pond", "wood_lab_pond", "phelps_pond", "phelps_pond",
"phelps_pond", "rv_pond", "rv_pond", "tuttle_pond", "tuttle_pond",
"tuttle_pond", "tryon_weber", "tryon_weber", "tryon_weber", "vorisek_pond",
"vorisek_pond", "rv_pond", "rv_pond", "tuttle_pond", "tuttle_pond",
"tryon_weber", "tuttle_pond", "rv_pond", "tuttle_pond"), date = structure(c(19116,
19117, 19123, 19124, 19130, 19131, 19132, 19138, 19139, 19140,
19146, 19147, 19148, 19150, 19151, 19157, 19158, 19159, 19165,
19166, 19166, 19170, 19171, 19184, 19185, 19191, 19192, 19206,
19244, 19245, 19265), class = "Date")), class = "data.frame", row.names = c(NA,
-31L))
[1]: https://i.stack.imgur.com/SiEUE.png
[2]: https://i.stack.imgur.com/xCrwc.png
</details>
# 答案1
**得分**: 1
我相信这将获取所有的站点访问记录,将它们与观察到的物种记录连接起来,并完成列表,包括所有的物种,对于那些缺失的物种使用 n = 0。
结果
连接中 `by = join_by(site, date)`。
# 一个 tibble: 95 × 4
date site species_capture n
<date> <chr> <chr> <int>
1 2022-05-04 wood_lab_pond pseudacris_crucifer 2
2 2022-05-04 wood_lab_pond rana_catesbeiana 2
3 2022-05-04 wood_lab_pond rana_clamitans 1
4 2022-05-05 phelps_pond pseudacris_crucifer 4
5 2022-05-05 phelps_pond rana_catesbeiana 4
6 2022-05-05 phelps_pond rana_clamitans 7
7 2022-05-11 admin_pond pseudacris_crucifer 4
8 2022-05-11 admin_pond rana_catesbeiana 9
9 2022-05-11 admin_pond rana_clamitans 5
10 2022-05-12 rv_pond pseudacris_crucifer 16
# ℹ 还有 85 行
# ℹ 使用 `print(n = ...)` 以查看更多行。
<details>
<summary>英文:</summary>
I believe this will take all the site visits, connect them to the observed species records, and complete the list to include all the species, using n = 0 for those which were missing.
raw_visits %>%
left_join(mid_clean_up) %>%
complete(nesting(date, site),
species_capture = unique(mid_clean_up$species_capture),
fill = list(n = 0))
Result
Joining with `by = join_by(site, date)`
# A tibble: 95 × 4
date site species_capture n
<date> <chr> <chr> <int>
1 2022-05-04 wood_lab_pond pseudacris_crucifer 2
2 2022-05-04 wood_lab_pond rana_catesbeiana 2
3 2022-05-04 wood_lab_pond rana_clamitans 1
4 2022-05-05 phelps_pond pseudacris_crucifer 4
5 2022-05-05 phelps_pond rana_catesbeiana 4
6 2022-05-05 phelps_pond rana_clamitans 7
7 2022-05-11 admin_pond pseudacris_crucifer 4
8 2022-05-11 admin_pond rana_catesbeiana 9
9 2022-05-11 admin_pond rana_clamitans 5
10 2022-05-12 rv_pond pseudacris_crucifer 16
# ℹ 85 more rows
# ℹ Use `print(n = ...)` to see more rows
</details>
# 答案2
**得分**: 0
以下是已翻译的内容:
一种解决方法是将以下代码段移动到其调用流程的末尾。也就是说,
```R
bull_frog_visits <- mid_clean_up %>%
select(site, date, species_capture, n) %>%
group_by(site) %>%
mutate(n_visit = match(date, unique(date)),
n_visit = paste0("visit_", n_visit, sep = "")) %>%
select(!date) %>%
ungroup() %>%
pivot_wider(names_from = c("n_visit"),
values_from = c("n"),
values_fn = ~ ifelse(is.na(.),0,.)) %>%
group_by(site) %>%
mutate(across(contains("visit"),
~ifelse(is.na(.) &
!is.na(lag(.)), 0, .))) %>%
filter(species_capture == "rana_catesbeiana") %>%
select(!species_capture)
这将生成(希望如此的)输出:
# A tibble: 6 × 7
# Groups: site [6]
site visit_1 visit_2 visit_3 visit_4 visit_5 visit_6
<chr> <dbl> <int> <dbl> <dbl> <int> <dbl>
1 wood_lab_pond 2 4 3 NA NA NA
2 phelps_pond 4 3 6 NA NA NA
3 admin_pond 9 15 20 0 NA NA
4 rv_pond 1 7 5 13 29 11
5 tuttle_pond 15 22 39 31 29 20
6 vorisek_pond 14 13 NA NA NA NA
英文:
One solution is to move
filter(species_capture == "rana_catesbeiana") %>%
select(!species_capture)
to the end of its call-flow. That is,
bull_frog_visits <- mid_clean_up %>%
select(site, date, species_capture, n) %>%
group_by(site) %>%
mutate(n_visit = match(date, unique(date)),
n_visit = paste0("visit_", n_visit, sep = "")) %>%
select(!date) %>%
ungroup() %>%
pivot_wider(names_from = c("n_visit"),
values_from = c("n"),
values_fn = ~ ifelse(is.na(.),0,.)) %>%
#add_row(site = "phelps_pond", capture_type = "recapture") %>%
group_by(site) %>%
mutate(across(contains("visit"),
~ifelse(is.na(.) &
!is.na(lag(.)), 0, .)))%>%
filter(species_capture == "rana_catesbeiana") %>%
select(!species_capture)
This produces the (hopefully desired) output:
# A tibble: 6 × 7
# Groups: site [6]
site visit_1 visit_2 visit_3 visit_4 visit_5 visit_6
<chr> <dbl> <int> <dbl> <dbl> <int> <dbl>
1 wood_lab_pond 2 4 3 NA NA NA
2 phelps_pond 4 3 6 NA NA NA
3 admin_pond 9 15 20 0 NA NA
4 rv_pond 1 7 5 13 29 11
5 tuttle_pond 15 22 39 31 29 20
6 vorisek_pond 14 13 NA NA NA NA
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论