Populate Zeros into df Where Organisms Were Not Counted - R

Question

<details>
<summary>English:</summary>
I have a df, as shown below, that has raw counts of a species (3 unique focal species) per site and date. However, the df does not include site visits on which no organisms were encountered. I have a second df that has all the dates of each site visit. I need to populate zeros across the data frame for visits where a site was visited but a given species (for each of the 3 species) was not observed.

For example, site "admin_pond" was visited 4 times, but when I pivot the data wider for one species it shows only 3 visits, because that species was encountered on only 3 of the 4 visits. I need to populate that 4th visit with a zero accordingly.

[![one species][1]][1]
[![true visits][2]][2]

Data for sites where organisms were encountered:

```R
mid_clean_up <- structure(list(date = structure(c(19116, 19116, 19116, 19117,
19117, 19117, 19123, 19123, 19123, 19124, 19124, 19130, 19130,
19130, 19131, 19131, 19132, 19138, 19138, 19139, 19139, 19146,
19146, 19147, 19147, 19150, 19150, 19151, 19151, 19157, 19157,
19158, 19158, 19166, 19170, 19170, 19171, 19171, 19184, 19184,
19185, 19185, 19191, 19191, 19192, 19192, 19206, 19244, 19244,
19245, 19265, 19265), class = "Date"), site = c("wood_lab_pond",
"wood_lab_pond", "wood_lab_pond", "phelps_pond", "phelps_pond",
"phelps_pond", "admin_pond", "admin_pond", "admin_pond", "rv_pond",
"rv_pond", "admin_pond", "admin_pond", "admin_pond", "admin_pond",
"admin_pond", "admin_pond", "wood_lab_pond", "wood_lab_pond",
"wood_lab_pond", "wood_lab_pond", "phelps_pond", "phelps_pond",
"phelps_pond", "phelps_pond", "rv_pond", "rv_pond", "rv_pond",
"rv_pond", "tuttle_pond", "tuttle_pond", "tuttle_pond", "tuttle_pond",
"tryon_weber", "vorisek_pond", "vorisek_pond", "vorisek_pond",
"vorisek_pond", "rv_pond", "rv_pond", "rv_pond", "rv_pond", "tuttle_pond",
"tuttle_pond", "tuttle_pond", "tuttle_pond", "tryon_weber", "tuttle_pond",
"tuttle_pond", "rv_pond", "tuttle_pond", "tuttle_pond"), species_capture = c("pseudacris_crucifer",
"rana_catesbeiana", "rana_clamitans", "pseudacris_crucifer",
"rana_catesbeiana", "rana_clamitans", "pseudacris_crucifer",
"rana_catesbeiana", "rana_clamitans", "pseudacris_crucifer",
"rana_catesbeiana", "pseudacris_crucifer", "rana_catesbeiana",
"rana_clamitans", "pseudacris_crucifer", "rana_catesbeiana",
"pseudacris_crucifer", "rana_catesbeiana", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_catesbeiana", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_catesbeiana", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_catesbeiana", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_clamitans", "rana_catesbeiana",
"rana_clamitans", "rana_catesbeiana", "rana_clamitans", "rana_catesbeiana",
"rana_clamitans", "rana_catesbeiana", "rana_clamitans", "rana_catesbeiana",
"rana_clamitans", "rana_catesbeiana", "rana_clamitans", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_catesbeiana", "rana_catesbeiana",
"rana_clamitans"), n = c(2L, 2L, 1L, 4L, 4L, 7L, 4L, 9L, 5L,
16L, 1L, 2L, 15L, 3L, 3L, 20L, 1L, 4L, 22L, 3L, 3L, 3L, 10L,
6L, 16L, 7L, 2L, 5L, 1L, 15L, 19L, 22L, 3L, 1L, 14L, 8L, 13L,
1L, 13L, 7L, 29L, 3L, 39L, 3L, 31L, 2L, 2L, 29L, 1L, 11L, 20L,
2L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-52L))
```

Pivot table of visits for one species:

```R
library(dplyr)
library(tidyr)

bull_frog_visits <- mid_clean_up %>%
  select(site, date, species_capture, n) %>%
  filter(species_capture == "rana_catesbeiana") %>%
  select(!species_capture) %>%
  group_by(site) %>%
  # visits are numbered only over dates on which this species was caught
  mutate(n_visit = match(date, unique(date)),
         n_visit = paste0("visit_", n_visit)) %>%
  select(!date) %>%
  ungroup() %>%
  pivot_wider(names_from = c("n_visit"), values_from = c("n")) %>%
  #add_row(site = "phelps_pond", capture_type = "recapture") %>%
  group_by(site) %>%
  mutate(across(contains("visit"),
                ~ifelse(is.na(.) &
                          !is.na(lag(.)), 0, .)))
```

Data for all site visits:

```R
raw_visits <- structure(list(site = c("wood_lab_pond", "phelps_pond", "admin_pond",
"rv_pond", "admin_pond", "admin_pond", "admin_pond", "wood_lab_pond",
"wood_lab_pond", "wood_lab_pond", "phelps_pond", "phelps_pond",
"phelps_pond", "rv_pond", "rv_pond", "tuttle_pond", "tuttle_pond",
"tuttle_pond", "tryon_weber", "tryon_weber", "tryon_weber", "vorisek_pond",
"vorisek_pond", "rv_pond", "rv_pond", "tuttle_pond", "tuttle_pond",
"tryon_weber", "tuttle_pond", "rv_pond", "tuttle_pond"), date = structure(c(19116,
19117, 19123, 19124, 19130, 19131, 19132, 19138, 19139, 19140,
19146, 19147, 19148, 19150, 19151, 19157, 19158, 19159, 19165,
19166, 19166, 19170, 19171, 19184, 19185, 19191, 19192, 19206,
19244, 19245, 19265), class = "Date")), class = "data.frame", row.names = c(NA,
-31L))
```

[1]: https://i.stack.imgur.com/SiEUE.png
[2]: https://i.stack.imgur.com/xCrwc.png
</details>

# Answer 1
**Score**: 1
<details>
<summary>English:</summary>
I believe this will take all the site visits, join them to the observed species records, and complete the list to include all the species, using n = 0 for those that were missing.

```R
library(dplyr)
library(tidyr)

raw_visits %>%
  left_join(mid_clean_up) %>%
  complete(nesting(date, site),
           species_capture = unique(mid_clean_up$species_capture),
           fill = list(n = 0))
```
Result:

```
Joining with `by = join_by(site, date)`
# A tibble: 95 × 4
   date       site          species_capture         n
   <date>     <chr>         <chr>               <int>
 1 2022-05-04 wood_lab_pond pseudacris_crucifer     2
 2 2022-05-04 wood_lab_pond rana_catesbeiana        2
 3 2022-05-04 wood_lab_pond rana_clamitans          1
 4 2022-05-05 phelps_pond   pseudacris_crucifer     4
 5 2022-05-05 phelps_pond   rana_catesbeiana        4
 6 2022-05-05 phelps_pond   rana_clamitans          7
 7 2022-05-11 admin_pond    pseudacris_crucifer     4
 8 2022-05-11 admin_pond    rana_catesbeiana        9
 9 2022-05-11 admin_pond    rana_clamitans          5
10 2022-05-12 rv_pond       pseudacris_crucifer    16
# ℹ 85 more rows
# ℹ Use `print(n = ...)` to see more rows
```
</details>
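
A caveat worth checking with this approach, based on my reading of the sample data: `raw_visits` lists `tryon_weber` on 2022-06-23 twice, and four visits (for example `wood_lab_pond` on 2022-05-28) had no captures at all, so `left_join()` gives each of them a row with `species_capture = NA` that `complete()` keeps. That appears to be why the result has 95 rows rather than 30 visits × 3 species = 90. Below is a minimal sketch of a variant that avoids both, assuming the same column names (`site`, `date`, `species_capture`, `n`): de-duplicate the visits, pair every visit with every species using `tidyr::crossing()`, then left-join the counts and turn missing counts into 0. `fill_zero_counts()` and the toy `counts`/`visits` tibbles are hypothetical names for illustration; on the real data the call would be `fill_zero_counts(mid_clean_up, raw_visits)`.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: one row per distinct visit x species, n = 0 when nothing was counted.
fill_zero_counts <- function(counts, visits) {
  species <- unique(counts$species_capture)
  visits %>%
    distinct(site, date) %>%                   # drop duplicated visit rows
    crossing(species_capture = species) %>%    # pair every visit with every species
    left_join(counts, by = c("site", "date", "species_capture")) %>%
    mutate(n = coalesce(n, 0L)) %>%            # species not counted on a visit -> 0
    arrange(site, date, species_capture)
}

# Toy data (not the real survey data): admin_pond has a visit on 2022-05-20 with no captures,
# and the rv_pond visit is listed twice.
counts <- tibble(
  site = c("admin_pond", "admin_pond", "admin_pond", "rv_pond"),
  date = as.Date(c("2022-05-11", "2022-05-18", "2022-05-18", "2022-05-12")),
  species_capture = c("rana_catesbeiana", "rana_catesbeiana", "rana_clamitans", "rana_clamitans"),
  n = c(9L, 15L, 3L, 1L)
)
visits <- tibble(
  site = c("admin_pond", "admin_pond", "admin_pond", "rv_pond", "rv_pond"),
  date = as.Date(c("2022-05-11", "2022-05-18", "2022-05-20", "2022-05-12", "2022-05-12"))
)

filled <- fill_zero_counts(counts, visits)
print(filled)
```

With the toy input there are 4 distinct visits and 2 species, so the result should have 8 rows and no `NA` species.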

# Answer 2
**Score**: 0

One solution is to move

```R
filter(species_capture == "rana_catesbeiana") %>%
  select(!species_capture)
```

to the end of its call flow. That is,

```R
library(dplyr)
library(tidyr)

bull_frog_visits <- mid_clean_up %>%
  select(site, date, species_capture, n) %>%
  group_by(site) %>%
  # visits are numbered over the capture dates of all species, not just one
  mutate(n_visit = match(date, unique(date)),
         n_visit = paste0("visit_", n_visit)) %>%
  select(!date) %>%
  ungroup() %>%
  pivot_wider(names_from = c("n_visit"),
              values_from = c("n"),
              values_fn = ~ ifelse(is.na(.), 0, .)) %>%
  #add_row(site = "phelps_pond", capture_type = "recapture") %>%
  group_by(site) %>%
  mutate(across(contains("visit"),
                ~ifelse(is.na(.) &
                          !is.na(lag(.)), 0, .))) %>%
  # keep only the bullfrog rows once the visit columns are built
  filter(species_capture == "rana_catesbeiana") %>%
  select(!species_capture)
```

This produces the (hopefully desired) output:

```
# A tibble: 6 × 7
# Groups:   site [6]
  site          visit_1 visit_2 visit_3 visit_4 visit_5 visit_6
  <chr>           <dbl>   <int>   <dbl>   <dbl>   <int>   <dbl>
1 wood_lab_pond       2       4       3      NA      NA      NA
2 phelps_pond         4       3       6      NA      NA      NA
3 admin_pond          9      15      20       0      NA      NA
4 rv_pond             1       7       5      13      29      11
5 tuttle_pond        15      22      39      31      29      20
6 vorisek_pond       14      13      NA      NA      NA      NA
```
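
One limitation I would check before relying on this: the visit numbers still come from the dates in `mid_clean_up`, so a visit on which no species at all was caught is never numbered. In the sample data, `wood_lab_pond` has four dates in `raw_visits` (the last, 2022-05-28, with no captures), but only three visit columns are filled in the output above. A minimal sketch of an alternative, assuming you already have a long table with one row per visit and species and zeros filled in (for example Answer 1's result with its `NA`-species rows dropped): order the visits by date within each site, number them, and only then pivot one species wider. `visits_wide()`, `bull_frog_wide`, and the toy `filled` tibble are hypothetical names for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: one row per site, one visit_k column per visit, for a single species.
visits_wide <- function(filled, species) {
  filled %>%
    filter(species_capture == species) %>%
    group_by(site) %>%
    arrange(date, .by_group = TRUE) %>%           # chronological order within each site
    mutate(n_visit = paste0("visit_", row_number())) %>%
    ungroup() %>%
    select(site, n_visit, n) %>%
    pivot_wider(names_from = n_visit, values_from = n)
}

# Toy long table (already zero-filled); admin_pond's fourth visit had no bullfrogs.
filled <- tibble(
  site = c(rep("admin_pond", 4), "rv_pond", "rv_pond"),
  date = as.Date(c("2022-05-11", "2022-05-18", "2022-05-19", "2022-05-20",
                   "2022-05-12", "2022-05-12")),
  species_capture = c(rep("rana_catesbeiana", 5), "rana_clamitans"),
  n = c(9L, 15L, 20L, 0L, 1L, 0L)
)

bull_frog_wide <- visits_wide(filled, "rana_catesbeiana")
print(bull_frog_wide)
```

Because the zeros are already in the long table, no `lag()` trick is needed, and the `NA` cells that remain only mean "this site had fewer visits", which keeps "not visited" distinct from "visited, none counted".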

huangapple
  • Posted on May 25, 2023 04:51:28
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76327316.html