在R中将零填充到未计数的生物体的数据框中。

huangapple go评论57阅读模式
英文:

Populate Zeros into df Where Organisms Were not Counted - R

问题

我有一个数据框,如下所示,其中包含每个站点和日期的一种物种(3种独特的焦点物种)的原始计数。然而,数据框不包括在未遇到生物体时访问的站点。我有一个第二个数据框,其中包含每个站点访问的所有日期。我需要在数据框中为访问了站点但未观察到每种3种物种中的任何一种生物的情况下填充零值。

例如,站点“admin_pond”被访问了4次,但当我将数据为一种物种展开时,它显示只有3次访问,因为4次中有3次遇到了生物体。我需要相应地将第4次访问填充为零。

# 针对遇到生物体的站点的数据
mid_clean_up <- 结构(列表(日期=结构(c(191161911619116191171911719117191231912319123191241912419130191301913019131191311913219138191381913919139191461914619147191471915019150191511915119157191571915819158191661917019170191711917119184191841918519185191911919119192191921920619244192441924519265),类=“日期”),站点= c(“wood_lab_pond”,
“wood_lab_pond”,“wood_lab_pond”,“phelps_pond”,“phelps_pond”,
“phelps_pond”,“admin_pond”,“admin_pond”,“admin_pond”,“rv_pond”,
“rv_pond”,“admin_pond”,“admin_pond”,“admin_pond”,“admin_pond”,
“admin_pond”,“admin_pond”,“wood_lab_pond”,“wood_lab_pond”,
“wood_lab_pond”,“wood_lab_pond”,“phelps_pond”,“phelps_pond”,
“phelps_pond”,“phelps_pond”,“rv_pond”,“rv_pond”,“rv_pond”,
“rv_pond”,“tuttle_pond”,“tuttle_pond”,“tuttle_pond”,“tuttle_pond”,
“tryon_weber”,“vorisek_pond”,“vorisek_pond”,“vorisek_pond”,
“vorisek_pond”,“rv_pond”,“rv_pond”,“rv_pond”,“rv_pond”,“tuttle_pond”,
“tuttle_pond”,“tuttle_pond”,“tuttle_pond”,“tryon_weber”,“tuttle_pond”,
“tuttle_pond”,“rv_pond”,“tuttle_pond”,“tuttle_pond”),species_capture = c(“pseudacris_crucifer”,
“rana_catesbeiana”,“rana_clamitans”,“pseudacris_crucifer”,
“rana_catesbeiana”,“rana_clamitans”,“pseudacris_crucifer”,
“rana_catesbeiana”,“rana_clamitans”,“pseudacris_crucifer”,
“rana_catesbeiana”,“pseudacris_crucifer”,“rana_catesbeiana”,
“rana_clamitans”,“pseudacris_crucifer”,“rana_catesbeiana”,
“pseudacris_crucifer”,“rana_catesbeiana”,“rana_clamitans”,
“rana_catesbeiana”,“rana_clamitans”,“rana_catesbeiana”,“rana_clamitans”,
“rana_catesbeiana”,“rana_clamitans”,“rana_catesbeiana”,“rana_clamitans”,
“rana_catesbeiana”,“rana_clamitans”,“rana_catesbeiana”,“rana_clamitans”,
“rana_catesbeiana”,“rana_clamitans”,“rana_clamitans”,“rana_catesbeiana”,
“rana_clamitans”,“rana_catesbeiana”,“rana_clamitans”,“rana_catesbeiana”,
“rana_clamitans”,“rana_catesbeiana”,“rana_clamitans”,“rana_catesbeiana”,
“rana_clamitans”,“rana_clamitans”,“rana_catesbeiana”,“rana_clamitans”,
“rana_catesbeiana”,“rana_catesbeiana”,“rana_clamitans”),n = c(2L2L1L4L4L7L4L9L5L16L1L2L15L3L3L20L1L4L22L3L3L3L10L6L16L7L2L5L1L15L19L22L3L1L14L8L13L1L13L7L29L3L39L3L31L2L2L29L1L11L20L2L)),类= c(“tbl_df”,“tbl”,“data.frame”),row.names = c(NA-52L))

# 针对一种物种访问的数据透视表
bull_frog_visits <- mid_clean_up %>% 
  选择(站点,日期,species_capture,n) %>% 
  筛选(species_capture ==“rana_catesbeiana”) %>% 
  选择(!species_capture) %>% 
  按站点分组() %>% 
  变异(n_visit = match(date,unique(date)),
         n_visit = paste0(“visit_”,n_visit,sep =“”)) %>% 
  选择(!date) %>% 
  取消分组() %>% 
  透视更宽(names_from = c(“n_visit”

<details>
<summary>英文:</summary>

I have a df, as shown below, that has raw counts of a species (3 unique focal species) per site and date. However the df does not include sites when no organisms were encountered. I have a second df that  has all the dates for each site visit. I need to populate zeros across the data frame for when sites were visited but no organism for each of the 3 species were not observed. 

For example, site &quot;admin_pond&quot; was visited 4 times but when I pivot the data wider for one species it shows it only had 3 visits because 3 out of the 4 times organisms were encountered. I need to populate that 4th visit as a zero accordlingly.

[![one species][1]][1]

[![true visits][2]][2]



data for sites where organisms were encountered

mid_clean_up <- structure(list(date = structure(c(19116, 19116, 19116, 19117,
19117, 19117, 19123, 19123, 19123, 19124, 19124, 19130, 19130,
19130, 19131, 19131, 19132, 19138, 19138, 19139, 19139, 19146,
19146, 19147, 19147, 19150, 19150, 19151, 19151, 19157, 19157,
19158, 19158, 19166, 19170, 19170, 19171, 19171, 19184, 19184,
19185, 19185, 19191, 19191, 19192, 19192, 19206, 19244, 19244,
19245, 19265, 19265), class = "Date"), site = c("wood_lab_pond",
"wood_lab_pond", "wood_lab_pond", "phelps_pond", "phelps_pond",
"phelps_pond", "admin_pond", "admin_pond", "admin_pond", "rv_pond",
"rv_pond", "admin_pond", "admin_pond", "admin_pond", "admin_pond",
"admin_pond", "admin_pond", "wood_lab_pond", "wood_lab_pond",
"wood_lab_pond", "wood_lab_pond", "phelps_pond", "phelps_pond",
"phelps_pond", "phelps_pond", "rv_pond", "rv_pond", "rv_pond",
"rv_pond", "tuttle_pond", "tuttle_pond", "tuttle_pond", "tuttle_pond",
"tryon_weber", "vorisek_pond", "vorisek_pond", "vorisek_pond",
"vorisek_pond", "rv_pond", "rv_pond", "rv_pond", "rv_pond", "tuttle_pond",
"tuttle_pond", "tuttle_pond", "tuttle_pond", "tryon_weber", "tuttle_pond",
"tuttle_pond", "rv_pond", "tuttle_pond", "tuttle_pond"), species_capture = c("pseudacris_crucifer",
"rana_catesbeiana", "rana_clamitans", "pseudacris_crucifer",
"rana_catesbeiana", "rana_clamitans", "pseudacris_crucifer",
"rana_catesbeiana", "rana_clamitans", "pseudacris_crucifer",
"rana_catesbeiana", "pseudacris_crucifer", "rana_catesbeiana",
"rana_clamitans", "pseudacris_crucifer", "rana_catesbeiana",
"pseudacris_crucifer", "rana_catesbeiana", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_catesbeiana", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_catesbeiana", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_catesbeiana", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_clamitans", "rana_catesbeiana",
"rana_clamitans", "rana_catesbeiana", "rana_clamitans", "rana_catesbeiana",
"rana_clamitans", "rana_catesbeiana", "rana_clamitans", "rana_catesbeiana",
"rana_clamitans", "rana_catesbeiana", "rana_clamitans", "rana_clamitans",
"rana_catesbeiana", "rana_clamitans", "rana_catesbeiana", "rana_catesbeiana",
"rana_clamitans"), n = c(2L, 2L, 1L, 4L, 4L, 7L, 4L, 9L, 5L,
16L, 1L, 2L, 15L, 3L, 3L, 20L, 1L, 4L, 22L, 3L, 3L, 3L, 10L,
6L, 16L, 7L, 2L, 5L, 1L, 15L, 19L, 22L, 3L, 1L, 14L, 8L, 13L,
1L, 13L, 7L, 29L, 3L, 39L, 3L, 31L, 2L, 2L, 29L, 1L, 11L, 20L,
2L)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-52L))

pivot table for visits for one species

bull_frog_visits <- mid_clean_up %>%
select(site, date, species_capture, n) %>%
filter(species_capture == "rana_catesbeiana") %>%
select(!species_capture) %>%
group_by(site) %>%
mutate(n_visit = match(date, unique(date)),
n_visit = paste0("visit_", n_visit, sep = "")) %>%
select(!date) %>%
ungroup() %>%
pivot_wider(names_from = c("n_visit"), values_from = c("n"))%>%
#add_row(site = "phelps_pond", capture_type = "recapture") %>%
group_by(site) %>%
mutate(across(contains("visit"),
~ifelse(is.na(.) &
!is.na(lag(.)), 0, .)))


All sites Visited data

raw_visits <- structure(list(site = c("wood_lab_pond", "phelps_pond", "admin_pond",
"rv_pond", "admin_pond", "admin_pond", "admin_pond", "wood_lab_pond",
"wood_lab_pond", "wood_lab_pond", "phelps_pond", "phelps_pond",
"phelps_pond", "rv_pond", "rv_pond", "tuttle_pond", "tuttle_pond",
"tuttle_pond", "tryon_weber", "tryon_weber", "tryon_weber", "vorisek_pond",
"vorisek_pond", "rv_pond", "rv_pond", "tuttle_pond", "tuttle_pond",
"tryon_weber", "tuttle_pond", "rv_pond", "tuttle_pond"), date = structure(c(19116,
19117, 19123, 19124, 19130, 19131, 19132, 19138, 19139, 19140,
19146, 19147, 19148, 19150, 19151, 19157, 19158, 19159, 19165,
19166, 19166, 19170, 19171, 19184, 19185, 19191, 19192, 19206,
19244, 19245, 19265), class = "Date")), class = "data.frame", row.names = c(NA,
-31L))


[1]: https://i.stack.imgur.com/SiEUE.png
[2]: https://i.stack.imgur.com/xCrwc.png
</details>
# 答案1
**得分**: 1
我相信这将获取所有的站点访问记录,将它们与观察到的物种记录连接起来,并完成列表,包括所有的物种,对于那些缺失的物种使用 n = 0。
结果
连接中 `by = join_by(site, date)`。
# 一个 tibble: 95 × 4
date       site          species_capture         n
<date>     <chr>         <chr>               <int>
1 2022-05-04 wood_lab_pond pseudacris_crucifer     2
2 2022-05-04 wood_lab_pond rana_catesbeiana        2
3 2022-05-04 wood_lab_pond rana_clamitans          1
4 2022-05-05 phelps_pond   pseudacris_crucifer     4
5 2022-05-05 phelps_pond   rana_catesbeiana        4
6 2022-05-05 phelps_pond   rana_clamitans          7
7 2022-05-11 admin_pond    pseudacris_crucifer     4
8 2022-05-11 admin_pond    rana_catesbeiana        9
9 2022-05-11 admin_pond    rana_clamitans          5
10 2022-05-12 rv_pond       pseudacris_crucifer    16
# ℹ 还有 85 行
# ℹ 使用 `print(n = ...)` 以查看更多行。
<details>
<summary>英文:</summary>
I believe this will take all the site visits, connect them to the observed species records, and complete the list to include all the species, using n = 0 for those which were missing.
raw_visits %&gt;%
left_join(mid_clean_up) %&gt;%
complete(nesting(date, site), 
species_capture = unique(mid_clean_up$species_capture), 
fill = list(n = 0))
Result
Joining with `by = join_by(site, date)`
# A tibble: 95 &#215; 4
date       site          species_capture         n
&lt;date&gt;     &lt;chr&gt;         &lt;chr&gt;               &lt;int&gt;
1 2022-05-04 wood_lab_pond pseudacris_crucifer     2
2 2022-05-04 wood_lab_pond rana_catesbeiana        2
3 2022-05-04 wood_lab_pond rana_clamitans          1
4 2022-05-05 phelps_pond   pseudacris_crucifer     4
5 2022-05-05 phelps_pond   rana_catesbeiana        4
6 2022-05-05 phelps_pond   rana_clamitans          7
7 2022-05-11 admin_pond    pseudacris_crucifer     4
8 2022-05-11 admin_pond    rana_catesbeiana        9
9 2022-05-11 admin_pond    rana_clamitans          5
10 2022-05-12 rv_pond       pseudacris_crucifer    16
# ℹ 85 more rows
# ℹ Use `print(n = ...)` to see more rows
</details>
# 答案2
**得分**: 0
以下是已翻译的内容:
一种解决方法是将以下代码段移动到其调用流程的末尾。也就是说,
```R
bull_frog_visits <- mid_clean_up %>%
select(site, date, species_capture, n) %>%
group_by(site) %>%
mutate(n_visit = match(date, unique(date)),
n_visit = paste0("visit_", n_visit, sep = "")) %>%
select(!date) %>%
ungroup() %>%
pivot_wider(names_from = c("n_visit"), 
values_from = c("n"),
values_fn = ~ ifelse(is.na(.),0,.)) %>%
group_by(site) %>%
mutate(across(contains("visit"), 
~ifelse(is.na(.) &
!is.na(lag(.)), 0, .))) %>%
filter(species_capture == "rana_catesbeiana") %>%
select(!species_capture)

这将生成(希望如此的)输出:

# A tibble: 6 × 7
# Groups:   site [6]
  site          visit_1 visit_2 visit_3 visit_4 visit_5 visit_6
  <chr>           <dbl>   <int>   <dbl>   <dbl>   <int>   <dbl>
1 wood_lab_pond       2       4       3      NA      NA      NA
2 phelps_pond         4       3       6      NA      NA      NA
3 admin_pond          9      15      20       0      NA      NA
4 rv_pond             1       7       5      13      29      11
5 tuttle_pond        15      22      39      31      29      20
6 vorisek_pond       14      13      NA      NA      NA      NA
英文:

One solution is to move

  filter(species_capture == &quot;rana_catesbeiana&quot;) %&gt;% 
select(!species_capture)

to the end of its call-flow. That is,

bull_frog_visits &lt;- mid_clean_up %&gt;%
select(site, date, species_capture, n)  %&gt;% 
group_by(site) %&gt;% 
mutate(n_visit = match(date, unique(date)),
n_visit = paste0(&quot;visit_&quot;, n_visit, sep = &quot;&quot;)) %&gt;% 
select(!date) %&gt;% 
ungroup() %&gt;%
pivot_wider(names_from = c(&quot;n_visit&quot;), 
values_from = c(&quot;n&quot;),
values_fn = ~ ifelse(is.na(.),0,.)) %&gt;% 
#add_row(site = &quot;phelps_pond&quot;, capture_type = &quot;recapture&quot;) %&gt;% 
group_by(site) %&gt;% 
mutate(across(contains(&quot;visit&quot;), 
~ifelse(is.na(.) &amp;
!is.na(lag(.)), 0, .)))%&gt;% 
filter(species_capture == &quot;rana_catesbeiana&quot;) %&gt;% 
select(!species_capture)

This produces the (hopefully desired) output:

# A tibble: 6 &#215; 7
# Groups:   site [6]
site          visit_1 visit_2 visit_3 visit_4 visit_5 visit_6
&lt;chr&gt;           &lt;dbl&gt;   &lt;int&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;int&gt;   &lt;dbl&gt;
1 wood_lab_pond       2       4       3      NA      NA      NA
2 phelps_pond         4       3       6      NA      NA      NA
3 admin_pond          9      15      20       0      NA      NA
4 rv_pond             1       7       5      13      29      11
5 tuttle_pond        15      22      39      31      29      20
6 vorisek_pond       14      13      NA      NA      NA      NA

huangapple
  • 本文由 发表于 2023年5月25日 04:51:28
  • 转载请务必保留本文链接:https://go.coder-hub.com/76327316.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定