How can I add zeros or NAs to each species that does not have a corresponding date in R?
Question
I have a dataset that looks like this (although it has more species and dates):
Date | Species | Count |
---|---|---|
2022/06/01 | AMAV | 50 |
2022/06/05 | AMAV | 20 |
2022/06/07 | SESA | 10 |
2022/06/07 | AMAV | 8 |
I want each date to include every species, with a count of 0 if that species was not originally listed for that date. When a species is not listed, it just means that the species was not detected, so its count should be zero.
So for this example, I would want it to look like:
Date | Species | Count |
---|---|---|
2022/06/01 | AMAV | 50 |
2022/06/01 | SESA | 0 |
2022/06/05 | AMAV | 20 |
2022/06/05 | SESA | 0 |
2022/06/07 | SESA | 10 |
2022/06/07 | AMAV | 8 |
So I want all of my species listed for each date.
I'm wondering if anyone knows a way to accomplish this in R. I honestly have no idea how to tackle this problem, so would appreciate any tips!
Answer 1
Score: 0
This is mostly a classic problem of filling in a series with missing dates. Build a table containing every combination of Date and Species, then left outer join the original data onto it.
# All combinations
full <- expand.grid(Date = unique(df$Date), Species = unique(df$Species))
# Filled new table
merge(full, df, by = c('Date', 'Species'), all.x = TRUE)
# Date Species Count
#1 2022/06/01 AMAV 50
#2 2022/06/01 SESA NA
#3 2022/06/05 AMAV 20
#4 2022/06/05 SESA NA
#5 2022/06/07 AMAV 8
#6 2022/06/07 SESA 10
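Afterwards, if you prefer zeros, assign the merged result to a variable (say out) and replace the NA values with the standard out$Count[is.na(out$Count)] <- 0. Note that the replacement has to be applied to the merged result, not to the original df, which contains no NA values.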
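Putting the steps of this answer together, a minimal end-to-end sketch might look like the following. The data frame is rebuilt from the question's example table, and the names `filled` and the `stringsAsFactors = FALSE` choice are assumptions made here for illustration, not part of the original answer.

```r
# Example data from the question (column names assumed to match the table).
df <- data.frame(
  Date    = c("2022/06/01", "2022/06/05", "2022/06/07", "2022/06/07"),
  Species = c("AMAV", "AMAV", "SESA", "AMAV"),
  Count   = c(50, 20, 10, 8)
)

# Every Date x Species combination; keep the keys as character, not factor.
full <- expand.grid(
  Date    = unique(df$Date),
  Species = unique(df$Species),
  stringsAsFactors = FALSE
)

# Left join the observed counts onto the full grid (unmatched rows get NA),
# then replace those NA counts with zero.
filled <- merge(full, df, by = c("Date", "Species"), all.x = TRUE)
filled$Count[is.na(filled$Count)] <- 0

filled
```

Because the grid is built from `unique()` values, the same code should carry over to any number of species and dates, as long as the column names match.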
Answer 2
Score: 0
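A tidyverse solution:
Here we first create a helper column x holding the other species. Pivoting longer turns x into additional Species rows, whose Count is then set to 0. Finally, arrange puts the rows in order so that grouping by Date and slicing keeps the two highest counts per date. As written, the two species names are hardcoded, so this only covers the two-species example.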
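library(dplyr)
library(tidyr)

df %>%
  # helper column: the species that is NOT on this row
  mutate(x = ifelse(Species == "AMAV", "SESA", "AMAV")) %>%
  # stack Species and x so each original row yields two rows
  pivot_longer(c(Species, x),
               names_to = "name",
               values_to = "Species") %>%
  # rows that came from the helper column get a count of 0
  mutate(Count = ifelse(name == "x", 0, Count), .keep = "unused") %>%
  # real counts sort ahead of the zero-filled rows within each date
  arrange(Date, -Count) %>%
  group_by(Date) %>%
  slice(1:2) %>%
  ungroup()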
Date Count Species
<chr> <dbl> <chr>
1 2022/06/01 50 AMAV
2 2022/06/01 0 SESA
3 2022/06/05 20 AMAV
4 2022/06/05 0 SESA
5 2022/06/07 10 SESA
6 2022/06/07 8 AMAV
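For data with more than two species, a more general tidyverse route is `tidyr::complete()`, which adds the missing combinations of the listed columns and can fill the new rows with a chosen value. This is an alternative technique, not the method used in the answer above. The sketch below rebuilds the question's example data, and the name `filled` is an assumption made here for illustration.

```r
library(tidyr)

# Example data from the question (column names assumed to match the table).
df <- data.frame(
  Date    = c("2022/06/01", "2022/06/05", "2022/06/07", "2022/06/07"),
  Species = c("AMAV", "AMAV", "SESA", "AMAV"),
  Count   = c(50, 20, 10, 8)
)

# complete() adds every missing Date x Species row;
# `fill` sets Count to 0 in those rows instead of leaving NA.
filled <- complete(df, Date, Species, fill = list(Count = 0))

filled
```

Since no species names appear in the call, nothing needs to change when new species or dates are added to the data.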