Conditionally fill a column downwards based on the presence/absence of information in another column
Question
I understand what you need: you want to fill in the data and remove the redundant rows so that the table has the following structure:
Family | Genus | Species |
---|---|---|
F1 | G1 | S1 |
F1 | G1 | S2 |
F1 | G2 | S1 |
F1 | G2 | S2 |
Your earlier attempt was almost correct but needs a small modification. Here is one possible solution:
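library(dplyr)
library(tidyr)

df %>%
  # Fill the higher ranks first (down, then up)
  fill(Family, Genus, .direction = "downup") %>%
  fill(Species, .direction = "downup") %>%
  # Drop rows in which every column is still NA
  filter(!(is.na(Family) & is.na(Genus) & is.na(Species))) %>%
  # Collapse the duplicate rows created by filling
  distinct()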
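This first uses fill to fill Family and Genus, and then Species. It then drops any row in which all columns are NA and uses distinct to remove duplicate rows. This should produce the result you expect.
Note that the dplyr and tidyr packages need to be installed and loaded, if they are not already.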
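As a quick check, here is a minimal, self-contained sketch of that pipeline run on the small Family/Genus/Species table from the question. The object names `df` and `res` and the final `arrange()` call (added only to make the row order predictable) are assumptions for illustration. It seems to be adequate for this tiny table, but as the EDIT in the question shows, filling every column in both directions can run on across groups in larger data, so treat it as a starting point rather than a general fix.

```r
library(dplyr)
library(tidyr)

# Toy data copied from the question's first table (one value per row)
df <- data.frame(
  Family  = c("F1", NA, NA, NA, NA, NA, NA),
  Genus   = c(NA, "G1", NA, NA, "G2", NA, NA),
  Species = c(NA, NA, "S1", "S2", NA, "S1", "S2")
)

res <- df %>%
  fill(Family, Genus, .direction = "downup") %>%                 # fill higher ranks first
  fill(Species, .direction = "downup") %>%                       # then the lowest rank
  filter(!(is.na(Family) & is.na(Genus) & is.na(Species))) %>%   # drop all-NA rows
  distinct() %>%                                                 # collapse repeated rows
  arrange(Family, Genus, Species)                                # assumption: fixed order for display

print(res)
```

Without the `arrange()` step the same four combinations are returned, just not necessarily in this order.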
I am trying to fill a large file containing taxonomic information so that each row contains the relevant data. At present, the file is structured as follows, from infraclass through to subspecies:
Family | Genus | Species |
---|---|---|
F1 | NA | NA |
NA | G1 | NA |
NA | NA | S1 |
NA | NA | S2 |
NA | G2 | NA |
NA | NA | S1 |
NA | NA | S2 |
Here, for example, I would like to fill Genus downwards when there are data in Species, likewise for Family, and then remove the redundant rows. The end result should be something like this:
Family | Genus | Species |
---|---|---|
F1 | G1 | S1 |
F1 | G1 | S2 |
F1 | G2 | S1 |
F1 | G2 | S2 |
I've included example data, below:
df <- data.frame(
family = c("Rheaidae",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
genus = c(NA,"Rhea",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
species = c(NA,NA,"americana",NA,NA,NA,NA,NA,"pennata",NA,NA,NA),
subspecies = c(NA,NA,NA,"americana","intermedia","nobilis","araneipes","albescens",NA,"garleppi","tarapacensis","pennata")
)
I've tried several different ways of combining mutate, ifelse, loops, etc. but I'm not getting anywhere. I've provided one example, below, which didn't work but was my most successful attempt, thus far.
df %>%
mutate(g = ifelse(is.na(subspecies), "NA",
zoo::na.locf(species, na.rm = F)))
family | genus | species | subspecies | g |
---|---|---|---|---|
Rheaidae | NA | NA | NA | NA |
NA | Rhea | NA | NA | NA |
NA | NA | americana | NA | NA |
NA | NA | NA | americana | americana |
NA | NA | NA | intermedia | americana |
NA | NA | NA | nobilis | americana |
NA | NA | NA | araneipes | americana |
NA | NA | NA | albescens | americana |
NA | NA | pennata | NA | NA |
NA | NA | NA | garleppi | pennata |
NA | NA | NA | tarapacensis | pennata |
NA | NA | NA | pennata | pennata |
EDIT
@benson23 suggested using
df %>% fill(everything(), .direction = "downup") %>% distinct_all()
I'd come across something like this previously, but even though it does a good job of filling cells, it leaves me with erroneous rows. I've given an example, below:
family | common | genus | species | subspecies |
---|---|---|---|---|
Struthionidae | NA | NA | NA | NA |
NA | Ostriches | NA | NA | NA |
NA | NA | Struthio | NA | NA |
NA | NA | NA | camelus | NA |
NA | NA | NA | NA | australis |
NA | NA | NA | molybdophanes | NA |
Rheaidae | NA | NA | NA | NA |
NA | Rheas | NA | NA | NA |
NA | NA | Rhea | NA | NA |
NA | NA | NA | americana | NA |
NA | NA | NA | NA | americana |
NA | NA | NA | NA | intermedia |
Becomes:
family | common | genus | species | subspecies |
---|---|---|---|---|
Struthionidae | Ostriches | Struthio | camelus | australis |
Struthionidae | Ostriches | Struthio | camelus | australis |
Struthionidae | Ostriches | Struthio | camelus | australis |
Struthionidae | Ostriches | Struthio | camelus | australis |
Struthionidae | Ostriches | Struthio | camelus | australis |
Struthionidae | Ostriches | Struthio | molybdophanes | australis |
Rheaidae | Ostriches | Struthio | molybdophanes | australis |
Rheaidae | Rheas | Struthio | molybdophanes | australis |
Rheaidae | Rheas | Rhea | molybdophanes | australis |
Rheaidae | Rheas | Rhea | americana | australis |
Rheaidae | Rheas | Rhea | americana | americana |
Rheaidae | Rheas | Rhea | americana | intermedia |
Rather than:
family | common | genus | species | subspecies |
---|---|---|---|---|
Struthionidae | Ostriches | Struthio | camelus | australis |
Struthionidae | Ostriches | Struthio | molybdophanes | NA |
Rheaidae | Rheas | Rhea | americana | americana |
Rheaidae | Rheas | Rhea | americana | intermedia |
In short, the filled values run on past the group they belong to, and there's no clear way to remove the erroneous rows.
Answer 1
Score: 2
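library(dplyr)
library(tidyr)

df %>%
  fill(family) %>%                                  # carry each family down
  group_by(family) %>%
  fill(common:species, .direction = 'downup') %>%   # fill the middle ranks within a family
  group_by(across(family:species)) %>%
  # keep the non-missing subspecies; a species with none keeps a single NA
  reframe(across(everything(), ~if(all(is.na(.x))) NA else na.omit(.x)))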
# A tibble: 4 × 5
family common genus species subspecies
<chr> <chr> <chr> <chr> <chr>
1 Rheaidae Rheas Rhea americana americana
2 Rheaidae Rheas Rhea americana intermedia
3 Struthionidae Ostriches Struthio camelus australis
4 Struthionidae Ostriches Struthio molybdophanes NA
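To make the answer easy to try, the sketch below rebuilds the twelve-row example from the question's EDIT and runs the same steps. The object names `df` and `out` are illustrative, and the last step is written with plain subsetting on the `subspecies` column instead of `across()` with `na.omit()`, which is a small variation on the answer rather than its exact code. `reframe()` needs dplyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Example data rebuilt from the EDIT in the question (one value per row)
df <- data.frame(
  family     = c("Struthionidae", NA, NA, NA, NA, NA, "Rheaidae", NA, NA, NA, NA, NA),
  common     = c(NA, "Ostriches", NA, NA, NA, NA, NA, "Rheas", NA, NA, NA, NA),
  genus      = c(NA, NA, "Struthio", NA, NA, NA, NA, NA, "Rhea", NA, NA, NA),
  species    = c(NA, NA, NA, "camelus", NA, "molybdophanes", NA, NA, NA, "americana", NA, NA),
  subspecies = c(NA, NA, NA, NA, "australis", NA, NA, NA, NA, NA, "americana", "intermedia")
)

out <- df %>%
  fill(family) %>%                                  # carry each family down
  group_by(family) %>%
  fill(common:species, .direction = "downup") %>%   # fill the middle ranks within a family
  group_by(across(family:species)) %>%
  reframe(
    # keep the non-missing subspecies; a species without any keeps one NA row
    subspecies = if (all(is.na(subspecies))) {
      NA_character_
    } else {
      subspecies[!is.na(subspecies)]
    }
  )

print(out)
```

The result should match the four rows shown in the answer. One caveat worth checking on real data: because the middle ranks are filled within each family, a family containing several genera may pick up species values from a neighbouring genus, so it is worth verifying the output on such a case.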