Conditionally fill a column downwards based on the presence/absence of information in another column

Question

I am trying to fill a large file containing taxonomic information so that each row contains relevant data. At present, the file is structured as follows, from infraclass through to subspecies:

Family Genus Species
F1 NA NA
NA G1 NA
NA NA S1
NA NA S2
NA G2 NA
NA NA S1
NA NA S2

Here, for example, I would like to fill Genus downwards when there are data in Species, likewise for Family, and then remove the redundant rows. The end result should be something like this:

Family Genus Species
F1 G1 S1
F1 G1 S2
F1 G2 S1
F1 G2 S2

I've included example data, below:

df <- data.frame(
family = c("Rheaidae",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
genus = c(NA,"Rhea",NA,NA,NA,NA,NA,NA,NA,NA,NA,NA),
species = c(NA,NA,"americana",NA,NA,NA,NA,NA,"pennata",NA,NA,NA),
subspecies = c(NA,NA,NA,"americana","intermedia","nobilis","araneipes","albescens",NA,"garleppi","tarapacensis","pennata")
)

I've tried several different ways of combining mutate, ifelse, loops, etc., but I'm not getting anywhere. I've provided one example below, which didn't work but was my most successful attempt thus far.

df %>%
  mutate(g = ifelse(is.na(subspecies), "NA", 
            zoo::na.locf(species, na.rm = F)))
family genus species subspecies g
Rheaidae NA NA NA NA
NA Rhea NA NA NA
NA NA americana NA NA
NA NA NA americana americana
NA NA NA intermedia americana
NA NA NA nobilis americana
NA NA NA araneipes americana
NA NA NA albescens americana
NA NA pennata NA NA
NA NA NA garleppi pennata
NA NA NA tarapacensis pennata
NA NA NA pennata pennata

EDIT
@benson23 suggested using

df %>% fill(everything(), .direction = "downup") %>% distinct_all()

I'd come across something like this previously, but even though it does a good job of filling cells, it leaves me with erroneous rows. I've given an example, below:

family common genus species subspecies
Struthionidae NA NA NA NA
NA Ostriches NA NA NA
NA NA Struthio NA NA
NA NA NA camelus NA
NA NA NA NA australis
NA NA NA molybdophanes NA
Rheaidae NA NA NA NA
NA Rheas NA NA NA
NA NA Rhea NA NA
NA NA NA americana NA
NA NA NA NA americana
NA NA NA NA intermedia

Becomes:

family common genus species subspecies
Struthionidae Ostriches Struthio camelus australis
Struthionidae Ostriches Struthio camelus australis
Struthionidae Ostriches Struthio camelus australis
Struthionidae Ostriches Struthio camelus australis
Struthionidae Ostriches Struthio camelus australis
Struthionidae Ostriches Struthio molybdophanes australis
Rheaidae Ostriches Struthio molybdophanes australis
Rheaidae Rheas Struthio molybdophanes australis
Rheaidae Rheas Rhea molybdophanes australis
Rheaidae Rheas Rhea americana australis
Rheaidae Rheas Rhea americana americana
Rheaidae Rheas Rhea americana intermedia

Rather than:

family common genus species subspecies
Struthionidae Ostriches Struthio camelus australis
Struthionidae Ostriches Struthio molybdophanes NA
Rheaidae Rheas Rhea americana americana
Rheaidae Rheas Rhea americana intermedia

In short, the data run on and there's no clear way to remove the erroneous rows.

Answer 1

Score: 2

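library(dplyr)  # reframe() requires dplyr >= 1.1.0
library(tidyr)

df %>%
   fill(family) %>%                                 # carry each family name downwards
   group_by(family) %>%
   fill(common:species, .direction = 'downup') %>%  # fill the other ranks within each family
   group_by(across(family:species)) %>%
   # per group, keep the non-NA subspecies, or a single NA if there are none
   reframe(across(everything(), ~if(all(is.na(.x))) NA else na.omit(.x)))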

# A tibble: 4 × 5
  family        common    genus    species       subspecies
  <chr>         <chr>     <chr>    <chr>         <chr>     
1 Rheaidae      Rheas     Rhea     americana     americana 
2 Rheaidae      Rheas     Rhea     americana     intermedia
3 Struthionidae Ostriches Struthio camelus       australis 
4 Struthionidae Ostriches Struthio molybdophanes NA
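
For comparison, here is a minimal base R sketch of a different approach, not the method used in the answer above. It assumes every row has exactly one filled cell and that the columns are ordered from the highest rank to the lowest. Each column is filled downwards only until a row of a higher rank appears, so values are not carried into the next genus or family, and only the rows that are not followed by a lower-rank row are kept. The name `fill_hierarchy` is hypothetical, and the data are retyped from the EDIT example in the question.

```r
# Data retyped from the question's EDIT example (one filled cell per row).
df <- data.frame(
  family     = c("Struthionidae", NA, NA, NA, NA, NA, "Rheaidae", NA, NA, NA, NA, NA),
  common     = c(NA, "Ostriches", NA, NA, NA, NA, NA, "Rheas", NA, NA, NA, NA),
  genus      = c(NA, NA, "Struthio", NA, NA, NA, NA, NA, "Rhea", NA, NA, NA),
  species    = c(NA, NA, NA, "camelus", NA, "molybdophanes", NA, NA, NA, "americana", NA, NA),
  subspecies = c(NA, NA, NA, NA, "australis", NA, NA, NA, NA, NA, "americana", "intermedia"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: assumes columns run from highest to lowest rank
# and that each row holds exactly one non-NA value.
fill_hierarchy <- function(df) {
  n   <- nrow(df)
  pos <- seq_len(n)
  # Rank of each row = index of its single filled column.
  lvl <- unname(apply(!is.na(df), 1, function(r) which(r)[1]))
  out <- df
  for (j in seq_along(df)) {
    x <- df[[j]]
    # Position of the most recent non-NA value in this column (0 if none yet).
    last_value <- cummax(ifelse(is.na(x), 0L, pos))
    # Position of the most recent row of a higher rank (0 if none yet).
    last_reset <- cummax(ifelse(lvl < j, pos, 0L))
    # Use the carried value only if no higher-rank row has appeared since it.
    out[[j]] <- x[replace(last_value, last_value <= last_reset, NA)]
  }
  # Keep "leaf" rows: the last row, or rows not followed by a lower-rank row.
  is_leaf <- c(lvl[-1] <= lvl[-n], TRUE)
  out <- out[is_leaf, , drop = FALSE]
  rownames(out) <- NULL
  out
}

result <- fill_hierarchy(df)
print(result)
```

On the EDIT data this should return the four rows the question lists under "Rather than", in their original order. Resetting at rank boundaries may matter when one family contains several genera, because a `downup` fill within a family can let a genus header row inherit the last species of the preceding genus.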

huangapple
  • Posted on March 31, 2023 at 17:34:22