2023-03-31 17:34:22

English:

Conditionally fill a column downwards based on the presence/absence of information in another column

Question

The goal is to fill in the data and remove the redundant rows so that the table has the following structure:

Family	Genus	Species
F1	G1	S1
F1	G1	S2
F1	G2	S1
F1	G2	S2

The attempt mentioned earlier is almost correct but needs a small change. One possible solution:

library(dplyr)
library(tidyr)
df %>%
  fill(Family, Genus, .direction = "downup") %>%
  fill(Species, .direction = "downup") %>%
  filter(!(is.na(Family) & is.na(Genus) & is.na(Species))) %>%
  distinct()

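This first uses fill to fill Family and Genus, and then Species. It then drops any rows in which every column is NA and uses distinct to remove duplicate rows. For the small example above this should give the expected rows, although possibly in a different order. Because the fill is not grouped, values are carried across the boundaries between taxa, so on larger data it can produce the run-on rows described under EDIT in the original question.

Note that the dplyr and tidyr packages must be installed and loaded.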

English:

I am trying to fill a large file containing taxonomic information so that each row contains the relevant data. At present, the file is structured as follows, from infraclass through to subspecies:

Family	Genus	Species
F1	NA	NA
NA	G1	NA
NA	NA	S1
NA	NA	S2
NA	G2	NA
NA	NA	S1
NA	NA	S2

Here, for example, I would like to fill Genus downwards when there are data in Species, likewise for Family, and then remove the redundant rows. The end result should be something like this:

Family	Genus	Species
F1	G1	S1
F1	G1	S2
F1	G2	S1
F1	G2	S2

I've included example data, below:

df <- data.frame(
  family = c("Rheaidae", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  genus = c(NA, "Rhea", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  species = c(NA, NA, "americana", NA, NA, NA, NA, NA, "pennata", NA, NA, NA),
  subspecies = c(NA, NA, NA, "americana", "intermedia", "nobilis", "araneipes", "albescens", NA, "garleppi", "tarapacensis", "pennata")
)

I've tried several different ways of combining mutate, ifelse, loops, etc. but I'm not getting anywhere. I've provided one example, below, which didn't work but was my most successful attempt, thus far.

df %>%
  mutate(g = ifelse(is.na(subspecies), "NA",
            zoo::na.locf(species, na.rm = F)))

family	genus	species	subspecies	g
Rheaidae	NA	NA	NA	NA
NA	Rhea	NA	NA	NA
NA	NA	americana	NA	NA
NA	NA	NA	americana	americana
NA	NA	NA	intermedia	americana
NA	NA	NA	nobilis	americana
NA	NA	NA	araneipes	americana
NA	NA	NA	albescens	americana
NA	NA	pennata	NA	NA
NA	NA	NA	garleppi	pennata
NA	NA	NA	tarapacensis	pennata
NA	NA	NA	pennata	pennata

EDIT
@benson23 suggested using

df %>% fill(everything(), .direction = "downup") %>% distinct_all()

I'd come across something like this previously, but even though it does a good job of filling cells, it leaves me with erroneous rows. I've given an example, below:

family	common	genus	species	subspecies
Struthionidae	NA	NA	NA	NA
NA	Ostriches	NA	NA	NA
NA	NA	Struthio	NA	NA
NA	NA	NA	camelus	NA
NA	NA	NA	NA	australis
NA	NA	NA	molybdophanes	NA
Rheaidae	NA	NA	NA	NA
NA	Rheas	NA	NA	NA
NA	NA	Rhea	NA	NA
NA	NA	NA	americana	NA
NA	NA	NA	NA	americana
NA	NA	NA	NA	intermedia

Becomes:

family	common	genus	species	subspecies
Struthionidae	Ostriches	Struthio	camelus	australis
Struthionidae	Ostriches	Struthio	camelus	australis
Struthionidae	Ostriches	Struthio	camelus	australis
Struthionidae	Ostriches	Struthio	camelus	australis
Struthionidae	Ostriches	Struthio	camelus	australis
Struthionidae	Ostriches	Struthio	molybdophanes	australis
Rheaidae	Ostriches	Struthio	molybdophanes	australis
Rheaidae	Rheas	Struthio	molybdophanes	australis
Rheaidae	Rheas	Rhea	molybdophanes	australis
Rheaidae	Rheas	Rhea	americana	australis
Rheaidae	Rheas	Rhea	americana	americana
Rheaidae	Rheas	Rhea	americana	intermedia

Rather than:

family	common	genus	species	subspecies
Struthionidae	Ostriches	Struthio	camelus	australis
Struthionidae	Ostriches	Struthio	molybdophanes	NA
Rheaidae	Rheas	Rhea	americana	americana
Rheaidae	Rheas	Rhea	americana	intermedia

In short, the data run on and there's no clear way to remove the erroneous rows.
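
The run-on happens because fill() carries the last seen value along regardless of which parent taxon a row belongs to. A minimal sketch of the difference, using a hypothetical two-column data frame named toy (an assumption for illustration, not part of the original data): on a data frame grouped with group_by(), tidyr's fill() only fills within each group.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two genera, only the first has a species row.
toy <- data.frame(
  genus   = c("Struthio", NA, "Rhea", NA),
  species = c(NA, "camelus", NA, NA)
)

# Ungrouped fill: "camelus" is carried down into the Rhea rows.
ungrouped <- toy %>%
  fill(genus, species, .direction = "down")

# Grouped fill: fill genus first, then fill species only within each genus,
# so nothing crosses from Struthio into Rhea.
grouped <- toy %>%
  fill(genus) %>%
  group_by(genus) %>%
  fill(species, .direction = "down") %>%
  ungroup()

print(ungrouped)
print(grouped)
```

In the grouped version the Rhea rows keep NA in species, which is what makes it possible to tell real rows from run-on rows afterwards.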

Answer 1

Score: 2

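library(dplyr)   # reframe() needs dplyr >= 1.1.0
library(tidyr)
# df is the five-column table from the question's EDIT
df %>%
   fill(family) %>%                                   # carry family down
   group_by(family) %>%                               # keep later fills inside each family
   fill(common:species, .direction = 'downup') %>%
   group_by(across(family:species)) %>%
   # per group: keep the non-missing subspecies, or a single NA if there are none
   reframe(across(everything(), ~if(all(is.na(.x))) NA else na.omit(.x)))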
# A tibble: 4 × 5
  family        common    genus    species       subspecies
  <chr>         <chr>     <chr>    <chr>         <chr>     
1 Rheaidae      Rheas     Rhea     americana     americana 
2 Rheaidae      Rheas     Rhea     americana     intermedia
3 Struthionidae Ostriches Struthio camelus       australis 
4 Struthionidae Ostriches Struthio molybdophanes NA
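
As a rough check, the same pattern can be tried on the four-column example data from the question, which has no common column. This is only a sketch: the column range genus:species and the name result are adaptations made here and do not come from the original answer.

```r
library(dplyr)  # reframe() needs dplyr >= 1.1.0
library(tidyr)

# Example data from the question (family, genus, species, subspecies).
df <- data.frame(
  family = c("Rheaidae", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  genus = c(NA, "Rhea", NA, NA, NA, NA, NA, NA, NA, NA, NA, NA),
  species = c(NA, NA, "americana", NA, NA, NA, NA, NA, "pennata", NA, NA, NA),
  subspecies = c(NA, NA, NA, "americana", "intermedia", "nobilis",
                 "araneipes", "albescens", NA, "garleppi", "tarapacensis",
                 "pennata")
)

result <- df %>%
  fill(family) %>%                                  # carry family down
  group_by(family) %>%                              # fill only within a family
  fill(genus:species, .direction = "downup") %>%
  group_by(across(family:species)) %>%
  # per species: keep the non-missing subspecies, or one NA if there are none
  reframe(across(everything(), ~ if (all(is.na(.x))) NA else na.omit(.x)))

# Expect one row per subspecies (8 rows for this data).
print(result)
```

Only the column range passed to fill() changes with the number of ranks; the grouping columns in the second group_by() should cover every rank above the lowest one.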
