How to replace NA values in a data frame with the adjacent column's value plus additional text to differentiate them in R

Question

Is it possible to replace the NA values in my data frame with the text from the nearest non-NA column to the left, with "_unclassified" appended to the end?

Here is an example data frame:

    feature <- c("1", "2", "3", "4", "5")
    phylum <- c("Firmicutes", "Firmicutes", "Firmicutes", "Proteobacteria", "Firmicutes")
    class <- c(NA, "Clostridia", "Clostridia", "Gammaproteobacteria", "Bacilli")
    order <- c(NA, NA, "Oscillospirales", "Enterobacterales", "Staphylococcales")
    family <- c(NA, NA, NA, "Enterobacteriaceae", "Staphylococcaceae")
    genus <- c(NA, NA, NA, NA, "Staphylococcus")
    df <- data.frame(feature, phylum, class, order, family, genus)

For example,

feature 1 would have Firmicutes_unclassified across class, order, family, genus

feature 2 would have Clostridia_unclassified across order, family, and genus

feature 3 would have Oscillospirales_unclassified across family and genus

feature 4 would have Enterobacteriaceae_unclassified for genus

Answer 1

Score: 1

One option uses na.locf0() from the zoo package:

    library(zoo)
    df[-1] <- t(apply(df[-1], 1, \(x) ifelse(is.na(x), paste0(na.locf0(x), '_unclassified'), x)))

Output:

    > df
      feature         phylum                   class                   order                       family                           genus
    1       1     Firmicutes Firmicutes_unclassified Firmicutes_unclassified      Firmicutes_unclassified         Firmicutes_unclassified
    2       2     Firmicutes              Clostridia Clostridia_unclassified      Clostridia_unclassified         Clostridia_unclassified
    3       3     Firmicutes              Clostridia         Oscillospirales Oscillospirales_unclassified    Oscillospirales_unclassified
    4       4 Proteobacteria     Gammaproteobacteria        Enterobacterales           Enterobacteriaceae Enterobacteriaceae_unclassified
    5       5     Firmicutes                 Bacilli        Staphylococcales            Staphylococcaceae                  Staphylococcus
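
One thing that may be worth checking on real data: carrying the last observation forward and taking the last non-NA value of the whole row (the tail(na.omit(x), n=1) idiom) give the same result here only because every NA sits at the end of its row. The sketch below uses a made-up row with an NA in the middle to show where the two could differ. fill_forward() is a hypothetical base-R stand-in for a forward fill, written so the example runs without extra packages; it is not zoo's implementation.

```r
# Hypothetical helper: forward-fill NAs with the most recent non-NA value
# to the left (leading NAs stay NA). Base R only, for illustration.
fill_forward <- function(x) {
  idx <- cumsum(!is.na(x))        # how many non-NA values seen so far
  c(NA, x[!is.na(x)])[idx + 1]    # slot 1 (NA) covers leading NAs
}

# Made-up row with an NA between two known ranks (not from the question's data)
row <- c(phylum = "Firmicutes", class = NA, order = "Oscillospirales", family = NA)

# Forward-fill style: each NA takes the nearest non-NA value to its left
locf_style <- ifelse(is.na(row), paste0(fill_forward(row), "_unclassified"), row)

# tail(na.omit()) style: every NA takes the last non-NA value of the row
tail_style <- replace(row, is.na(row),
                      paste0(tail(na.omit(row), n = 1), "_unclassified"))

locf_style[["class"]]
tail_style[["class"]]
```

With this row, the forward-fill style labels class from phylum, while the tail(na.omit()) style labels it from order. If your taxonomy table can have gaps in the middle of a row, the forward-fill behaviour is probably the one that matches "the column to the left".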

Answer 2

Score: 1

A one-liner using only base R:

    df[-1] <- t(apply(df[-1], MARGIN=1, \(x) replace(x, is.na(x), paste0(tail(na.omit(x), n=1), '_unclassified'))))
    df
    #   feature         phylum                   class                   order                       family                           genus
    # 1       1     Firmicutes Firmicutes_unclassified Firmicutes_unclassified      Firmicutes_unclassified         Firmicutes_unclassified
    # 2       2     Firmicutes              Clostridia Clostridia_unclassified      Clostridia_unclassified         Clostridia_unclassified
    # 3       3     Firmicutes              Clostridia         Oscillospirales Oscillospirales_unclassified    Oscillospirales_unclassified
    # 4       4 Proteobacteria     Gammaproteobacteria        Enterobacterales           Enterobacteriaceae Enterobacteriaceae_unclassified
    # 5       5     Firmicutes                 Bacilli        Staphylococcales            Staphylococcaceae                  Staphylococcus

Explanation:

We apply an anonymous function \(x) with MARGIN=1 (i.e. row-wise) to the data frame, excluding the first column via df[-1]. Within each row x, replace() substitutes every element where is.na(x) is TRUE with the last non-NA value of that row, obtained as tail(na.omit(x), n=1), with the suffix '_unclassified' appended by paste0(). Because apply() returns the per-row results as columns, t() transposes the result back to the original orientation.
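
To make the explanation above more concrete, here is a small sketch that unrolls the anonymous function for a single row (row 2 of the example data, without the feature column). The intermediate variable names are only for illustration and do not appear in the one-liner.

```r
# Row 2 of the example data, as apply() would pass it to the function
x <- c(phylum = "Firmicutes", class = "Clostridia",
       order = NA, family = NA, genus = NA)

non_missing <- na.omit(x)                 # the row with its NAs dropped
last_value  <- tail(non_missing, n = 1)   # last non-NA entry: "Clostridia"
filler      <- paste0(last_value, "_unclassified")

# Put the filler into every position that was NA
result <- replace(x, is.na(x), filler)
result
```

The one-liner does exactly these steps for each row, then t() turns the collected rows back into the original layout.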

Answer 3

Score: 0

Here is one potential solution that transposes the data frame, fills each NA with the most recent non-NA value, and then transposes the data frame back again:

    library(tidyverse)
    library(vctrs)

    feature <- c("1", "2", "3", "4", "5")
    phylum <- c("Firmicutes", "Firmicutes", "Firmicutes", "Proteobacteria", "Firmicutes")
    class <- c(NA, "Clostridia", "Clostridia", "Gammaproteobacteria", "Bacilli")
    order <- c(NA, NA, "Oscillospirales", "Enterobacterales", "Staphylococcales")
    family <- c(NA, NA, NA, "Enterobacteriaceae", "Staphylococcaceae")
    genus <- c(NA, NA, NA, NA, "Staphylococcus")
    df <- data.frame(feature, phylum, class, order, family, genus)

    df2 <- df %>%
      t() %>%   # transpose: each original row becomes a column
      as_tibble(.name_repair = ~vec_as_names(..., repair = "unique", quiet = TRUE)) %>%
      # fill NAs downwards within each column and append the suffix
      mutate(across(everything(), ~if_else(
        is.na(.x),
        paste0(vec_fill_missing(.x, direction = "down"), "_unclassified"),
        .x))) %>%
      t() %>%   # transpose back to the original orientation
      as_tibble(.name_repair = ~colnames(df))
    df2
    #> # A tibble: 5 × 6
    #>   feature phylum         class                   order              family genus
    #>   <chr>   <chr>          <chr>                   <chr>              <chr>  <chr>
    #> 1 1       Firmicutes     Firmicutes_unclassified Firmicutes_unclas… Firmi… Firm…
    #> 2 2       Firmicutes     Clostridia              Clostridia_unclas… Clost… Clos…
    #> 3 3       Firmicutes     Clostridia              Oscillospirales    Oscil… Osci…
    #> 4 4       Proteobacteria Gammaproteobacteria     Enterobacterales   Enter… Ente…
    #> 5 5       Firmicutes     Bacilli                 Staphylococcales   Staph… Stap…

Created on 2023-02-10 with reprex v2.0.2

Posted by huangapple on 2023-02-10 06:34:28
Article link: https://go.coder-hub.com/75405150.html