Extract first non-NA value over multiple columns

Question

I'm still learning R and was wondering if there is an elegant way of manipulating the df below to obtain df2.

I'm not sure whether a loop is the right tool for this, but basically, for each pair of "X_ID"/"X_No" columns, I want to extract the first non-NA "X_No" value when the "X_No" value in the first row is NA. This is perhaps best described through an example going from df to the desired df2.

    A_ID <- c('A','B','I','N')
    A_No <- c(11,NA,15,NA)
    B_ID <- c('B','C','D','J')
    B_No <- c(NA,NA,12,NA)
    C_ID <- c('E','F','G','P')
    C_No <- c(NA,13,14,20)
    D_ID <- c('J','K','L','M')
    D_No <- c(NA,NA,NA,40)
    E_ID <- c('W','X','Y','Z')
    E_No <- c(50,32,48,40)
    df <- data.frame(A_ID,A_No,B_ID,B_No,C_ID,C_No,D_ID,D_No,E_ID,E_No)

    ID <- c('A','D','F','M','W')
    No <- c(11,12,13,40,50)
    df2 <- data.frame(ID,No)

I'm hoping for an elegant solution, as there are over 1000 columns similar to the example provided.
I've searched the web for a similar example, but found none that reproduces the expected result.

Your help is very much appreciated.
Thank you

Answer 1

Score: 3

I don't know if I'd call it "elegant", but here is a potential solution:

    library(tidyverse)

    A_ID <- c('A','B','I','N')
    A_No <- c(11,NA,15,NA)
    B_ID <- c('B','C','D','J')
    B_No <- c(NA,NA,12,NA)
    C_ID <- c('E','F','G','P')
    C_No <- c(NA,13,14,20)
    D_ID <- c('J','K','L','M')
    D_No <- c(NA,NA,NA,40)
    E_ID <- c('W','X','Y','Z')
    E_No <- c(50,32,48,40)
    df <- data.frame(A_ID,A_No,B_ID,B_No,C_ID,C_No,D_ID,D_No,E_ID,E_No)

    ID <- c('A','D','F','M','W')
    No <- c(11,12,13,40,50)
    df2 <- data.frame(ID,No)

    output <- df %>%
      # reshape to long form: the prefix goes into `Col`,
      # the suffixes (ID, No) become the value columns
      pivot_longer(everything(),
                   names_sep = "_",
                   names_to = c("Col", ".value")) %>%
      drop_na() %>%             # drop rows that contain an NA
      group_by(Col) %>%
      slice_head(n = 1) %>%     # first remaining row per prefix
      ungroup() %>%
      select(-Col)

    df2
    #>   ID No
    #> 1  A 11
    #> 2  D 12
    #> 3  F 13
    #> 4  M 40
    #> 5  W 50

    output
    #> # A tibble: 5 × 2
    #>   ID       No
    #>   <chr> <dbl>
    #> 1 A        11
    #> 2 D        12
    #> 3 F        13
    #> 4 M        40
    #> 5 W        50

    # note: all_equal() is deprecated as of dplyr 1.1.0
    all_equal(df2, output)
    #> [1] TRUE

Created on 2023-02-08 with reprex v2.0.2

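To see what the `".value"` entry in `names_to` does, it can help to look at the intermediate long table on its own. The following is a minimal sketch, assuming the `df` from the question and the tidyr package; the name `long` is only illustrative.

```r
library(tidyr)

# Example data from the question
df <- data.frame(
  A_ID = c("A", "B", "I", "N"), A_No = c(11, NA, 15, NA),
  B_ID = c("B", "C", "D", "J"), B_No = c(NA, NA, 12, NA),
  C_ID = c("E", "F", "G", "P"), C_No = c(NA, 13, 14, 20),
  D_ID = c("J", "K", "L", "M"), D_No = c(NA, NA, NA, 40),
  E_ID = c("W", "X", "Y", "Z"), E_No = c(50, 32, 48, 40)
)

# Each column name is split at "_": the part before it is stored in `Col`,
# and the part after it (".value") becomes the output column name (ID or No).
long <- pivot_longer(df, everything(),
                     names_sep = "_",
                     names_to = c("Col", ".value"))

names(long)    # "Col" "ID" "No"
nrow(long)     # 20: 4 original rows x 5 column prefixes
head(long, 5)  # the five prefixes of the first original row
```

Each original row contributes one row per prefix, so dropping the NA rows and keeping the first row per `Col` leaves the first non-NA pair for each prefix.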

Answer 2

Score: 2

Using base R with `max.col` (assuming the columns alternate between ID and No):

```R
# for each No column, the position of its first non-NA value
ind <- max.col(!is.na(t(df[c(FALSE, TRUE)])), "first")
# (row, column) index pairs into the transposed data
m1 <- cbind(seq_along(ind), ind)
data.frame(ID = t(df[c(TRUE, FALSE)])[m1], No = t(df[c(FALSE, TRUE)])[m1])
#   ID No
# 1  A 11
# 2  D 12
# 3  F 13
# 4  M 40
# 5  W 50
```
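The `max.col` one-liner is compact, so here is a step-by-step sketch of the same idea, assuming the `df` from the question. The intermediate names `no_mat`, `id_mat` and `result` are illustrative and not part of the answer.

```r
# Example data from the question
df <- data.frame(
  A_ID = c("A", "B", "I", "N"), A_No = c(11, NA, 15, NA),
  B_ID = c("B", "C", "D", "J"), B_No = c(NA, NA, 12, NA),
  C_ID = c("E", "F", "G", "P"), C_No = c(NA, 13, 14, 20),
  D_ID = c("J", "K", "L", "M"), D_No = c(NA, NA, NA, 40),
  E_ID = c("W", "X", "Y", "Z"), E_No = c(50, 32, 48, 40)
)

# Logical indices are recycled over the columns:
# c(FALSE, TRUE) keeps the even (No) columns, c(TRUE, FALSE) the odd (ID) ones.
no_mat <- t(df[c(FALSE, TRUE)])  # 5 x 4 numeric matrix, one row per *_No column
id_mat <- t(df[c(TRUE, FALSE)])  # 5 x 4 character matrix, one row per *_ID column

# TRUE/FALSE are treated as 1/0, so with ties.method = "first" max.col()
# returns the position of the first TRUE (first non-NA) in each row.
ind <- max.col(!is.na(no_mat), ties.method = "first")
ind  # 1 3 2 4 1

# A two-column matrix of (row, column) pairs selects one cell per row.
m1 <- cbind(seq_along(ind), ind)
result <- data.frame(ID = id_mat[m1], No = no_mat[m1])
result
```

One caveat with this approach: if a No column were entirely NA, every entry in its row would tie at 0, so `"first"` would return position 1 and the result would carry an NA for that prefix.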

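Answer 3

Score: 1

Here is a data.table solution that should scale well to a (very) large dataset.

Functionally

  1. Split the data.frame into a list of chunks of columns, based on their names. So all columns starting with A_ go to the first element, all columns starting with B_ to the second, and so on.

  2. Then, stack these list elements on top of each other, using data.table::rbindlist. Ignore the column names (this only works if A_ has the same number of columns as B_, and so on for every prefix n_).

  3. Now get the first non-NA value for each value in the first column (that is, for each ID).

Code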

    library(data.table)
    # split based on what comes before the underscore
    L <- split.default(df, f = gsub("(.*)_.*", "\\1", names(df)))
    # bind together again
    DT <- rbindlist(L, use.names = FALSE)
    # extract the first non-NA value for each ID
    DT[!is.na(A_No), .(No = A_No[1]), keyby = .(ID = A_ID)]
    #     ID No
    #  1:  A 11
    #  2:  D 12
    #  3:  F 13
    #  4:  G 14
    #  5:  I 15
    #  6:  M 40
    #  7:  P 20
    #  8:  W 50
    #  9:  X 32
    # 10:  Y 48
    # 11:  Z 40

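Note that the result above is grouped by ID across the stacked columns, so it has one row per distinct ID with a non-NA value rather than the five rows of df2. A possible variant, sketched here under the assumption that one row per column prefix is wanted, keeps the prefix as a grouping column through the `idcol` argument of `rbindlist`. The column name `Col` and the name `res` are illustrative choices, not part of the answer.

```r
library(data.table)

# Example data from the question
df <- data.frame(
  A_ID = c("A", "B", "I", "N"), A_No = c(11, NA, 15, NA),
  B_ID = c("B", "C", "D", "J"), B_No = c(NA, NA, 12, NA),
  C_ID = c("E", "F", "G", "P"), C_No = c(NA, 13, 14, 20),
  D_ID = c("J", "K", "L", "M"), D_No = c(NA, NA, NA, 40),
  E_ID = c("W", "X", "Y", "Z"), E_No = c(50, 32, 48, 40)
)

# Split the columns by the prefix before the underscore.
L <- split.default(df, f = sub("_.*$", "", names(df)))

# Stack the chunks; idcol stores each chunk's list name (the prefix) in `Col`.
# With use.names = FALSE the column names of the first chunk (A_ID, A_No) are kept.
DT <- rbindlist(L, use.names = FALSE, idcol = "Col")

# Within each prefix, take the first row whose No is not NA.
res <- DT[!is.na(A_No), .(ID = A_ID[1], No = A_No[1]), by = Col]
res
```

Grouping with `by = Col` instead of by ID means an ID that appears under several prefixes is not merged, which is what the df2 in the question appears to expect.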

huangapple
  • Posted on 2023-02-08 12:59:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75381531.html