提取多列中的第一个非NA值。

huangapple go评论62阅读模式
英文:

Extract first Non NA value over multiple columns

问题

我仍在学习R,想知道是否有一种优雅的方法来操作下面的数据框(df)以获得df2。

我不确定是否应该使用循环来实现这一目标,但基本上我想提取第一行中如果“X_No”值为NA,则提取第一个非NA的“X_No”值。最好通过从df到期望的df2的示例来描述这一点。

希望对此有一个优雅的解决方案,因为有超过1000列类似于提供的示例。我已经在网上搜索了类似的示例,但未找到能够生成期望结果的示例。

非常感谢您的帮助。感谢。

英文:

I'm still learning R and was wondering if I there was an elegant way of manipulating the below df to achieve df2.

I'm not sure if it's a loop that is supposed to be used for this, but basically I want to extract the first Non NA "X_No" Value if the "X_No" value is NA in the first row. This would perhaps be best described through an example from df to the desired df2.

A_ID <- c('A','B','I','N')
A_No <- c(11,NA,15,NA)
B_ID <- c('B','C','D','J')
B_No <- c(NA,NA,12,NA)
C_ID <- c('E','F','G','P')
C_No <- c(NA,13,14,20)
D_ID <- c('J','K','L','M')
D_No <- c(NA,NA,NA,40)
E_ID <- c('W','X','Y','Z')
E_No <- c(50,32,48,40)
df <- data.frame(A_ID,A_No,B_ID,B_No,C_ID,C_No,D_ID,D_No,E_ID,E_No)

ID <- c('A','D','F','M','W')
No <- c(11,12,13,40,50)
df2 <- data.frame(ID,No)


I'm hoping for an elegant solution to this as there are over a 1000 columns similar to the example provided.
I've looked all over the web for a similar example however to no avail that would reproduce the expected result.

Your help is very much appreciated.
Thankyou

答案1

得分: 3

我不确定是否应该称其为“优雅”,但这里是一个潜在的解决方案:

library(tidyverse)

A_ID <- c('A','B','I','N')
A_No <- c(11,NA,15,NA)
B_ID <- c('B','C','D','J')
B_No <- c(NA,NA,12,NA)
C_ID <- c('E','F','G','P')
C_No <- c(NA,13,14,20)
D_ID <- c('J','K','L','M')
D_No <- c(NA,NA,NA,40)
E_ID <- c('W','X','Y','Z')
E_No <- c(50,32,48,40)
df <- data.frame(A_ID,A_No,B_ID,B_No,C_ID,C_No,D_ID,D_No,E_ID,E_No)

ID <- c('A','D','F','M','W')
No <- c(11,12,13,40,50)
df2 <- data.frame(ID,No)

output <- df %>%
  pivot_longer(everything(),
               names_sep = "_",
               names_to = c("Col", ".value")) %>%
  drop_na() %>%
  group_by(Col) %>%
  slice_head(n = 1) %>%
  ungroup() %>%
  select(-Col)

df2
#>   ID No
#> 1  A 11
#> 2  D 12
#> 3  F 13
#> 4  M 40
#> 5  W 50

output
#> # A tibble: 5 × 2
#>   ID       No
#>   <chr> <dbl>
#> 1 A        11
#> 2 D        12
#> 3 F        13
#> 4 M        40
#> 5 W        50

all_equal(df2, output)
#> [1] TRUE

2023-02-08创建,使用reprex v2.0.2

英文:

I don't know if I'd call it "elegant", but here is a potential solution:

library(tidyverse)

A_ID &lt;- c(&#39;A&#39;,&#39;B&#39;,&#39;I&#39;,&#39;N&#39;)
A_No &lt;- c(11,NA,15,NA)
B_ID &lt;- c(&#39;B&#39;,&#39;C&#39;,&#39;D&#39;,&#39;J&#39;)
B_No &lt;- c(NA,NA,12,NA)
C_ID &lt;- c(&#39;E&#39;,&#39;F&#39;,&#39;G&#39;,&#39;P&#39;)
C_No &lt;- c(NA,13,14,20)
D_ID &lt;- c(&#39;J&#39;,&#39;K&#39;,&#39;L&#39;,&#39;M&#39;)
D_No &lt;- c(NA,NA,NA,40)
E_ID &lt;- c(&#39;W&#39;,&#39;X&#39;,&#39;Y&#39;,&#39;Z&#39;)
E_No &lt;- c(50,32,48,40)
df &lt;- data.frame(A_ID,A_No,B_ID,B_No,C_ID,C_No,D_ID,D_No,E_ID,E_No)

ID &lt;- c(&#39;A&#39;,&#39;D&#39;,&#39;F&#39;,&#39;M&#39;,&#39;W&#39;)
No &lt;- c(11,12,13,40,50)
df2 &lt;- data.frame(ID,No)

output &lt;- df %&gt;%
  pivot_longer(everything(),
               names_sep = &quot;_&quot;,
               names_to = c(&quot;Col&quot;, &quot;.value&quot;)) %&gt;%
  drop_na() %&gt;%
  group_by(Col) %&gt;%
  slice_head(n = 1) %&gt;%
  ungroup() %&gt;%
  select(-Col)

df2
#&gt;   ID No
#&gt; 1  A 11
#&gt; 2  D 12
#&gt; 3  F 13
#&gt; 4  M 40
#&gt; 5  W 50

output
#&gt; # A tibble: 5 &#215; 2
#&gt;   ID       No
#&gt;   &lt;chr&gt; &lt;dbl&gt;
#&gt; 1 A        11
#&gt; 2 D        12
#&gt; 3 F        13
#&gt; 4 M        40
#&gt; 5 W        50

all_equal(df2, output)
#&gt; [1] TRUE

<sup>Created on 2023-02-08 with reprex v2.0.2</sup>

答案2

得分: 2

使用 `base R` 和 `max.col`(假设列是交替的,包括ID和No)

```R
ind <- max.col(!is.na(t(df[c(FALSE, TRUE)])), "first")
m1 <- cbind(seq_along(ind), ind)
data.frame(ID = t(df[c(TRUE, FALSE)])[m1], No = t(df[c(FALSE, TRUE)])[m1])
  ID No
1  A 11
2  D 12
3  F 13
4  M 40
5  W 50

<details>
<summary>英文:</summary>
Using `base R` with `max.col` (assuming the columns are alternating with ID, No)

ind <- max.col(!is.na(t(df[c(FALSE, TRUE)])), "first")
m1 <- cbind(seq_along(ind), ind)
data.frame(ID = t(df[c(TRUE, FALSE)])[m1], No = t(df[c(FALSE, TRUE)])[m1])
ID No
1 A 11
2 D 12
3 F 13
4 M 40
5 W 50

答案3

得分: 1

这是一个可扩展到(非常)大型数据集的data.table解决方案。

功能性

  1. 根据列名,将数据框拆分为列块的列表。因此,所有以A_开头的列都进入第一个元素,所有以B_开头的列都进入第二个元素。

  2. 然后,将这些列表元素叠加在一起,使用data.table::rbindlist。忽略列名(只有当A_的列数与B_的列数相同,且与n_相同才有效)。

  3. 现在获取第一列中每个值的第一个非NA值。

代码

library(data.table)
# 根据下划线后的内容拆分
L <- split.default(df, f = gsub("(.*)_.*", "\\1", names(df)))
# 再次合并
DT <- rbindlist(L, use.names = FALSE)
# 提取第一个非NA的值
DT[!is.na(A_No), .(No = A_No[1]), keyby = .(ID = A_ID)]
#    ID No
# 1:  A 11
# 2:  D 12
# 3:  F 13
# 4:  G 14
# 5:  I 15
# 6:  M 40
# 7:  P 20
# 8:  W 50
# 9:  X 32
# 10: Y 48
# 11: Z 40

希望这对你有所帮助。

英文:

Here is a data.table solution that should scale well to a (very) large dataset.

functionally

  1. split the data.frame to a list of chunks of columns, based on their
    names. So all columns startting with A_ go to
    the first element, all colums startting with B_ to the second

  2. Then, put these list elements on top of each other, using
    data.table::rbindlist. Ignure the column-namaes (this only works if
    A_ has the same number of columns as B_ has the same number of cols
    as n_)

  3. Now get the first non-NA value of each value in the first column

code

library(data.table)
# split based on what comes after the underscore
L &lt;- split.default(df, f = gsub(&quot;(.*)_.*&quot;, &quot;\\1&quot;, names(df)))
# bind together again
DT &lt;- rbindlist(L, use.names = FALSE)
# extract the first value of the non-NA
DT[!is.na(A_No), .(No = A_No[1]), keyby = .(ID = A_ID)]
#    ID No
# 1:  A 11
# 2:  D 12
# 3:  F 13
# 4:  G 14
# 5:  I 15
# 6:  M 40
# 7:  P 20
# 8:  W 50
# 9:  X 32
#10:  Y 48
#11:  Z 40

huangapple
  • 本文由 发表于 2023年2月8日 12:59:15
  • 转载请务必保留本文链接:https://go.coder-hub.com/75381531.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定