2023年2月8日 12:59:15go评论85阅读模式

英文:

Extract first Non NA value over multiple columns

问题

我仍在学习R，想知道是否有一种优雅的方法来操作下面的数据框（df）以获得df2。

我不确定是否应该使用循环来实现这一目标，但基本上我想提取第一行中如果“X_No”值为NA，则提取第一个非NA的“X_No”值。最好通过从df到期望的df2的示例来描述这一点。

希望对此有一个优雅的解决方案，因为有超过1000列类似于提供的示例。我已经在网上搜索了类似的示例，但未找到能够生成期望结果的示例。

非常感谢您的帮助。感谢。

英文:

I'm still learning R and was wondering if I there was an elegant way of manipulating the below df to achieve df2.

I'm not sure if it's a loop that is supposed to be used for this, but basically I want to extract the first Non NA "X_No" Value if the "X_No" value is NA in the first row. This would perhaps be best described through an example from df to the desired df2.

A_ID &lt;- c(&#39;A&#39;,&#39;B&#39;,&#39;I&#39;,&#39;N&#39;)
A_No &lt;- c(11,NA,15,NA)
B_ID &lt;- c(&#39;B&#39;,&#39;C&#39;,&#39;D&#39;,&#39;J&#39;)
B_No &lt;- c(NA,NA,12,NA)
C_ID &lt;- c(&#39;E&#39;,&#39;F&#39;,&#39;G&#39;,&#39;P&#39;)
C_No &lt;- c(NA,13,14,20)
D_ID &lt;- c(&#39;J&#39;,&#39;K&#39;,&#39;L&#39;,&#39;M&#39;)
D_No &lt;- c(NA,NA,NA,40)
E_ID &lt;- c(&#39;W&#39;,&#39;X&#39;,&#39;Y&#39;,&#39;Z&#39;)
E_No &lt;- c(50,32,48,40)
df &lt;- data.frame(A_ID,A_No,B_ID,B_No,C_ID,C_No,D_ID,D_No,E_ID,E_No)
ID &lt;- c(&#39;A&#39;,&#39;D&#39;,&#39;F&#39;,&#39;M&#39;,&#39;W&#39;)
No &lt;- c(11,12,13,40,50)
df2 &lt;- data.frame(ID,No)

I'm hoping for an elegant solution to this as there are over a 1000 columns similar to the example provided.
I've looked all over the web for a similar example however to no avail that would reproduce the expected result.

Your help is very much appreciated.
Thankyou

答案1

得分: 3

我不确定是否应该称其为“优雅”，但这里是一个潜在的解决方案：

library(tidyverse)
A_ID <- c('A','B','I','N')
A_No <- c(11,NA,15,NA)
B_ID <- c('B','C','D','J')
B_No <- c(NA,NA,12,NA)
C_ID <- c('E','F','G','P')
C_No <- c(NA,13,14,20)
D_ID <- c('J','K','L','M')
D_No <- c(NA,NA,NA,40)
E_ID <- c('W','X','Y','Z')
E_No <- c(50,32,48,40)
df <- data.frame(A_ID,A_No,B_ID,B_No,C_ID,C_No,D_ID,D_No,E_ID,E_No)
ID <- c('A','D','F','M','W')
No <- c(11,12,13,40,50)
df2 <- data.frame(ID,No)
output <- df %>%
  pivot_longer(everything(),
               names_sep = "_",
               names_to = c("Col", ".value")) %>%
  drop_na() %>%
  group_by(Col) %>%
  slice_head(n = 1) %>%
  ungroup() %>%
  select(-Col)
df2
#>   ID No
#> 1  A 11
#> 2  D 12
#> 3  F 13
#> 4  M 40
#> 5  W 50
output
#> # A tibble: 5 × 2
#>   ID       No
#>   <chr> <dbl>
#> 1 A        11
#> 2 D        12
#> 3 F        13
#> 4 M        40
#> 5 W        50
all_equal(df2, output)
#> [1] TRUE

^{2023-02-08创建，使用reprex v2.0.2}

英文:

I don't know if I'd call it "elegant", but here is a potential solution:

library(tidyverse)
A_ID &lt;- c(&#39;A&#39;,&#39;B&#39;,&#39;I&#39;,&#39;N&#39;)
A_No &lt;- c(11,NA,15,NA)
B_ID &lt;- c(&#39;B&#39;,&#39;C&#39;,&#39;D&#39;,&#39;J&#39;)
B_No &lt;- c(NA,NA,12,NA)
C_ID &lt;- c(&#39;E&#39;,&#39;F&#39;,&#39;G&#39;,&#39;P&#39;)
C_No &lt;- c(NA,13,14,20)
D_ID &lt;- c(&#39;J&#39;,&#39;K&#39;,&#39;L&#39;,&#39;M&#39;)
D_No &lt;- c(NA,NA,NA,40)
E_ID &lt;- c(&#39;W&#39;,&#39;X&#39;,&#39;Y&#39;,&#39;Z&#39;)
E_No &lt;- c(50,32,48,40)
df &lt;- data.frame(A_ID,A_No,B_ID,B_No,C_ID,C_No,D_ID,D_No,E_ID,E_No)
ID &lt;- c(&#39;A&#39;,&#39;D&#39;,&#39;F&#39;,&#39;M&#39;,&#39;W&#39;)
No &lt;- c(11,12,13,40,50)
df2 &lt;- data.frame(ID,No)
output &lt;- df %&gt;%
  pivot_longer(everything(),
               names_sep = &quot;_&quot;,
               names_to = c(&quot;Col&quot;, &quot;.value&quot;)) %&gt;%
  drop_na() %&gt;%
  group_by(Col) %&gt;%
  slice_head(n = 1) %&gt;%
  ungroup() %&gt;%
  select(-Col)
df2
#&gt;   ID No
#&gt; 1  A 11
#&gt; 2  D 12
#&gt; 3  F 13
#&gt; 4  M 40
#&gt; 5  W 50
output
#&gt; # A tibble: 5 &#215; 2
#&gt;   ID       No
#&gt;   &lt;chr&gt; &lt;dbl&gt;
#&gt; 1 A        11
#&gt; 2 D        12
#&gt; 3 F        13
#&gt; 4 M        40
#&gt; 5 W        50
all_equal(df2, output)
#&gt; [1] TRUE

<sup>Created on 2023-02-08 with reprex v2.0.2</sup>

答案2

得分: 2

使用 `base R` 和 `max.col`（假设列是交替的，包括ID和No）
```R
ind <- max.col(!is.na(t(df[c(FALSE, TRUE)])), "first")
m1 <- cbind(seq_along(ind), ind)
data.frame(ID = t(df[c(TRUE, FALSE)])[m1], No = t(df[c(FALSE, TRUE)])[m1])
  ID No
1  A 11
2  D 12
3  F 13
4  M 40
5  W 50


<details>
<summary>英文:</summary>
Using `base R` with `max.col` (assuming the columns are alternating with ID, No)

ind <- max.col(!is.na(t(df[c(FALSE, TRUE)])), "first")
m1 <- cbind(seq_along(ind), ind)
data.frame(ID = t(df[c(TRUE, FALSE)])[m1], No = t(df[c(FALSE, TRUE)])[m1])
ID No
1 A 11
2 D 12
3 F 13
4 M 40
5 W 50

答案3

得分: 1

这是一个可扩展到（非常）大型数据集的data.table解决方案。

功能性

根据列名，将数据框拆分为列块的列表。因此，所有以A_开头的列都进入第一个元素，所有以B_开头的列都进入第二个元素。
然后，将这些列表元素叠加在一起，使用data.table::rbindlist。忽略列名（只有当A_的列数与B_的列数相同，且与n_相同才有效）。
现在获取第一列中每个值的第一个非NA值。

代码

library(data.table)
# 根据下划线后的内容拆分
L <- split.default(df, f = gsub("(.*)_.*", "\\1", names(df)))
# 再次合并
DT <- rbindlist(L, use.names = FALSE)
# 提取第一个非NA的值
DT[!is.na(A_No), .(No = A_No[1]), keyby = .(ID = A_ID)]
#    ID No
# 1:  A 11
# 2:  D 12
# 3:  F 13
# 4:  G 14
# 5:  I 15
# 6:  M 40
# 7:  P 20
# 8:  W 50
# 9:  X 32
# 10: Y 48
# 11: Z 40

希望这对你有所帮助。

英文:

Here is a data.table solution that should scale well to a (very) large dataset.

functionally

split the data.frame to a list of chunks of columns, based on their
names. So all columns startting with A_ go to
the first element, all colums startting with B_ to the second
Then, put these list elements on top of each other, using
data.table::rbindlist. Ignure the column-namaes (this only works if
A_ has the same number of columns as B_ has the same number of cols
as n_)
Now get the first non-NA value of each value in the first column

code

library(data.table)
# split based on what comes after the underscore
L &lt;- split.default(df, f = gsub(&quot;(.*)_.*&quot;, &quot;\\1&quot;, names(df)))
# bind together again
DT &lt;- rbindlist(L, use.names = FALSE)
# extract the first value of the non-NA
DT[!is.na(A_No), .(No = A_No[1]), keyby = .(ID = A_ID)]
#    ID No
# 1:  A 11
# 2:  D 12
# 3:  F 13
# 4:  G 14
# 5:  I 15
# 6:  M 40
# 7:  P 20
# 8:  W 50
# 9:  X 32
#10:  Y 48
#11:  Z 40

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

提取多列中的第一个非NA值。

问题

答案1

答案2

答案3

在R中的表格中超过阈值的颜色值

如何将两个ggplot2自定义函数合并，使图例分开。

反转图例和数据顺序以用于geom_bar

从R中的一个较大数据框中提取按国家名称筛选的数据框。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。