将表格从纵向整合到横向,高效地完成。

huangapple go评论125阅读模式
英文:

Consolidate table from vertical to horizontal efficiently

问题

在多个ID上具有唯一特征的大表格(表A)。是否有巧妙的方法可以横向合并这些值,以便在第二个表B中,行中有唯一的ID,并且列中包含出现的特征(每个ID中也可能以不同数量出现)?我希望在ID行中缺少特征的字段填充为NA。由于每个ID最多具有22个唯一特征,所以最大的列数应该是23(包括ID)。

使用循环可以实现,但需要很长时间。

我尝试了https://stackoverflow.com/q/5890584 中的所有解决方案都没有成功。

例如,对于reshapecastdcast和其他函数,向量太大,导致以下错误:
Error: cannot allocate vector of size ...

将表格从纵向整合到横向,高效地完成。
将表格从纵向整合到横向,高效地完成。

英文:

I have a large table with unique characteristics that occur on multiple IDs (table A).
Is there a clever workaround in which I could horizontally consolidate the values so that in the second table B I have unique IDs in the rows and in the columns occurring characteristics (which also occur in different numbers per ID)? The fields for missing features in an ID row I want to fill with NA. Since I have a maximum of 22 unique characteristics per ID, the maximum number of columns should be 23 (with ID).

With the loop it works, but it takes forever.

I tried all solutions from https://stackoverflow.com/q/5890584 without success.

E.g., for reshape, cast, dcast, and other functions the vector
is too large giving:
Error: cannot allocate vector of size ...

将表格从纵向整合到横向,高效地完成。
将表格从纵向整合到横向,高效地完成。

答案1

得分: 1

如果您在表A中创建新列,那么您可以很容易地使用 pivot_wider

  1. library(tidyverse)
  2. table_a <- tibble(
  3. id = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
  4. feature = c("df", "ftv", "ed", "wed", "rfc", "dtb", "bes", "xrd", "yws")
  5. )
  6. table_b <- table_a %>%
  7. group_by(id) %>%
  8. mutate(feature_name = paste0("feature", row_number())) %>%
  9. pivot_wider(names_from = feature_name, values_from = feature)
  10. table_b
  11. # A tibble: 3 x 5
  12. # Groups: id [3]
  13. id feature1 feature2 feature3 feature4
  14. <dbl> <chr> <chr> <chr> <chr>
  15. 1 1 df ftv NA NA
  16. 2 2 ed wed rfc dtb
  17. 3 3 bes xrd yws NA
英文:

If you create a new column in Table A then you can use pivot_wider quite easily:

  1. library(tidyverse)
  2. table_a &lt;- tibble(
  3. id = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
  4. feature = c(&quot;df&quot;, &quot;ftv&quot;, &quot;ed&quot;, &quot;wed&quot;, &quot;rfc&quot;, &quot;dtb&quot;, &quot;bes&quot;, &quot;xrd&quot;, &quot;yws&quot;)
  5. )
  6. table_b &lt;- table_a %&gt;%
  7. group_by(id) %&gt;%
  8. mutate(feature_name = paste0(&quot;feature&quot;, row_number())) %&gt;%
  9. pivot_wider(names_from = feature_name, values_from = feature)
  10. table_b
  11. # A tibble: 3 &#215; 5
  12. # Groups: id [3]
  13. id feature1 feature2 feature3 feature4
  14. &lt;dbl&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
  15. 1 1 df ftv NA NA
  16. 2 2 ed wed rfc dtb
  17. 3 3 bes xrd yws NA

huangapple
  • 本文由 发表于 2023年6月5日 22:44:59
  • 转载请务必保留本文链接:https://go.coder-hub.com/76407610.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定