Consolidate table from vertical to horizontal efficiently
Question
I have a large table (table A) of characteristics, each of which can occur on multiple IDs.
Is there a clever way to consolidate the values horizontally, so that a second table B has one row per unique ID and, in the columns, the characteristics that occur for that ID (the number of characteristics differs per ID)? Fields with no feature in an ID row should be filled with NA. Since each ID has at most 22 unique characteristics, table B should have at most 23 columns (including the ID).
A loop works, but it takes forever.
I tried all solutions from https://stackoverflow.com/q/5890584 without success.
For example, with reshape, cast, dcast, and other functions the vector is too large, giving:
Error: cannot allocate vector of size ...
Answer 1
Score: 1
If you add a new column to table A that numbers the features within each ID, you can use pivot_wider quite easily:
library(tidyverse)

# Small example standing in for table A
table_a <- tibble(
  id = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
  feature = c("df", "ftv", "ed", "wed", "rfc", "dtb", "bes", "xrd", "yws")
)

table_b <- table_a %>%
  group_by(id) %>%
  # Number the features within each id: feature1, feature2, ...
  mutate(feature_name = paste0("feature", row_number())) %>%
  # One column per feature number; missing cells become NA
  pivot_wider(names_from = feature_name, values_from = feature)

table_b
# A tibble: 3 × 5
# Groups:   id [3]
     id feature1 feature2 feature3 feature4
  <dbl> <chr>    <chr>    <chr>    <chr>
1     1 df       ftv      NA       NA
2     2 ed       wed      rfc      dtb
3     3 bes      xrd      yws      NA
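
The same idea (number the features within each ID, then spread them by that number) can also be sketched in base R without any packages. This is only an illustration on the toy data from the answer above, not the answerer's method: the names pos, ids and wide are made up here, and whether it avoids the "cannot allocate vector" error on the real table has not been tested. It builds an empty character matrix with one row per ID and fills it in a single vectorised assignment.

```r
# Toy data copied from the answer above.
table_a <- data.frame(
  id = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
  feature = c("df", "ftv", "ed", "wed", "rfc", "dtb", "bes", "xrd", "yws"),
  stringsAsFactors = FALSE
)

# Position of each row within its id (1, 2, ... per id), like row_number().
pos <- ave(seq_along(table_a$id), table_a$id, FUN = seq_along)

# One row per unique id, one column per position; unused cells stay NA.
ids <- unique(table_a$id)
wide <- matrix(
  NA_character_,
  nrow = length(ids),
  ncol = max(pos),
  dimnames = list(NULL, paste0("feature", seq_len(max(pos))))
)

# Fill every cell at once using (row, column) index pairs.
wide[cbind(match(table_a$id, ids), pos)] <- table_a$feature

table_b <- data.frame(id = ids, wide, stringsAsFactors = FALSE)
print(table_b)
```

Because the number of columns is set by the largest feature count per ID (at most 22 in the question), the matrix stays narrow however many distinct feature values exist.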