June 5, 2023 22:44:59

Consolidate table from vertical to horizontal efficiently

Question

I have a large table (table A) with unique characteristics that occur on multiple IDs.
Is there a clever way to consolidate the values horizontally, so that a second table B has the unique IDs in the rows and the characteristics that occur for each ID in the columns? The number of characteristics differs per ID, and the fields for missing characteristics in an ID row should be filled with NA. Since each ID has at most 22 unique characteristics, table B should have at most 23 columns (including the ID).

A loop works, but it takes forever.

I tried all the solutions from https://stackoverflow.com/q/5890584 without success.

For example, with reshape, cast, dcast, and other functions the vector is too large, giving:
Error: cannot allocate vector of size ...

Answer 1

Score: 1

If you first add a column to table A that numbers the features within each ID, you can use pivot_wider quite easily:

library(tidyverse)

# Toy version of table A: one row per (id, feature) pair
table_a <- tibble(
  id = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
  feature = c("df", "ftv", "ed", "wed", "rfc", "dtb", "bes", "xrd", "yws")
)

table_b <- table_a %>%
  group_by(id) %>%
  # Label each feature by its position within the id: feature1, feature2, ...
  mutate(feature_name = paste0("feature", row_number())) %>%
  # Spread the labels into columns; positions an id does not have become NA
  pivot_wider(names_from = feature_name, values_from = feature)

table_b
# A tibble: 3 × 5
# Groups:   id [3]
     id feature1 feature2 feature3 feature4
  <dbl> <chr>    <chr>    <chr>    <chr>   
1     1 df       ftv      NA       NA      
2     2 ed       wed      rfc      dtb     
3     3 bes      xrd      yws      NA

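
The same idea (number the features within each ID, then cast on that number) can also be sketched with data.table, which may be worth trying on a large table. This is a minimal sketch on the toy data, not something tested on the asker's table, so whether it avoids the "cannot allocate vector" error there is an assumption. The column name idx and the helper vector pos_cols are made up for illustration.

```r
library(data.table)

# Same toy data as table_a in the answer, stored as a data.table
table_a <- data.table(
  id = c(1, 1, 2, 2, 2, 2, 3, 3, 3),
  feature = c("df", "ftv", "ed", "wed", "rfc", "dtb", "bes", "xrd", "yws")
)

# Number the features within each id (1, 2, ...). "idx" is a hypothetical name.
# Keeping it an integer means the cast columns sort 1, 2, ..., 10, not 1, 10, 2.
table_a[, idx := rowid(id)]

# One row per id, one column per within-id position; missing cells become NA
table_b <- dcast(table_a, id ~ idx, value.var = "feature")

# Rename the position columns "1", "2", ... to feature1, feature2, ...
pos_cols <- setdiff(names(table_b), "id")
setnames(table_b, pos_cols, paste0("feature", pos_cols))

table_b
```

Because the cast is on the within-ID position rather than on the feature values themselves, the result has at most as many feature columns as the largest number of features for any single ID (22 in the question), plus the id column.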