2023-03-10 01:30:27

pivot_wider causing "! Can't subset columns that don't exist." Error in R

Question

I have data that I'm trying to pivot in R, but when I run my code I get a "Can't subset columns that don't exist" error. My data currently looks like this, although the real data has hundreds of different Titles, not just the three in the simplified sample below.

ID    Title         Training_Time    Percent_Complete
1     New             1                    100
1     Hazmat          5                    100
1     Management      12                   100

I would like it to look like this. If possible, I would also like to include columns for Percent Complete, though I can make do with just the number of days it took to complete each training, as shown below.

ID    Training_New     Training_Hazmat          Training_Management
1         1                   5                        12

I've tried several different versions of the code below, based on Stack Overflow answers to similar questions.

LMS_df <- LMS_df %>%
  tidyr::pivot_wider(-ID,
    names_from = LMS_df$Title,
    values_from = LMS_df$Training_Time
  )

Any advice on getting rid of this error? Do I need to create new columns for every training title that I have in my dataset?

Answer 1

Score: 2

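Here is a small reproducible example built from the data in the question. The data frame name in dollar syntax (i.e. LMS_df$) is not needed: names_from and values_from use tidy selection, so LMS_df$Title is evaluated to the Title values ("New", "Hazmat", ...), which pivot_wider then looks up as column names, hence the "Can't subset columns that don't exist" error. With bare column names the call works. A name prefix can be added with names_prefix = "Training_":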

library(dplyr)
library(tidyr)
df <- tribble(
  ~ID,    ~Title,       ~Training_Time,  ~Percent_Complete,
  1,     "New",         1,                100,
  1,     "Hazmat",      5,                100,
  1,     "Management",  12,               100
)
df %>%
  pivot_wider(-ID,
    names_from = Title,
    values_from = Training_Time,
    names_prefix = "Training_"
  )
#> # A tibble: 1 × 4
#>   Percent_Complete Training_New Training_Hazmat Training_Management
#>              <dbl>        <dbl>           <dbl>               <dbl>
#> 1              100            1               5                  12

<sup>Created on 2023-03-09 with reprex v2.0.2</sup>
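To see where the error comes from, here is a minimal sketch (not from the original answer) that reproduces the failing call on the sample data and then runs a corrected one. Keeping ID through id_cols = ID is an assumption about the desired output rather than what the answer above does; there, the unnamed -ID is taken as id_cols, so ID is dropped from the result.

```r
library(tidyr)

# Sample data from the question
LMS_df <- tibble::tribble(
  ~ID, ~Title,       ~Training_Time, ~Percent_Complete,
  1,   "New",        1,              100,
  1,   "Hazmat",     5,              100,
  1,   "Management", 12,             100
)

# Failing call: LMS_df$Title evaluates to c("New", "Hazmat", "Management"),
# and tidy selection then looks for columns with those names.
err <- tryCatch(
  pivot_wider(LMS_df, names_from = LMS_df$Title, values_from = Training_Time),
  error = function(e) e
)
print(conditionMessage(err))

# Corrected call: bare column names, and ID named explicitly as the id column
# (assumption: one output row per ID, Percent_Complete left out).
fixed <- pivot_wider(
  LMS_df,
  id_cols = ID,
  names_from = Title,
  values_from = Training_Time,
  names_prefix = "Training_"
)
print(fixed)
```

Because the new column names are generated from whatever values appear in Title, there is no need to create a column by hand for each of the hundreds of training titles.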

A final note: this is a special case, because Percent_Complete is 100 for every Title. Otherwise you would get a separate row for each distinct percentage, with NA in every Training_ column that has no value for that row.
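If Percent_Complete does vary between titles, one way to keep a single row per ID is to spread both value columns at once. The following is a minimal sketch, not part of the original answer; the differing Percent_Complete values are made up for illustration, and ID is named explicitly as id_cols.

```r
library(tidyr)

# Hypothetical data: Percent_Complete differs by Title
df2 <- tibble::tribble(
  ~ID, ~Title,       ~Training_Time, ~Percent_Complete,
  1,   "New",        1,              100,
  1,   "Hazmat",     5,              80,
  1,   "Management", 12,             100
)

# With several values_from columns, each output name combines the
# value column name and the Title, joined by names_sep (default "_").
wide <- pivot_wider(
  df2,
  id_cols = ID,
  names_from = Title,
  values_from = c(Training_Time, Percent_Complete)
)
print(names(wide))
```

The result has one Training_Time_* and one Percent_Complete_* column per title, so the percentages no longer split the data into extra rows.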

Answer 2

Score: 1

One possible way is:

library(dplyr)
library(tidyr)
df %>% 
  pivot_wider(names_from = Title,
              values_from = Training_Time, 
              names_glue = "Training_{Title}") %>% 
  select(-ID)

  Percent_Complete Training_New Training_Hazmat Training_Management
             <int>        <int>           <int>               <int>
1              100            1               5                  12

Answer 3

Score: 1

An option with data.table:

library(data.table)
dcast(setDT(df), Percent_Complete ~ paste0("Training_", Title), value.var = "Training_Time")

-output

   Percent_Complete Training_Hazmat Training_Management Training_New
1:              100               5                  12            1
