`pivot_wider` 在 R 中引发 “! 无法子集化不存在的列。” 错误。

huangapple go评论83阅读模式
英文:

pivot_wider causing "! Can't subset columns that don't exist." Error in R

问题

我试图在R中进行数据透视,但当我运行我的代码时,出现"无法选择不存在的列"错误。我的数据目前看起来像这样,尽管实际上有数百个不同的"Title",而不只是下面简化的数据中的三个。

ID    Title         Training_Time    Percent_Complete

1     New             1                    100
1     Hazmat          5                    100
1     Management      12                   100

我希望它看起来像这样,如果可能的话,还包括"Percent Complete"列,尽管只包含完成培训所需的天数也可以,就像下面的示例一样。

ID    Training_New     Training_Hazmat          Training_Management

1         1                   5                        12                                                      

我尝试过不同版本的以下代码,基于类似问题的stackoverflow回答。

LMS_df <- LMS_df %>%
  tidyr::pivot_wider(-ID,
    names_from = LMS_df$Title,
    values_from = LMS_df$Training_Time
  )

有关消除此错误的建议吗?我是否需要为数据集中的每个培训标题创建新列?

英文:

I have data that I'm trying to pivot in R. But when I run my code, I get a Can't subset columns that don't exist error. My data currently looks like this, however, there are hundreds of different Titles, not just three as my simplified data below looks like.

ID    Title         Training_Time    Percent_Complete

1     New             1                    100
1     Hazmat          5                    100
1     Management      12                   100

And I would like for it to look like this, and if possible also include Columns for Percent Complete, though I can make do with just having the days it took to complete training, like I have below.

ID    Training_New     Training_Hazmat          Training_Managerment

1         1                   5                        12                                                      

I've tried several different versions of the code below, based on stackoverflow responses to similar questions.

LMS_df &lt;- LMS_df %&gt;%
tidyr::pivot_wider(-ID,
  names_from = LMS_df$Title,
  values_from = LMS_df$Training_Time
)

Any advice on getting rid of this error? Do I need to create new columns for every training title that I have in my dataset?

答案1

得分: 2

以下是您要求的代码部分的中文翻译:

library(dplyr)
library(tidyr)

df <- tribble(
  ~ID,    ~Title,       ~Training_Time,  ~Percent_Complete,
  1,     "New",         1,                100,
  1,     "Hazmat",      5,                100,
  1,     "Management",  12,               100
)

df %>%
  pivot_wider(-ID,
    names_from = Title,
    values_from = Training_Time,
    names_prefix = "Training_"
  )
#> # A tibble: 1 × 4
#>   Percent_Complete Training_New Training_Hazmat Training_Management
#>              <dbl>        <dbl>           <dbl>               <dbl>
#> 1              100            1               5                  12

创建于2023年3月9日,使用 reprex v2.0.2

最后的注意事项:这是一个特殊情况,因为所有的"Title"都具有相同的"Percent_Complete"值,否则,您将为每个百分比获得一个单独的行,对于所有不匹配的情况,将获得"NA"。

英文:

Here a small reproducible example from the data provided by the original poster. The name of the data frame in dollar syntax (i.e. LMS_df$) is not needed. Without, it works so far. A name prefix can be added with names_prefix = &quot;Training_&quot;:

library(dplyr)
library(tidyr)

df &lt;- tribble(
~ID,    ~Title,       ~Training_Time,  ~Percent_Complete,
1,     &quot;New&quot;,         1,                100,
1,     &quot;Hazmat&quot;,      5,                100,
1,     &quot;Management&quot;,  12,               100
)

df %&gt;%
  pivot_wider(-ID,
    names_from = Title,
    values_from = Training_Time,
    names_prefix = &quot;Training_&quot;
  )
#&gt; # A tibble: 1 &#215; 4
#&gt;   Percent_Complete Training_New Training_Hazmat Training_Management
#&gt;              &lt;dbl&gt;        &lt;dbl&gt;           &lt;dbl&gt;               &lt;dbl&gt;
#&gt; 1              100            1               5                  12

<sup>Created on 2023-03-09 with reprex v2.0.2</sup>

A final note: This is a special case as Percent_Complete is equally 100 for all Titles. Otherwise, you would get an individual row for each percentage and an NA for all non-matching cases.

答案2

得分: 1

一种可能的方法是:
library(dplyr)
library(tidyr)
df %>% 
  pivot_wider(names_from = Title,
              values_from = Training_Time, 
              names_glue = "Training_{Title}") %>% 
  select(-ID)
  Percent_Complete Training_New Training_Hazmat Training_Management
             <int>        <int>           <int>               <int>
1              100            1               5                  12
英文:

One possible way is:

library(dplyr)
library(tidyr)
df %&gt;% 
  pivot_wider(names_from = Title,
              values_from = Training_Time, 
              names_glue = &quot;Training_{Title}&quot;) %&gt;% 
  select(-ID)
 Percent_Complete Training_New Training_Hazmat Training_Management
             &lt;int&gt;        &lt;int&gt;           &lt;int&gt;               &lt;int&gt;
1              100            1               5                  12

答案3

得分: 1

An option with data.table

library(data.table)
dcast(setDT(df), Percent_Complete ~ paste0("Training_", Title), value.var = "Training_Time")

-output

   Percent_Complete Training_Hazmat Training_Management Training_New
1:              100               5                  12            1
英文:

An option with data.table

library(data.table)
 dcast(setDT(df), Percent_Complete ~ paste0(&quot;Training_&quot;, Title), value.var = &quot;Training_Time&quot;)

-output

   Percent_Complete Training_Hazmat Training_Management Training_New
1:              100               5                  12            1

huangapple
  • 本文由 发表于 2023年3月10日 01:30:27
  • 转载请务必保留本文链接:https://go.coder-hub.com/75688107.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定