英文:
pivot_wider causing "! Can't subset columns that don't exist." Error in R
问题
我试图在R中进行数据透视,但当我运行我的代码时,出现"无法选择不存在的列"错误。我的数据目前看起来像这样,尽管实际上有数百个不同的"Title",而不只是下面简化的数据中的三个。
ID Title Training_Time Percent_Complete
1 New 1 100
1 Hazmat 5 100
1 Management 12 100
我希望它看起来像这样,如果可能的话,还包括"Percent Complete"列,尽管只包含完成培训所需的天数也可以,就像下面的示例一样。
ID Training_New Training_Hazmat Training_Management
1 1 5 12
我尝试过不同版本的以下代码,基于类似问题的stackoverflow回答。
LMS_df <- LMS_df %>%
tidyr::pivot_wider(-ID,
names_from = LMS_df$Title,
values_from = LMS_df$Training_Time
)
有关消除此错误的建议吗?我是否需要为数据集中的每个培训标题创建新列?
英文:
I have data that I'm trying to pivot in R. But when I run my code, I get a Can't subset columns that don't exist error. My data currently looks like this, however, there are hundreds of different Titles, not just three as my simplified data below looks like.
ID Title Training_Time Percent_Complete
1 New 1 100
1 Hazmat 5 100
1 Management 12 100
And I would like for it to look like this, and if possible also include Columns for Percent Complete, though I can make do with just having the days it took to complete training, like I have below.
ID Training_New Training_Hazmat Training_Managerment
1 1 5 12
I've tried several different versions of the code below, based on stackoverflow responses to similar questions.
LMS_df <- LMS_df %>%
tidyr::pivot_wider(-ID,
names_from = LMS_df$Title,
values_from = LMS_df$Training_Time
)
Any advice on getting rid of this error? Do I need to create new columns for every training title that I have in my dataset?
答案1
得分: 2
以下是您要求的代码部分的中文翻译:
library(dplyr)
library(tidyr)
df <- tribble(
~ID, ~Title, ~Training_Time, ~Percent_Complete,
1, "New", 1, 100,
1, "Hazmat", 5, 100,
1, "Management", 12, 100
)
df %>%
pivot_wider(-ID,
names_from = Title,
values_from = Training_Time,
names_prefix = "Training_"
)
#> # A tibble: 1 × 4
#> Percent_Complete Training_New Training_Hazmat Training_Management
#> <dbl> <dbl> <dbl> <dbl>
#> 1 100 1 5 12
创建于2023年3月9日,使用 reprex v2.0.2
最后的注意事项:这是一个特殊情况,因为所有的"Title"都具有相同的"Percent_Complete"值,否则,您将为每个百分比获得一个单独的行,对于所有不匹配的情况,将获得"NA"。
英文:
Here a small reproducible example from the data provided by the original poster. The name of the data frame in dollar syntax (i.e. LMS_df$
) is not needed. Without, it works so far. A name prefix can be added with names_prefix = "Training_"
:
library(dplyr)
library(tidyr)
df <- tribble(
~ID, ~Title, ~Training_Time, ~Percent_Complete,
1, "New", 1, 100,
1, "Hazmat", 5, 100,
1, "Management", 12, 100
)
df %>%
pivot_wider(-ID,
names_from = Title,
values_from = Training_Time,
names_prefix = "Training_"
)
#> # A tibble: 1 × 4
#> Percent_Complete Training_New Training_Hazmat Training_Management
#> <dbl> <dbl> <dbl> <dbl>
#> 1 100 1 5 12
<sup>Created on 2023-03-09 with reprex v2.0.2</sup>
A final note: This is a special case as Percent_Complete
is equally 100 for all Title
s. Otherwise, you would get an individual row for each percentage and an NA
for all non-matching cases.
答案2
得分: 1
一种可能的方法是:
library(dplyr)
library(tidyr)
df %>%
pivot_wider(names_from = Title,
values_from = Training_Time,
names_glue = "Training_{Title}") %>%
select(-ID)
Percent_Complete Training_New Training_Hazmat Training_Management
<int> <int> <int> <int>
1 100 1 5 12
英文:
One possible way is:
library(dplyr)
library(tidyr)
df %>%
pivot_wider(names_from = Title,
values_from = Training_Time,
names_glue = "Training_{Title}") %>%
select(-ID)
Percent_Complete Training_New Training_Hazmat Training_Management
<int> <int> <int> <int>
1 100 1 5 12
答案3
得分: 1
An option with data.table
library(data.table)
dcast(setDT(df), Percent_Complete ~ paste0("Training_", Title), value.var = "Training_Time")
-output
Percent_Complete Training_Hazmat Training_Management Training_New
1: 100 5 12 1
英文:
An option with data.table
library(data.table)
dcast(setDT(df), Percent_Complete ~ paste0("Training_", Title), value.var = "Training_Time")
-output
Percent_Complete Training_Hazmat Training_Management Training_New
1: 100 5 12 1
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论