Convert factor levels into columns and columns into factor levels
Question
I have a dataset from a question asking respondents to rank their reasons for doing a PhD (for example):
df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))
> df
id rank1 rank2 rank3 rank4
1 1 Salary Title Interest Career
2 2 Interest Career Title <NA>
3 3 Career Salary <NA> <NA>
4 4 Title <NA> <NA> <NA>
The person with id 1 rated "Salary" as the most important reason, then "Title", and so on.
However, I am trying to turn the values of these variables (the reasons) into columns, and the column names (rank1 to rank4) into the values, to get this:
id Salary Title Interest Career
1 1 rank1 rank2 rank3 rank4
2 2 <NA> rank3 rank1 rank2
3 3 rank2 <NA> <NA> rank1
4 4 <NA> rank1 <NA> <NA>
Is there a way to do this in R? I have tried spread() from tidyr, but it does not give me what I am aiming for.
Any help is appreciated!
Thank you!
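On the spread() attempt: spread() needs a single key column and a single value column, so it cannot be applied to the wide table directly; the reasons first have to be stacked into a long table. A minimal sketch with tidyr's older (superseded) gather()/spread() pair, assuming tidyr is installed; `long` and `wide` are illustrative names:

```r
library(tidyr)

df <- data.frame(
  id = 1:4,
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))

# Step 1: stack rank1..rank4 into (rank, reason) pairs; na.rm drops unused ranks
long <- gather(df, key = "rank", value = "reason", -id, na.rm = TRUE)

# Step 2: each reason becomes a column whose cells hold the rank label
wide <- spread(long, key = reason, value = rank)
wide
```

spread() sorts the new columns, so their order may differ from the table above; the cell contents are the same.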
Answer 1
Score: 0
I believe @Chamkrai has the answer you're looking for (currently deleted), but I was thinking about what to do with the NAs. In this example you can replace the NA for id 2 with "Salary", as that is the only reason missing for that id. You could also fill in the other NAs by sampling from the "missing" values; the code below fills them in a fixed order (by recycling) rather than at random. I haven't been able to work out a succinct approach, but there's a small chance that this will help with your actual use case:
library(tidyverse)
df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))
df
#> id rank1 rank2 rank3 rank4
#> 1 1 Salary Title Interest Career
#> 2 2 Interest Career Title <NA>
#> 3 3 Career Salary <NA> <NA>
#> 4 4 Title <NA> <NA> <NA>
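# Collect every distinct reason that appears in the rank columns
unique_values <- df %>%
  select(-id) %>%
  pivot_longer(everything()) %>%
  na.omit() %>%
  distinct(value) %>%
  pull(value)
unique_values
#> [1] "Salary"   "Title"    "Interest" "Career"
# Transpose so that each id becomes a column (row 1 holds the id itself),
# then fill that id's NAs with the reasons it did not rank. ifelse()
# recycles the missing reasons over the column, so the fill order is
# deterministic rather than random.
df %>%
  t %>%
  as.data.frame %>%
  mutate(across(everything(),
                ~ifelse(is.na(.x) & row_number() > 1,
                        unique_values[!(unique_values %in% .x)],
                        .x))) %>%
  # Transpose back, then swap reasons and rank labels
  t %>%
  as.data.frame %>%
  pivot_longer(-id) %>%
  pivot_wider(names_from = value,
              values_from = name)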
#> # A tibble: 4 × 5
#> id Salary Title Interest Career
#> <chr> <chr> <chr> <chr> <chr>
#> 1 1 rank1 rank2 rank3 rank4
#> 2 2 rank4 rank3 rank1 rank2
#> 3 3 rank2 rank4 rank3 rank1
#> 4 4 rank3 rank1 rank4 rank2
Created on 2023-06-15 with reprex v2.0.2
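The idea of filling each id's NAs with the reasons it did not rank can also be sketched in long format, which avoids the double transpose. This is a minimal sketch, not the answer's own method; it assumes the full set of reasons is known up front (the hypothetical vector `reasons`) and that no id lists the same reason twice, so the number of NAs matches the number of unused reasons:

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = 1:4,
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))

# Assumed: every possible reason, in the order used to fill gaps
reasons <- c("Salary", "Title", "Interest", "Career")

filled <- df %>%
  pivot_longer(-id, names_to = "rank", values_to = "reason") %>%
  group_by(id) %>%
  # Put this id's unused reasons into its empty ranks, in the order of `reasons`
  mutate(reason = replace(reason, is.na(reason), setdiff(reasons, reason))) %>%
  ungroup() %>%
  # Reasons become columns, rank labels become the cell values
  pivot_wider(names_from = reason, values_from = rank)
filled
```

Wrapping setdiff() in sample() would fill the gaps in random order instead, which is closer to the sampling idea.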
Answer 2
Score: 0
library(tidyverse)
(df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA)))
# Stack rank1..rank4 into name/value pairs and drop the empty ranks
(df_long <- pivot_longer(df,
                         cols = -id) |> na.omit())
# Widen again, this time with the reasons as column names
(df_rewide <- pivot_wider(data = df_long,
                          id_cols = "id",
                          names_from = "value",
                          values_from = "name"))
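For comparison, the same reshape can be sketched in base R without the tidyverse. This is an illustrative alternative, not part of the answer; the names `rank_cols`, `m`, `reasons`, and `out` are hypothetical. For each reason it looks up which rank column holds it in each row:

```r
df <- data.frame(
  id = 1:4,
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))

rank_cols <- c("rank1", "rank2", "rank3", "rank4")
m <- as.matrix(df[rank_cols])  # character matrix, one row per id

# All reasons, in order of first appearance when reading row by row
reasons <- unique(na.omit(as.vector(t(m))))

out <- data.frame(id = df$id)
for (r in reasons) {
  # Column index of reason r in each row (NA if that id never ranked it)
  pos <- apply(m == r, 1, function(x) match(TRUE, x))
  out[[r]] <- rank_cols[pos]
}
out
```

Like the pivot version, this leaves NA where an id did not rank a reason, matching the table in the question.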