Convert factor levels into columns and columns into factor levels

Question

I have a dataset from a survey question in which respondents ranked their reasons for doing a PhD (as an example).

df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))

> df
  id    rank1  rank2    rank3  rank4
1  1   Salary  Title Interest Career
2  2 Interest Career    Title   <NA>
3  3   Career Salary     <NA>   <NA>
4  4    Title   <NA>     <NA>   <NA>

The person with id 1 rated "Salary" as the most important reason, then "Title", and so on.

I am trying to turn the values of the rank variables (the reasons) into columns, and the rank column names (rank1 to rank4) into the values, so that I get this:

  id Salary Title Interest Career
1  1  rank1 rank2    rank3  rank4
2  2   <NA> rank3    rank1  rank2
3  3  rank2  <NA>     <NA>  rank1
4  4   <NA> rank1     <NA>   <NA>

Is there a way to do this in R? I have tried spread() from tidyr, but this is not what I am aiming for.
Any help is appreciated!
Thank you!

Answer 1

Score: 0

I believe @Chamkrai has the answer you're looking for (currently deleted), but I was thinking about what to do with the NAs. In this example, you can replace the NA for id 2 with "Salary", because that is the only reason missing for that id. You could also fill in the other NAs by sampling from each id's unused ("missing") reasons. I haven't been able to work out a succinct approach, but there's a small chance that this will help with your actual use case:

library(tidyverse)

df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))
df
#>   id    rank1  rank2    rank3  rank4
#> 1  1   Salary  Title Interest Career
#> 2  2 Interest Career    Title   <NA>
#> 3  3   Career Salary     <NA>   <NA>
#> 4  4    Title   <NA>     <NA>   <NA>

# All distinct reasons that appear anywhere in the rank columns
unique_values <- df %>%
  select(-id) %>%
  pivot_longer(everything()) %>%
  na.omit() %>%
  distinct(value) %>%
  pull(value)
unique_values
#> [1] "Salary"   "Title"    "Interest" "Career"

# Transpose so that each id becomes a column, then fill that column's NAs
# (skipping row 1, which holds the id) with the reasons it has not used yet.
# ifelse() recycles the vector of unused reasons, so the fill order is fixed, not random.
df %>%
  t %>%
  as.data.frame %>%
  mutate(across(everything(), 
                ~ifelse(is.na(.x) & row_number() > 1,
                        unique_values[!(unique_values %in% .x)],
                        .x))) %>%
  # Transpose back, then make reasons the columns and rank names the values
  t %>%
  as.data.frame %>%
  pivot_longer(-id) %>%
  pivot_wider(names_from = value,
              values_from = name)
#> # A tibble: 4 × 5
#>   id    Salary Title Interest Career
#>   <chr> <chr>  <chr> <chr>    <chr> 
#> 1 1     rank1  rank2 rank3    rank4 
#> 2 2     rank4  rank3 rank1    rank2 
#> 3 3     rank2  rank4 rank3    rank1 
#> 4 4     rank3  rank1 rank4    rank2

Created on 2023-06-15 with reprex v2.0.2
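Note that the ifelse() call above does not actually sample. It recycles each id's vector of unused reasons, so the fill order is fixed. If a random fill is what you want, here is a minimal base R sketch. It is not part of the original answer. fill_missing_ranks() is a hypothetical helper, and the sketch assumes that each reason appears at most once per row and that the number of empty slots in a row equals the number of reasons that row has not used.

```r
# Hypothetical helper (not from the original answer): fill each respondent's
# empty rank slots with a random permutation of the reasons they did not rank.
# Assumes each reason appears at most once per row.
fill_missing_ranks <- function(df, reasons, id_col = "id") {
  rank_cols <- setdiff(names(df), id_col)
  m <- as.matrix(df[rank_cols])  # character matrix; NAs are kept
  for (i in seq_len(nrow(m))) {
    empty <- is.na(m[i, ])
    unused <- setdiff(reasons, m[i, !empty])
    # Every empty slot must receive exactly one unused reason
    stopifnot(length(unused) == sum(empty))
    # sample.int() gives a random order; a single unused reason is placed as-is
    m[i, empty] <- unused[sample.int(length(unused))]
  }
  df[rank_cols] <- as.data.frame(m, stringsAsFactors = FALSE)
  df
}

df <- data.frame(
  id = 1:4,
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))
reasons <- c("Salary", "Title", "Interest", "Career")

set.seed(1)  # only to make the random fill reproducible
filled <- fill_missing_ranks(df, reasons)
filled
```

Row 2 always gets "Salary" in rank4, because that is its only unused reason. The fills for rows 3 and 4 depend on the seed. The filled table can then go through the same pivot_longer()/pivot_wider() steps as above.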

Answer 2

Score: 0

library(tidyverse)
(df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA)))

# Long format: one row per (id, rank column, reason); drop the unranked (NA) slots
(df_long <- pivot_longer(df,
                        cols=-id) |> na.omit())

# Back to wide: one column per reason, holding the name of the rank column it came from
(df_rewide <- pivot_wider(data = df_long,
                          id_cols = "id",
                          names_from = "value",
                          values_from = "name"))
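For comparison, the same reshape (go long, drop the NAs, go wide again) can be done in base R without tidyverse. This sketch is not part of the original answer, and the names long, wide, and reasons are illustrative. Reason columns appear in the order each reason first shows up when scanning rank1, then rank2, and so on. That gives Salary, Interest, Career, Title rather than the order in the question.

```r
df <- data.frame(
  id = 1:4,
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))

rank_cols <- setdiff(names(df), "id")

# Long format: one row per (id, rank, reason). unlist() stacks the rank
# columns one after another, so ids repeat and rank names repeat per column.
# Assumes character (not factor) columns, the data.frame() default since R 4.0.
long <- data.frame(
  id     = rep(df$id, times = length(rank_cols)),
  rank   = rep(rank_cols, each = nrow(df)),
  reason = unlist(df[rank_cols], use.names = FALSE),
  stringsAsFactors = FALSE)
long <- long[!is.na(long$reason), ]  # drop unranked slots

# Wide format: rows = ids, columns = reasons, cells = rank column names
reasons <- unique(long$reason)
wide <- matrix(NA_character_, nrow = nrow(df), ncol = length(reasons),
               dimnames = list(NULL, reasons))
# Matrix indexing with a two-column (row, col) matrix fills one cell per long row
wide[cbind(match(long$id, df$id), match(long$reason, reasons))] <- long$rank

result <- data.frame(id = df$id, wide, stringsAsFactors = FALSE)
result
```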

huangapple
  • Published on 2023-06-15 17:31:51
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76481099.html