2023-06-15 17:31:51

Convert factor levels into columns and columns into factor levels

Question

I have a dataset concerning a question for ranking the reasons for doing a PhD (for example).

df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))
> df
  id    rank1  rank2    rank3  rank4
1  1   Salary  Title Interest Career
2  2 Interest Career    Title   <NA>
3  3   Career Salary     <NA>   <NA>
4  4    Title   <NA>     <NA>   <NA>

The person with id 1 rated "Salary" as the most important reason, then "Title", and so on.

However, I want to turn the factor levels (the reasons) into columns and the current column names (the ranks) into the values, in order to get this:

  id Salary Title Interest Career
1  1  rank1 rank2    rank3  rank4
2  2   <NA> rank3    rank1  rank2
3  3  rank2  <NA>     <NA>  rank1
4  4   <NA> rank1     <NA>   <NA>

Is there a way to do this in R? I have tried spread() from tidyr, but it did not give the result I am aiming for.
Any help is appreciated!
Thank you!

Answer 1

Score: 0

I believe @Chamkrai has the answer you're looking for (currently deleted), but I was thinking about what to do with the NAs. In this example you can replace the NA for id 2 with "Salary", as it is the only value missing for that id. You could also fill in the other NAs by sampling from the 'missing' values. I haven't been able to work out a succinct approach, but there's a small chance that this will help with your actual use case:

library(tidyverse)
df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))
df
#>   id    rank1  rank2    rank3  rank4
#> 1  1   Salary  Title Interest Career
#> 2  2 Interest Career    Title   <NA>
#> 3  3   Career Salary     <NA>   <NA>
#> 4  4    Title   <NA>     <NA>   <NA>
# Collect every reason that appears in any rank column
unique_values <- df %>%
  select(-id) %>%
  pivot_longer(everything()) %>%
  na.omit() %>%
  distinct(value) %>%
  pull(value)
unique_values
#> [1] "Salary"   "Title"    "Interest" "Career"
# After t(), each column holds one respondent and row 1 is the id.
# ifelse() recycles the vector of unranked reasons over the NA slots,
# so the fill order is deterministic rather than a random sample.
df %>%
  t %>%
  as.data.frame %>%
  mutate(across(everything(), 
                ~ifelse(is.na(.x) & row_number() > 1,
                        unique_values[!(unique_values %in% .x)],
                        .x))) %>%
  t %>%
  as.data.frame %>%
  pivot_longer(-id) %>%
  pivot_wider(names_from = value,
              values_from = name)
#> # A tibble: 4 × 5
#>   id    Salary Title Interest Career
#>   <chr> <chr>  <chr> <chr>    <chr> 
#> 1 1     rank1  rank2 rank3    rank4 
#> 2 2     rank4  rank3 rank1    rank2 
#> 3 3     rank2  rank4 rank3    rank1 
#> 4 4     rank3  rank1 rank4    rank2

Created on 2023-06-15 with reprex v2.0.2
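
The answer mentions sampling from the 'missing' values, while the code above fills the NAs in a fixed order. Here is a minimal sketch of what a sampling variant might look like. It assumes that every reason is ranked by at least one respondent and that there are as many rank columns as distinct reasons, so each id has exactly as many NAs as unranked reasons. The helper name fill_unranked and the set.seed() call are illustrative and not part of the original answer.

```r
library(dplyr)
library(tidyr)

df <- data.frame(
  id = 1:4,
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))

# Every reason that appears anywhere in the rank columns
all_values <- unlist(df[-1], use.names = FALSE)
reasons <- unique(all_values[!is.na(all_values)])

# Hypothetical helper: put the reasons one respondent did not rank
# into that respondent's NA slots, in random order.
fill_unranked <- function(reason, all_reasons) {
  unused <- setdiff(all_reasons, reason)
  # Shuffle by index so a single unused reason is handled correctly
  reason[is.na(reason)] <- unused[sample.int(length(unused))]
  reason
}

set.seed(1)  # only to make the random fill reproducible

df_filled <- df %>%
  pivot_longer(-id, names_to = "rank", values_to = "reason") %>%
  group_by(id) %>%
  mutate(reason = fill_unranked(reason, reasons)) %>%
  ungroup() %>%
  pivot_wider(names_from = reason, values_from = rank)

df_filled
```

The ranks a respondent actually gave are left untouched; only the NA slots are assigned at random, so id 2 always ends up with Salary at rank4, while ids 3 and 4 can differ between seeds.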

Answer 2

Score: 0

library(tidyverse)
(df <- data.frame(
  id = c(1:4),
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA)))
(df_long <- pivot_longer(df,
                        cols=-id) |> na.omit())
(df_rewide <- pivot_wider(data = df_long,
                          id_cols = "id",
                          names_from = "value",
                          values_from = "name"))
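
The same round trip can probably be written as a single pipeline. The sketch below assumes a tidyr version that provides pivot_longer() with the values_drop_na argument, which stands in for the na.omit() step, and R 4.1 or later for the native pipe. The column names rank and reason are chosen here for readability and are not from the original answer.

```r
library(tidyr)

df <- data.frame(
  id = 1:4,
  rank1 = c("Salary", "Interest", "Career", "Title"),
  rank2 = c("Title", "Career", "Salary", NA),
  rank3 = c("Interest", "Title", NA, NA),
  rank4 = c("Career", NA, NA, NA))

df_rewide <- df |>
  # Long format: one row per (id, rank column, reason); unranked slots are dropped
  pivot_longer(cols = -id, names_to = "rank", values_to = "reason",
               values_drop_na = TRUE) |>
  # Wide again, now with one column per reason holding the rank label
  pivot_wider(id_cols = id, names_from = reason, values_from = rank)

df_rewide
```

Reasons a respondent did not rank come back as NA, which matches the table requested in the question.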
