Specify levels when `mutate`-ing dataframe variables to factors

Question

Let's say I have the following tibble called `data`:

library(tibble)

data <- tribble(
    ~"ID", ~"some factor", ~"some other factor",
    1L, "low", "high",
    2L, "very high", "low",
    3L, "very low", "low",
    4L, "high", "very high",
    5L, "very low", "very low"
)

I use the `fct()` function from forcats to convert my two factor variables:

library(dplyr)
library(forcats)

data <- data %>%
  mutate(across(starts_with("some"), fct))

Which gives me:

# A tibble: 5 × 3
     ID `some factor` `some other factor`
  <int> <fct>         <fct>
1     1 low           high
2     2 very high     low
3     3 very low      low
4     4 high          very high
5     5 very low      very low

However, when I call `fct` this way, it is unclear to me how to specify the levels of these ordinal variables. The order I would like is:

order <- c("very low", "low", "high", "very high")

How should I do this with dplyr's set of functions? The goal is to have ggplot2 visualizations that respect this ordering.

Answer 1

Score: 3

order <- c("very low", "low", "high", "very high")

data <- data %>%
  mutate(across(starts_with("some"), fct, order))

This should do the trick. Here `order` is passed positionally through `across()` and is matched to the `levels` argument of `fct()`.
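To see what the extra argument changes, the sketch below compares the levels `fct()` picks by default with the levels it gets when the vector is supplied as the second positional argument. It uses the values of the `some factor` column from the question. The names `x`, `level_order`, `default_levels`, and `explicit_levels` are illustrative and not part of the original post; `level_order` is used instead of `order` only to avoid shadowing base R's `order()` function.

```r
library(forcats)

# Values of the `some factor` column from the question.
x <- c("low", "very high", "very low", "high", "very low")

# Illustrative name; the original post calls this vector `order`.
level_order <- c("very low", "low", "high", "very high")

# Without `levels`, fct() uses the order in which values first appear.
default_levels <- levels(fct(x))

# The second positional argument of fct() is `levels`.
explicit_levels <- levels(fct(x, level_order))

print(default_levels)
print(explicit_levels)
```

Note that `fct()` is stricter than base `factor()`: if `x` contains a value that is not listed in `levels`, it raises an error rather than silently producing `NA`.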

Answer 2

Score: 3

When you use `across()`, you can pass extra arguments along to the called function through `across()`'s `...` argument.

data <- data %>%
  mutate(across(starts_with("some"), fct, levels = order))

This is equivalent to

data <- data %>%
  mutate(across(starts_with("some"), function(x) fct(x, levels = order)))

(This is a common paradigm in R: many functions that apply another function have a `...` argument for arguments that will be passed along to the applied function. See also `lapply`, `sapply`, `purrr::map`, etc.)
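As of dplyr 1.1.0, passing extra arguments through `across()`'s `...` is deprecated in favor of the anonymous-function form, so the second variant above is the one that avoids the deprecation warning on current versions. A minimal, self-contained sketch of that variant follows, with a check that both columns end up with the same levels. The names `level_order` and `converted` are illustrative and not from the original post.

```r
library(tibble)
library(dplyr)
library(forcats)

# Same data as in the question.
data <- tribble(
  ~"ID", ~"some factor", ~"some other factor",
  1L, "low", "high",
  2L, "very high", "low",
  3L, "very low", "low",
  4L, "high", "very high",
  5L, "very low", "very low"
)

# Illustrative name; avoids shadowing base::order().
level_order <- c("very low", "low", "high", "very high")

# Anonymous-function form: no arguments travel through across()'s `...`.
converted <- data %>%
  mutate(across(starts_with("some"), function(x) fct(x, levels = level_order)))

# Both converted columns now share the requested level order.
column_levels <- lapply(select(converted, starts_with("some")), levels)
print(column_levels)

# The underlying integer codes follow the level order.
codes <- as.integer(converted$`some factor`)
print(codes)
```

Because ggplot2 arranges discrete axes and legends by factor level order, plots built from `converted` should follow `level_order` without further reordering.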

Posted by huangapple on 2023-02-07 01:50:07
Original link: https://go.coder-hub.com/75364840.html