Specify levels when `mutate`-ing dataframe variables to factors

Question

Let's say I have the following tibble called `data`:

```r
library(tibble)
data <- tribble(
  ~"ID", ~"some factor", ~"some other factor",
  1L, "low", "high",
  2L, "very high", "low",
  3L, "very low", "low",
  4L, "high", "very high",
  5L, "very low", "very low"
)
```

I use the `fct()` function from forcats to convert my two factor variables:

```r
library(dplyr)
library(forcats)
data <- data %>%
  mutate(across(starts_with("some"), fct))
```

Which gives me:

```
# A tibble: 5 × 3
     ID `some factor` `some other factor`
  <int> <fct>         <fct>
1     1 low           high
2     2 very high     low
3     3 very low      low
4     4 high          very high
5     5 very low      very low
```
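Without a `levels` argument, `fct()` sets the levels in the order in which the values first appear, which is why the converted columns (and any ggplot2 axis built from them) do not follow the desired order. A minimal check that rebuilds the same `data` tibble; `converted` is a name introduced here only for illustration:

```r
library(tibble)
library(dplyr)
library(forcats)

data <- tribble(
  ~"ID", ~"some factor", ~"some other factor",
  1L, "low", "high",
  2L, "very high", "low",
  3L, "very low", "low",
  4L, "high", "very high",
  5L, "very low", "very low"
)

# fct() without `levels`: levels follow the order of first appearance
converted <- data %>%
  mutate(across(starts_with("some"), fct))

levels(converted$`some factor`)
# "low" "very high" "very low" "high"
levels(converted$`some other factor`)
# "high" "low" "very high" "very low"
```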

However, when I call `fct()` this way, it's unclear to me how to specify the levels of these ordinal variables. The order I would like is:

```r
order <- c("very low", "low", "high", "very high")
```

How should I do this with dplyr's set of functions? The goal is to have ggplot2 visualizations that respect this ordering.

Answer 1

Score: 3

```r
order <- c("very low", "low", "high", "very high")
data <- data %>%
  mutate(across(starts_with("some"), fct, order))
```

This should do the trick.
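Here `order` lands in `fct()`'s second argument, `levels`, so both columns get the requested level order. One thing worth knowing, since `fct()` is stricter than base `factor()`: if a column contains a value missing from `order`, `fct()` raises an error, whereas `factor()` silently turns that value into `NA`. A sketch checking both points, run on the original character columns; `"medium"` is a made-up value used only to trigger the mismatch:

```r
library(tibble)
library(dplyr)
library(forcats)

data <- tribble(
  ~"ID", ~"some factor", ~"some other factor",
  1L, "low", "high",
  2L, "very high", "low",
  3L, "very low", "low",
  4L, "high", "very high",
  5L, "very low", "very low"
)
order <- c("very low", "low", "high", "very high")

# across() forwards `order` to fct() as its second argument, `levels`
result <- data %>%
  mutate(across(starts_with("some"), fct, order))
levels(result$`some factor`)        # "very low" "low" "high" "very high"
levels(result$`some other factor`)  # same order

# fct() errors on a value that is not in `levels` ("medium" is hypothetical)
fct_error <- tryCatch(fct(c("low", "medium"), levels = order),
                      error = function(e) e)
inherits(fct_error, "error")        # TRUE

# base factor() instead maps the unknown value to NA
base_version <- factor(c("low", "medium"), levels = order)
is.na(base_version)                 # FALSE TRUE
```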

Answer 2

Score: 3

When you use `across()`, you can pass extra arguments along to the called function through `across()`'s `...` argument.

```r
data <- data %>%
  mutate(across(starts_with("some"), fct, levels = order))
```

This is equivalent to

```r
data <- data %>%
  mutate(across(starts_with("some"), function(x) fct(x, levels = order)))
```

(This is a common paradigm in R: many functions that apply another function have a `...` argument whose contents are passed along to the applied function; see also `lapply()`, `sapply()`, `purrr::map()`, etc.)
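The same forwarding works in base R's apply family: `sapply()` and `lapply()` hand anything in their `...` on to the function they apply. One caveat specific to dplyr: as of dplyr 1.1.0, supplying extra arguments through `across()`'s `...` is deprecated because it is ambiguous when those arguments are evaluated, so the anonymous-function form shown in this answer is the safer choice there. The sketch below uses the `~ fct(.x, levels = order)` lambda shorthand that `across()` also accepts; the `cols` list is a made-up input for illustration:

```r
library(tibble)
library(dplyr)
library(forcats)

# Base R: sapply() forwards `digits = 1` through its ... to round()
rounded <- sapply(c(1.234, 5.678), round, digits = 1)
rounded  # 1.2 5.7

# lapply() does the same: `levels = order` is passed on to fct()
order <- c("very low", "low", "high", "very high")
cols <- list(a = c("low", "very high"), b = c("very low", "high"))  # hypothetical input
as_factors <- lapply(cols, fct, levels = order)

# dplyr >= 1.1.0: pass arguments via an anonymous function, not across()'s ...
data <- tribble(
  ~"ID", ~"some factor", ~"some other factor",
  1L, "low", "high",
  2L, "very high", "low",
  3L, "very low", "low",
  4L, "high", "very high",
  5L, "very low", "very low"
)
result <- data %>%
  mutate(across(starts_with("some"), ~ fct(.x, levels = order)))
levels(result$`some factor`)  # "very low" "low" "high" "very high"
```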

huangapple
  • Published on February 7, 2023 at 01:50:07
  • If you repost this article, please keep this link: https://go.coder-hub.com/75364840.html