How to use external character values to select x and y variables in ggplot?

Question

I have data with multiple observable variables (say, x, y, z), indexed by another set of variables (sect and item). Each time I run an experiment I get such a set of observations. So for experiment "A" I get a value for each variable x, y, z for each value of the index pair (sect, item). Then I run another experiment "B", and get a whole new set of these variables.

What I want to do is simple: plot the observed values in one experiment against their respective values in another experiment, faceted by variable (so, plot x from A against x from B, and likewise for y and z). I would like to do this in a "tidy" way, but the only ways I can find seem more complicated than necessary.

Here's some simulated data to illustrate with:

library(tidyr)
library(dplyr)
library(ggplot2)
# Function to simulate an experiment
simdata <- function(experiment_name) {
  n <- 3 # number of sections
  m <- 7 # number of items per section
  tibble(
    # data points (section-item pairs)
    sect = factor(rep(1:n, each = m)), item = factor(rep(1:m, n)),
    # simulated observed values of three variables
    x = (1:(n * m))^1.05 + rnorm(n * m),
    y = (1:(n * m))^1.15 + rnorm(n * m, sd = 2),
    z = (1:(n * m))^1.25 + rnorm(n * m, sd = 4),
    experiment = experiment_name
  )
}
# Make an example dataset consisting of
# data from experiments named "A", "B", and "C"
set.seed(42)
d <- bind_rows(simdata("A"), simdata("B"), simdata("C"))

So, d is a dataset with data from three experiments. Here's the first few rows.

r$> d
# A tibble: 63 × 6
   sect  item      x     y      z experiment
   <fct> <fct> <dbl> <dbl>  <dbl> <chr>     
 1 1     1      2.37 -2.56  4.03  A         
 2 1     2      1.51  1.88 -0.528 A         
 3 1     3      3.53  5.97 -1.52  A         
 4 1     4      4.92  8.71  7.39  A         
 5 1     5      5.82  5.50  4.23  A         
# … with 58 more rows

Now, say I want to plot the observations from experiment A against those from experiment B. I'll call these control and alternative:

# a list of two experiment names, to compare
exps <- list(control = "A", alternative = "B")

Now here's the part that seems overcomplicated. The best way I can find of doing what I want to do involves two pivots (which seems ugly). This results in one column per experiment. Then I wrap the experiment names (with sym()) and immediately unwrap them (with !!) in order to refer to these columns by name, which, as far as I understand, is necessary for tidy evaluation.

This works, but is there a better way of doing this?

d_reshaped <- d |>
  ## There must be a better way of doing this reshaping
  pivot_longer(
    cols = -c(experiment, sect, item),
    names_to = "var", values_to = "value"
  ) |>
  pivot_wider(names_from = c("experiment"), values_from = "value")
d_reshaped |>
  ## But I'm mostly looking for a better way to do this dereferencing...
  ggplot(aes(
    !!sym(exps$control),
    !!sym(exps$alternative)
  )) +
  geom_point(alpha = 0.5) +
  facet_grid(~var) +
  coord_fixed() +
  labs(title = paste("Experiment", exps, collapse = " vs "))
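To make the `!!sym()` wrap/unwrap a little less mysterious, here is a minimal sketch of what the injection does, using rlang directly (it assumes the rlang package is installed). `expr()` is used only to make the injected expression visible; `aes()` captures its arguments with the same rlang machinery, so `!!` behaves the same way there, but this is an illustration rather than ggplot2's internal code.

```r
library(rlang)

exps <- list(control = "A", alternative = "B")

# sym() converts the string "A" into the symbol A (i.e. a column name)
control_sym <- sym(exps$control)

# !! injects that symbol into a captured expression; aes(!!sym(...))
# relies on the same injection when it captures its arguments
injected <- expr(aes(!!control_sym, !!sym(exps$alternative)))
print(injected)
#> aes(A, B)
```

So `aes(!!sym(exps$control), !!sym(exps$alternative))` ends up equivalent to writing `aes(A, B)` by hand.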


I can see that instead of the wrapping/unwrapping !!sym part I could use aes_string(exps$control, exps$alternative), but that is soft-deprecated, so I get the warning

Warning message:
`aes_string()` was deprecated in ggplot2 3.0.0.
ℹ Please use tidy evaluation idioms with `aes()`

and so I suppose I shouldn't use it. Anyway, the main thing I wonder is whether there's a better way of doing the whole thing, since I think I must be overcomplicating this, but I can't see how.

Answer 1

Score: 3

An alternative to using the !!sym() construct is to use .data[[]] instead.
You can read more about that in this ggplot vignette.

library(tidyverse) # Load all of tidyverse to have purrr functions available.
library(ggplot2)

d_reshaped |>
  ggplot(aes(
    .data[[exps$control]],
    .data[[exps$alternative]]
  )) +
  geom_point(alpha = 0.5) +
  facet_grid(~var) +
  coord_fixed() +
  labs(title = paste("Experiment", exps, collapse = " vs "))
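As a small self-contained check that `.data[[ ]]` really selects columns from a string stored in a variable, the sketch below builds the plot on a toy data frame (`df`, an assumed stand-in for `d_reshaped`) and inspects the computed layer data with `ggplot_build()`:

```r
library(ggplot2)

# Toy stand-in for d_reshaped: one column per experiment (assumed data)
df <- data.frame(
  var = rep(c("x", "y"), each = 2),
  A = c(1, 2, 3, 4),
  B = c(2, 3, 5, 8)
)
exps <- list(control = "A", alternative = "B")

# .data[[<string>]] looks the column up by name inside aes(),
# so no sym()/!! is needed
p <- ggplot(df, aes(.data[[exps$control]], .data[[exps$alternative]])) +
  geom_point() +
  facet_grid(~var)

# Inspect the computed layer data: x should come from A, y from B
layer_data_1 <- ggplot_build(p)$data[[1]]
sort(layer_data_1$x)
sort(layer_data_1$y)
```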


I think the next step here would be to wrap the ggplot code in a function,
so you can re-use it.

Let’s say you have a dataframe of control and alternative combinations you
want to compare.

exps_tbl <- tibble(
  control = c("A", "B", "C"),
  alternative = c("B", "C", "A")
)

You can turn your code into a function like this:

# Plot experiment exp1 (x axis) against exp2 (y axis), one facet per variable
compare_experiments <- function(exp1, exp2) {
  d_reshaped |>
    ggplot(aes(
      !!sym(exp1),
      !!sym(exp2)
    )) +
    geom_point(alpha = 0.5) +
    facet_grid(~var) +
    coord_fixed() +
    labs(title = paste("Experiment", c(exp1, exp2), collapse = " vs "))
}

Then compare all the combinations with purrr::map2():

map2(exps_tbl$control,
     exps_tbl$alternative,
     compare_experiments)
#> [[1]]
#> 
#> [[2]]
#> 
#> [[3]]
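One possible refinement, offered as a sketch: pass the data in as an argument rather than reading the global `d_reshaped`, and select the columns with `.data[[ ]]` inside the function. The name `compare_experiments2`, its `data` parameter, and the small `toy` table are assumptions made for illustration.

```r
library(ggplot2)
library(purrr)
library(tibble)

# Hypothetical variant of compare_experiments(): the data is an argument
# instead of the global d_reshaped, and columns are picked with .data[[ ]]
compare_experiments2 <- function(data, exp1, exp2) {
  ggplot(data, aes(.data[[exp1]], .data[[exp2]])) +
    geom_point(alpha = 0.5) +
    facet_grid(~var) +
    coord_fixed() +
    labs(title = paste("Experiment", c(exp1, exp2), collapse = " vs "))
}

# Toy stand-in for d_reshaped (assumed data: one column per experiment)
toy <- tibble(var = c("x", "y"), A = c(1, 2), B = c(3, 4), C = c(5, 6))
exps_tbl <- tibble(
  control = c("A", "B", "C"),
  alternative = c("B", "C", "A")
)

# One plot per control/alternative pair
plots <- map2(exps_tbl$control, exps_tbl$alternative,
              \(a, b) compare_experiments2(toy, a, b))
```

With the question's data the call would be `compare_experiments2(d_reshaped, "A", "B")`; making the data explicit keeps the function usable on any reshaped table.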

Nothing is wrong with having two pivot_*() calls
in succession. It’s still concise and it reflects the operation that is needed:
Make one part of d longer (var) and another wider (experiment).
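For comparison, here is a sketch of a different route that neither answer uses: pivot only longer, then self-join the two experiments on `sect`, `item` and `var`. The plotted columns then always have the fixed names `control` and `alternative`, so `aes()` needs no tidy-eval tricks at all. The helper `join_experiments()` and the toy data are hypothetical.

```r
library(dplyr)
library(tidyr)
library(ggplot2)

# Hypothetical helper: pair two experiments by self-joining the long data
join_experiments <- function(data, control, alternative) {
  long <- pivot_longer(data, c(x, y, z), names_to = "var", values_to = "value")
  inner_join(
    long |>
      filter(experiment == .env$control) |>
      select(sect, item, var, control = value),
    long |>
      filter(experiment == .env$alternative) |>
      select(sect, item, var, alternative = value),
    by = c("sect", "item", "var")
  )
}

# Toy data in the same layout as d (assumed values)
d_small <- tibble(
  sect = factor(rep(1, 4)), item = factor(rep(1:2, 2)),
  x = c(1, 2, 3, 4), y = c(5, 6, 7, 8), z = c(9, 10, 11, 12),
  experiment = rep(c("A", "B"), each = 2)
)

paired <- join_experiments(d_small, "A", "B")

# Fixed column names, so plain aes() is enough
p <- ggplot(paired, aes(control, alternative)) +
  geom_point(alpha = 0.5) +
  facet_grid(~var) +
  coord_fixed()
```

With the question's data this would be `join_experiments(d, exps$control, exps$alternative)`. Because the join matches rows explicitly on the index columns, it does not depend on the two experiments being stored in the same row order.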

Answer 2

Score: 2

I think the way you are doing things is reasonable. It is a moderately complex data wrangling task to go from your existing data layout to the layout you need to plot.

However, you have only gone halfway in getting the data into the correct format, and that leads to you needing to specify pairs of experiments in an external variable and using the !!sym(var) syntax. Although it takes a bit of effort, I think it is worth wrangling your data into the perfect plotting format:

plot_df <- combn(unique(d$experiment), 2) |>
  apply(2, \(v) filter(d, experiment %in% v)) |>
  lapply(\(x) split(x, x$experiment)) |>
  lapply(\(x) cbind(
    x[[1]] |> rename_with(~ paste0(.x, 1)),
    x[[2]] |> rename_with(~ paste0(.x, 2))
  )) |>
  bind_rows() |>
  mutate(pair_experiments = paste(experiment1, experiment2, sep = " vs ")) |>
  select(!matches("^(sect|item|experiment)")) |>
  pivot_longer(-pair_experiments,
    names_pattern = "(.)(\\d)",
    names_to = c("var", ".value")
  ) |>
  rename(xvar = `1`, yvar = `2`)

plot_df
#> # A tibble: 189 x 4
#>    pair_experiments var     xvar   yvar
#>    <chr>            <chr>  <dbl>  <dbl>
#>  1 A vs B           x      2.37   2.40 
#>  2 A vs B           y     -2.56  -1.39 
#>  3 A vs B           z      4.03   1.42 
#>  4 A vs B           x      1.51   1.34 
#>  5 A vs B           y      1.88   3.44 
#>  6 A vs B           z     -0.528  0.689
#>  7 A vs B           x      3.53   4.47 
#>  8 A vs B           y      5.97   3.10 
#>  9 A vs B           z     -1.52   3.46 
#> 10 A vs B           x      4.92   4.62 
#> # i 179 more rows
#> # i Use `print(n = ...)` to see more rows
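The least obvious step in the pipeline above is `pivot_longer()` with `names_to = c("var", ".value")`. A minimal sketch on a one-row toy table (assumed data) shows how the pattern `"(.)(\\d)"` splits a name such as `x1` into the variable (`x`, stored in `var`) and an output column name (`1`):

```r
library(tibble)
library(tidyr)

# One pair of experiments, two variables, suffix 1 = first experiment,
# suffix 2 = second experiment (assumed toy values)
wide <- tibble(pair_experiments = "A vs B", x1 = 1, y1 = 2, x2 = 10, y2 = 20)

# First capture group -> "var"; second group -> ".value", i.e. it becomes
# a column name, so each variable gets one row with columns `1` and `2`
long <- pivot_longer(wide, -pair_experiments,
  names_pattern = "(.)(\\d)",
  names_to = c("var", ".value")
)
long
```

Those `1` and `2` columns are what the final `rename(xvar = `1`, yvar = `2`)` turns into the x and y values for the plot.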

So now you can get all your combinations in a single faceted plot:

ggplot(plot_df, aes(xvar, yvar)) +
  geom_point(alpha = 0.5) +
  facet_grid(pair_experiments ~ var, switch = "y") +
  coord_fixed() +
  labs(x = NULL, y = NULL)


And even if you don't want all pairs in one plot, it's trivial to filter to plot any pair you want:

ggplot(plot_df |> filter(pair_experiments == "A vs B"), aes(xvar, yvar)) +
  geom_point(alpha = 0.5) +
  facet_grid(. ~ var) +
  coord_fixed() +
  labs(x = "A", y = "B")


huangapple
  • Posted on 28 July 2023 at 04:34:34
  • If reposting, please keep the link to this article: https://go.coder-hub.com/76783241.html