Convert large (wide) data to long format

Question

I want to convert my data to long format. It's a repeated-measures design with 3 conditions.
This is what I have:

participant id | measure 1 (cond1) | measure 1 (cond2) | measure 1 (cond3) | measure2 (cond1) | measure2 (cond2) | measure2 (cond3) | measure3 (cond1) | measure3 (cond2) | measure3 (cond3) | age | gender
1
2
3

And this is what I would like:

participant id | condition | measure1 | measure2 | measure3 | age | gender
1 | 1
1 | 2
1 | 3

I tried:

data_long <- gather(df, condition_measure1, measure1, measure1_cond1, measure1_cond2, measure1_cond3, factor_key=TRUE)

It works, but I don't know how to put all 3 conditions in long format for the other measures. I tried repeating the same code for measure2, but it did not work: it added another 3 rows for each participant.
Can you help me? I am very new to R, so forgive me; I just want to convert my data and go back to Jamovi ^^
Thank you!

edit: here is the data

structure(list(num_pp = c(1, 2, 3, 4, 5, 6), nombre_dp1 = c(24, 
14, 2, 6, 6, 21), nombre_dp05 = c(20, 28, 2, 9, 8, 21), nombre_dp0 = c(24, 
20, 4, 11, 8, 20), jugement_causal_dp1 = c("Oui", "Oui", "Oui", 
"Oui", "Oui", "Oui"), jugement_causal_dp05 = c("Non", "Oui", 
"Non", "Non", "Oui", "Non"), jugement_causal_dp0 = c("Non", "Non", 
"Oui", "Non", "Non", "Non"), confiance_dp1 = c(90, 80, 63, 80, 
90, 80), confiance_dp05 = c(60, 50, 86, 65, 50, 90), confiance_dp0 = c(65, 
60, 55, 43, 50, 80), age = c(33, 22, 20, 20, 18, 18), genre = c("Masculin", 
"Feminin", "Feminin", "Feminin", "Feminin", "Feminin"), etude = c("L1", 
"L1", "L1", "L1", "L1", "L1"), ordre = c("dp_05|dp_1|dp_0", "dp_0|dp_1|dp_05", 
"dp_0|dp_1|dp_05", "dp_0|dp_05|dp_1", "dp_1|dp_05|dp_0", "dp_1|dp_05|dp_0"
), wdif_dp1dp05 = c(-4, 14, 0, 3, 2, 0)), row.names = c(NA, -6L
), class = c("tbl_df", "tbl", "data.frame"))

Answer 1

Score: 2

Using the data OP provided, you could directly use pivot_longer as shown below:

library(tidyr)

df %>%
  pivot_longer(matches('_dp\\d+$'), names_to = c('.value', 'dp'),
               names_pattern = '(.*)_(\\w+)')
# A tibble: 18 × 10
   num_pp   age genre    etude ordre  wdif_dp1dp05 dp    nombre jugement_causal confiance
    <dbl> <dbl> <chr>    <chr> <chr>         <dbl> <chr>  <dbl> <chr>               <dbl>
 1      1    33 Masculin L1    dp_05…           -4 dp1       24 Oui                    90
 2      1    33 Masculin L1    dp_05…           -4 dp05      20 Non                    60
 3      1    33 Masculin L1    dp_05…           -4 dp0       24 Non                    65
 4      2    22 Feminin  L1    dp_0|…           14 dp1       14 Oui                    80
 5      2    22 Feminin  L1    dp_0|…           14 dp05      28 Oui                    50
 6      2    22 Feminin  L1    dp_0|…           14 dp0       20 Non                    60
 7      3    20 Feminin  L1    dp_0|…            0 dp1        2 Oui                    63
 8      3    20 Feminin  L1    dp_0|…            0 dp05       2 Non                    86
 9      3    20 Feminin  L1    dp_0|…            0 dp0        4 Oui                    55
10      4    20 Feminin  L1    dp_0|…            3 dp1        6 Oui                    80
11      4    20 Feminin  L1    dp_0|…            3 dp05       9 Non                    65
12      4    20 Feminin  L1    dp_0|…            3 dp0       11 Non                    43
13      5    18 Feminin  L1    dp_1|…            2 dp1        6 Oui                    90
14      5    18 Feminin  L1    dp_1|…            2 dp05       8 Oui                    50
15      5    18 Feminin  L1    dp_1|…            2 dp0        8 Non                    50
16      6    18 Feminin  L1    dp_1|…            0 dp1       21 Oui                    80
17      6    18 Feminin  L1    dp_1|…            0 dp05      21 Non                    90
18      6    18 Feminin  L1    dp_1|…            0 dp0       20 Non                    80

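We use .value in names_to so that the first part of each column name (nombre, jugement_causal, confiance) becomes its own output column, while the second part (dp1, dp05, dp0) is stored in the dp column. This keeps the 3 measures in separate columns instead of stacking them into one.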
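
To see how the two regular expressions in this answer split the column names, here is a small base R sketch. It only illustrates the patterns with sub() and grepl() (using perl = TRUE as an assumption about regex flavor); it is not what pivot_longer runs internally.

```r
# Column names taken from the OP's data
cols <- c("nombre_dp1", "jugement_causal_dp05", "confiance_dp0")

# First capture group: becomes the new column name (the '.value' part)
value_part <- sub("(.*)_(\\w+)", "\\1", cols, perl = TRUE)

# Second capture group: becomes the entry in the 'dp' column
dp_part <- sub("(.*)_(\\w+)", "\\2", cols, perl = TRUE)

print(value_part)
print(dp_part)

# The column selector '_dp\\d+$' does not match wdif_dp1dp05,
# which is why that column is left untouched
selected <- grepl("_dp\\d+$", c(cols, "wdif_dp1dp05"), perl = TRUE)
print(selected)
```

Because (.*) is greedy, it takes everything up to the last underscore, so jugement_causal_dp05 is split into jugement_causal and dp05 rather than jugement and causal_dp05.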

Answer 2

Score: 1

Assuming the data is something like this:

df <- structure(list(`participant id` = c(1, 2, 3), `measure 1 (cond1)` = c(10, 
12, 8), `measure 1 (cond2)` = c(15, 14, 9), `measure 1 (cond3)` = c(20, 
18, 10), `measure2 (cond1)` = c(25, 22, 15), `measure2 (cond2)` = c(30, 
28, 19), `measure2 (cond3)` = c(35, 30, 22), `measure3 (cond1)` = c(40, 
38, 25), `measure3 (cond2)` = c(45, 42, 28), `measure3 (cond3)` = c(50, 
48, 32), age = c(25, 30, 27), gender = c("Male", "Female", "Male"
)), class = "data.frame", row.names = c(NA, -3L))

You can do something like this:

library(dplyr)
library(tidyr)

df <- pivot_longer(df,
                   cols = starts_with("measure"),
                   names_pattern = "measure ?(\\d+) \\(cond(\\d+)\\)",
                   names_to = c("measure", "condition")) |>
  mutate(condition = as.integer(condition),
         measure = as.integer(measure))

# Output:
   `participant id`   age gender measure condition value
              <dbl> <dbl> <chr>    <int>     <int> <dbl>
 1                1    25 Male         1         1    10
 2                1    25 Male         1         2    15
 3                1    25 Male         1         3    20
 4                1    25 Male         2         1    25
 5                1    25 Male         2         2    30
 6                1    25 Male         2         3    35
 7                1    25 Male         3         1    40
 8                1    25 Male         3         2    45
 9                1    25 Male         3         3    50
10                2    30 Female       1         1    12
# ℹ 17 more rows

To make it wide again:

df |> pivot_wider(names_from = measure, values_from = value, names_prefix = "measure")
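
As a possible one-step alternative, the .value approach can probably be applied to this naming scheme too, so that each measure lands in its own column without a later pivot_wider. The sketch below uses a hypothetical two-participant data frame (wide) with only two measures and two conditions, and it assumes the stray space in `measure 1 (...)` is removed first so that the captured names come out as measure1, measure2.

```r
library(tidyr)

# Hypothetical toy data mimicking the column names assumed in this answer
wide <- data.frame(
  id = c(1, 2),
  `measure 1 (cond1)` = c(10, 12),
  `measure 1 (cond2)` = c(15, 14),
  `measure2 (cond1)` = c(25, 22),
  `measure2 (cond2)` = c(30, 28),
  age = c(25, 30),
  check.names = FALSE
)

# Normalise "measure 1 (...)" to "measure1 (...)" so names are consistent
names(wide) <- sub("^measure ", "measure", names(wide))

# First group -> new column name (.value), second group -> condition
long <- pivot_longer(
  wide,
  cols = starts_with("measure"),
  names_pattern = "(measure\\d+) \\(cond(\\d+)\\)",
  names_to = c(".value", "condition")
)

print(long)
```

The result should have one row per participant and condition, with columns id, age, condition, measure1 and measure2, which is the layout asked for in the question. Note that condition is a character column here; wrap it in as.integer() if a numeric condition is needed.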

huangapple
  • Published on July 31, 2023, 18:17:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76802640.html