Convert large (wide) data to long format

Question

I want to convert my data to long format. It is a repeated-measures design with 3 conditions.
This is what I have:

participant id | measure 1 (cond1) | measure 1 (cond2) | measure 1 (cond3) | measure2 (cond1) | measure2 (cond2) | measure2 (cond3) | measure3 (cond1) | measure3 (cond2) | measure3 (cond3) | age | gender
1
2
3

And this is what I would like:

participant id | condition | measure1 | measure2 | measure3 | age | gender
1 | 1
1 | 2
1 | 3

I tried:

    data_long <- gather(df, condition_measure1, measure1, measure1_cond1, measure1_cond2, measure1_cond3, factor_key = TRUE)

It works for measure1, but I don't know how to do the same for all 3 measures at once. I tried repeating the same code for measure2, but that did not work: it added another 3 rows for each participant.
Can you help me? I am very new to R, so forgive me; I just want to convert my data and go back to Jamovi ^^
Thank you!

Edit: here is the data:

    df <- structure(list(num_pp = c(1, 2, 3, 4, 5, 6), nombre_dp1 = c(24,
    14, 2, 6, 6, 21), nombre_dp05 = c(20, 28, 2, 9, 8, 21), nombre_dp0 = c(24,
    20, 4, 11, 8, 20), jugement_causal_dp1 = c("Oui", "Oui", "Oui",
    "Oui", "Oui", "Oui"), jugement_causal_dp05 = c("Non", "Oui",
    "Non", "Non", "Oui", "Non"), jugement_causal_dp0 = c("Non", "Non",
    "Oui", "Non", "Non", "Non"), confiance_dp1 = c(90, 80, 63, 80,
    90, 80), confiance_dp05 = c(60, 50, 86, 65, 50, 90), confiance_dp0 = c(65,
    60, 55, 43, 50, 80), age = c(33, 22, 20, 20, 18, 18), genre = c("Masculin",
    "Feminin", "Feminin", "Feminin", "Feminin", "Feminin"), etude = c("L1",
    "L1", "L1", "L1", "L1", "L1"), ordre = c("dp_05|dp_1|dp_0", "dp_0|dp_1|dp_05",
    "dp_0|dp_1|dp_05", "dp_0|dp_05|dp_1", "dp_1|dp_05|dp_0", "dp_1|dp_05|dp_0"
    ), wdif_dp1dp05 = c(-4, 14, 0, 3, 2, 0)), row.names = c(NA, -6L
    ), class = c("tbl_df", "tbl", "data.frame"))

Answer 1

Score: 2

Using the data OP provided, you could directly use pivot_longer as shown below:

    library(tidyr)

    # Pivot every column ending in _dp<digits>; the part before the last
    # underscore names the output column, the part after it goes into dp
    df %>%
      pivot_longer(matches('_dp\\d+$'), names_to = c('.value', 'dp'),
                   names_pattern = '(.*)_(\\w+)')
    # A tibble: 18 × 10
       num_pp   age genre    etude ordre  wdif_dp1dp05 dp    nombre jugement_causal confiance
        <dbl> <dbl> <chr>    <chr> <chr>         <dbl> <chr>  <dbl> <chr>               <dbl>
     1      1    33 Masculin L1    dp_05            -4 dp1       24 Oui                    90
     2      1    33 Masculin L1    dp_05            -4 dp05      20 Non                    60
     3      1    33 Masculin L1    dp_05            -4 dp0       24 Non                    65
     4      2    22 Feminin  L1    dp_0|…           14 dp1       14 Oui                    80
     5      2    22 Feminin  L1    dp_0|…           14 dp05      28 Oui                    50
     6      2    22 Feminin  L1    dp_0|…           14 dp0       20 Non                    60
     7      3    20 Feminin  L1    dp_0|…            0 dp1        2 Oui                    63
     8      3    20 Feminin  L1    dp_0|…            0 dp05       2 Non                    86
     9      3    20 Feminin  L1    dp_0|…            0 dp0        4 Oui                    55
    10      4    20 Feminin  L1    dp_0|…            3 dp1        6 Oui                    80
    11      4    20 Feminin  L1    dp_0|…            3 dp05       9 Non                    65
    12      4    20 Feminin  L1    dp_0|…            3 dp0       11 Non                    43
    13      5    18 Feminin  L1    dp_1|…            2 dp1        6 Oui                    90
    14      5    18 Feminin  L1    dp_1|…            2 dp05       8 Oui                    50
    15      5    18 Feminin  L1    dp_1|…            2 dp0        8 Non                    50
    16      6    18 Feminin  L1    dp_1|…            0 dp1       21 Oui                    80
    17      6    18 Feminin  L1    dp_1|…            0 dp05      21 Non                    90
    18      6    18 Feminin  L1    dp_1|…            0 dp0       20 Non                    80

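We use .value in names_to so that the first part of each column name (nombre, jugement_causal, confiance) becomes its own output column. This keeps the 3 measures in separate columns instead of stacking them into a single value column.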
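
To see what .value does in isolation, here is a minimal sketch on a made-up data frame. The column names (score_a, rating_b, and so on) are hypothetical and not taken from the question. The part of each name before the underscore becomes an output column, and the part after it ends up in condition:

```r
library(tidyr)

# Hypothetical toy data: two measures (score, rating) under two conditions (a, b)
wide <- data.frame(
  id       = c(1, 2),
  score_a  = c(10, 20),
  score_b  = c(11, 21),
  rating_a = c("low", "high"),
  rating_b = c("high", "low")
)

# ".value" keeps one output column per measure; "condition" receives the suffix
long <- pivot_longer(
  wide,
  cols      = -id,
  names_to  = c(".value", "condition"),
  names_sep = "_"
)

# One row per id and condition, with score and rating side by side
print(long)
```

Because each measure gets its own column, numeric and character measures can be pivoted in the same call, which is why nombre and jugement_causal can sit next to each other in the output above.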

Answer 2

Score: 1

Assuming the data is something like this:

    df <- structure(list(`participant id` = c(1, 2, 3), `measure 1 (cond1)` = c(10,
    12, 8), `measure 1 (cond2)` = c(15, 14, 9), `measure 1 (cond3)` = c(20,
    18, 10), `measure2 (cond1)` = c(25, 22, 15), `measure2 (cond2)` = c(30,
    28, 19), `measure2 (cond3)` = c(35, 30, 22), `measure3 (cond1)` = c(40,
    38, 25), `measure3 (cond2)` = c(45, 42, 28), `measure3 (cond3)` = c(50,
    48, 32), age = c(25, 30, 27), gender = c("Male", "Female", "Male"
    )), class = "data.frame", row.names = c(NA, -3L))

You can do something like this:

    library(dplyr)
    library(tidyr)

    # Split each column name into a measure number and a condition number
    df <- pivot_longer(df,
                       cols = starts_with("measure"),
                       names_pattern = "measure ?(\\d+) \\(cond(\\d+)\\)",
                       names_to = c("measure", "condition")) |>
      mutate(condition = as.integer(condition),
             measure = as.integer(measure))
    # Output:
       `participant id`   age gender measure condition value
                  <dbl> <dbl> <chr>    <int>     <int> <dbl>
     1                1    25 Male         1         1    10
     2                1    25 Male         1         2    15
     3                1    25 Male         1         3    20
     4                1    25 Male         2         1    25
     5                1    25 Male         2         2    30
     6                1    25 Male         2         3    35
     7                1    25 Male         3         1    40
     8                1    25 Male         3         2    45
     9                1    25 Male         3         3    50
    10                2    30 Female       1         1    12
    # ℹ 17 more rows

To spread the measures back into separate columns (one row per participant and condition):

    df |> pivot_wider(names_from = measure, values_from = value, names_prefix = "measure")
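
If the goal is the layout asked for in the question (one row per participant and condition, one column per measure), the two steps can probably be collapsed into a single pivot_longer call with .value. The sketch below rebuilds the same assumed example data and first normalizes the column names to the form measureN_condM, a naming convention chosen here for illustration only:

```r
library(tidyr)

# Assumed example data, same values as the made-up data frame in this answer
df <- data.frame(
  `participant id`    = c(1, 2, 3),
  `measure 1 (cond1)` = c(10, 12, 8),
  `measure 1 (cond2)` = c(15, 14, 9),
  `measure 1 (cond3)` = c(20, 18, 10),
  `measure2 (cond1)`  = c(25, 22, 15),
  `measure2 (cond2)`  = c(30, 28, 19),
  `measure2 (cond3)`  = c(35, 30, 22),
  `measure3 (cond1)`  = c(40, 38, 25),
  `measure3 (cond2)`  = c(45, 42, 28),
  `measure3 (cond3)`  = c(50, 48, 32),
  age    = c(25, 30, 27),
  gender = c("Male", "Female", "Male"),
  check.names = FALSE  # keep the spaces and parentheses in the names
)

# Normalize "measure 1 (cond2)" and "measure2 (cond2)" to "measure1_cond2"
names(df) <- sub("^measure ?(\\d+) \\(cond(\\d+)\\)$", "measure\\1_cond\\2", names(df))

# ".value" yields columns measure1, measure2, measure3; the suffix becomes condition
long <- pivot_longer(
  df,
  cols            = starts_with("measure"),
  names_to        = c(".value", "condition"),
  names_sep       = "_cond",
  names_transform = list(condition = as.integer)
)

print(long)
```

This avoids the intermediate fully long table, at the cost of renaming the columns up front; whether that is preferable depends on how the real column names look.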

huangapple
  • Posted on July 31, 2023, 18:17:12
  • Please keep this link when reposting: https://go.coder-hub.com/76802640.html