How to reshape wide format data to long format?

Question

I have a wide-format dataset that contains year variables (yr1, yr2, yr3) and duration variables (yr1_time, yr2_time, yr3_time). yr1 ranges from 2023 to 2025; yr2 and yr3 equal yr1 + 1 and yr1 + 2, respectively.

    id <- 1:20
    df1 <- data.frame(id)
    # yr1 is drawn from 2023 to 2025, as described above
    df1$yr1 <- sample(2023:2025, length(df1$id), replace = TRUE)
    df1$yr1_time <- rnorm(n = 20, mean = 0, sd = 0.6)
    df1$yr2 <- df1$yr1 + 1
    df1$yr2_time <- rnorm(n = 20, mean = 0, sd = 0.6)
    df1$yr3 <- df1$yr1 + 2
    df1$yr3_time <- rnorm(n = 20, mean = 0, sd = 0.6)
    print(df1)
    # (first 9 of 20 rows shown)
    #   id  yr1    yr1_time  yr2    yr2_time  yr3    yr3_time
    # 1  1 2023 -0.18649844 2024  1.41458053 2025 -1.12031610
    # 2  2 2025 -0.01977439 2026  0.68985414 2027 -0.69038076
    # 3  3 2023 -0.08855173 2024  0.76039453 2025 -0.36913641
    # 4  4 2023  0.28576478 2024 -0.35622031 2025  0.89810598
    # 5  5 2024 -0.42831014 2025 -1.28914071 2026  0.44912268
    # 6  6 2023 -1.02487195 2024 -0.27391726 2025 -0.62189347
    # 7  7 2024  0.16888122 2025 -0.10572896 2026 -0.43966363
    # 8  8 2025  0.80350550 2026  0.41403554 2027 -1.41913317
    # 9  9 2023  0.59990953 2024 -0.42688373 2025 -0.73899889

How can I reshape this wide-format data into long format? Here is my expected output:

    id   yr     yr_time
     1 2023 -0.18649844
     1 2024  1.41458053
     1 2025 -1.12031610
     2 2025 -0.01977439
     2 2026  0.68985414
     2 2027 -0.69038076

Thanks!

Answer 1

Score: 1

You can do this by using pivot_longer() from the tidyr package to convert the data to long format, then several functions from the dplyr package to get to the final format you want.

    library(dplyr)
    library(tidyr)

    df1 |>
      # Convert data to long format
      pivot_longer(cols = starts_with("yr")) |>
      mutate(
        # If a row represents a year, assign its `value` to the `year` column
        year = if_else(name %in% c("yr1", "yr2", "yr3"), value, NA_real_),
        # Move all the values up one row (this moves the `yr_time` for each year
        # into the same row as the corresponding year)
        yr_time = lead(value)
      ) |>
      # Remove the rows that don't contain a year value, since those rows aren't
      # needed (and now contain the wrong `yr_time` values)
      filter(!is.na(year)) |>
      # Arrange columns in the order you asked for
      select(id, year, yr_time)
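To make the `lead()` step easier to follow, here is a minimal sketch of the first two stages on their own. The `demo` data frame and the `step1`/`step2` names are assumptions for illustration; `demo` copies the first two rows of the question's printed data.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row sample copied from the question's printed data
demo <- data.frame(
  id = 1:2,
  yr1 = c(2023, 2025), yr1_time = c(-0.18649844, -0.01977439),
  yr2 = c(2024, 2026), yr2_time = c(1.41458053, 0.68985414),
  yr3 = c(2025, 2027), yr3_time = c(-1.12031610, -0.69038076)
)

# Stage 1: after pivoting, each yrN row is immediately followed by
# its yrN_time row, because that is the column order in the wide data
step1 <- demo |>
  pivot_longer(cols = starts_with("yr"))

# Stage 2: lead() copies the next row's value into the current row,
# so every yrN row now also carries the matching yrN_time value
step2 <- step1 |>
  mutate(yr_time = lead(value))

print(step1)
print(step2)
```

Note that this pairing relies on each yrN column sitting directly before its yrN_time column in the wide data; with a different column order, `lead()` would pair the wrong values.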

If I haven't explained each step well, try running each step separately and you should be able to see how it works.
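As a possible alternative that does not depend on column order, the sketch below uses the `.value` sentinel of `pivot_longer()`. This is not the answer author's method; the `demo` data frame, the `_yr` suffix, and the `slot` column name are assumptions chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row sample copied from the question's printed data
demo <- data.frame(
  id = 1:2,
  yr1 = c(2023, 2025), yr1_time = c(-0.18649844, -0.01977439),
  yr2 = c(2024, 2026), yr2_time = c(1.41458053, 0.68985414),
  yr3 = c(2025, 2027), yr3_time = c(-1.12031610, -0.69038076)
)

long <- demo |>
  # Give the bare year columns a suffix so every name reads <slot>_<kind>,
  # e.g. yr1 becomes yr1_yr next to the existing yr1_time
  rename_with(~ paste0(.x, "_yr"), .cols = matches("^yr[0-9]+$")) |>
  # ".value" turns the part after "_" (yr or time) into output columns,
  # while the part before "_" goes into the helper column `slot`
  pivot_longer(
    cols = -id,
    names_to = c("slot", ".value"),
    names_sep = "_"
  ) |>
  # Keep the requested columns and restore the yr_time name
  select(id, yr, yr_time = time)

print(long)
```

Because the pairing is driven by the column names rather than their positions, this variant should give the same result even if the wide columns are reordered.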

huangapple
  • Posted on March 7, 2023, 07:06:15
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75656635.html