英文:
How to reshape wide format data to long format?
问题
我有一个宽格式的数据集,其中包含年份变量(yr1、yr2、yr3)和持续时间变量(yr1_time、yr2_time、yr3_time)。yr1范围从2023年到2025年。yr2和yr3分别等于yr1+1或+2。
如何将宽格式数据转换为长格式?以下是我期望的输出:
id yr yr_time
1 2023 -0.18649844
1 2024 1.41458053
1 2025 -1.12031610
2 2025 -0.01977439
2 2026 0.68985414
2 2027 -0.69038076
谢谢!
英文:
I have a wide format dataset, which contains year variables (yr1, yr2, yr3), and duration variables (yr1_time, yr2_time, yr3_time). yr1 ranges from 2023 to 2025. yr2 and yr3 equals to yr1+1 or +2, respectively.
id<-rep(c(1:20),times=1)
df1<-data.frame(id)
df1$yr1 <- sample(2022:2025, length(df1$id), replace=TRUE)
df1$yr1_time <- rnorm(n = 20, mean = 0, sd = 0.6)
df1$yr2 <- df1$yr1+1
df1$yr2_time <- rnorm(n = 20, mean = 0, sd = 0.6)
df1$yr3 <- df1$yr1+2
df1$yr3_time <- rnorm(n = 20, mean = 0, sd = 0.6)
print(df1)
# id yr1 yr1_time yr2 yr2_time yr3 yr3_time
# 1 1 2023 -0.18649844 2024 1.41458053 2025 -1.12031610
# 2 2 2025 -0.01977439 2026 0.68985414 2027 -0.69038076
# 3 3 2023 -0.08855173 2024 0.76039453 2025 -0.36913641
# 4 4 2023 0.28576478 2024 -0.35622031 2025 0.89810598
# 5 5 2024 -0.42831014 2025 -1.28914071 2026 0.44912268
# 6 6 2023 -1.02487195 2024 -0.27391726 2025 -0.62189347
# 7 7 2024 0.16888122 2025 -0.10572896 2026 -0.43966363
# 8 8 2025 0.80350550 2026 0.41403554 2027 -1.41913317
# 9 9 2023 0.59990953 2024 -0.42688373 2025 -0.73899889
How to shape the wide format to the long format? Here is my expected output:
id yr yr_time
1 2023 -0.18649844
1 2024 1.41458053
1 2025 -1.12031610
2 2025 -0.01977439
2 2026 0.68985414
2 2027 -0.69038076
Thanks!
答案1
得分: 1
你可以使用 tidyr
包中的 pivot_longer()
函数 将数据转换为长格式,然后使用 dplyr 包 中的几个函数来得到你想要的最终格式。
library(dplyr)
library(tidyr)
df1 |>
# 将数据转换为长格式
pivot_longer(cols = starts_with("yr")) |>
mutate(
# 如果一行代表一年,将其`value`分配给`year`列
year = if_else(name %in% c("yr1", "yr2", "yr3"), value, NA_real_),
# 将所有值上移一行(这将`yr_time`的值移至相应年份的同一行)
yr_time = lead(value)
) |>
# 删除不包含年份值的行,因为这些行不需要(而且现在包含错误的`yr_time`值)
filter(!is.na(year)) |>
# 按照你要求的顺序排列列
select(id, year, yr_time)
如果我没有解释得很清楚,尝试分别运行每个步骤,你应该能够看到它是如何工作的。
英文:
You can do this by using pivot_longer()
from the tidyr package to convert the data to long format, then several functions from the dplyr package to get to the final format you want.
library(dplyr)
library(tidyr)
df1 |>
# Convert data to long format
pivot_longer(cols = starts_with("yr")) |>
mutate(
# If a row represents a year, assign its `value` to the `year` column
year = if_else(name %in% c("yr1", "yr2", "yr3"), value, NA_real_),
# Move all the values up one row (this moves the `yr_time` for each year
# into the same row as the corresponding year)
yr_time = lead(value)
) |>
# Remove the rows that don't contain a year value, since those rows aren't
# needed (and now contain the wrong `yr_time` values)
filter(!is.na(year)) |>
# Arrange columns in the order you asked for
select(id, year, yr_time)
If I haven't explained each step well, try running each step separately and you should be able to see how it works.
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论