How to reshape wide format data to long format?

Question

I have a wide-format dataset that contains year variables (yr1, yr2, yr3) and duration variables (yr1_time, yr2_time, yr3_time). yr1 ranges from 2023 to 2025; yr2 and yr3 equal yr1 + 1 and yr1 + 2, respectively.

id <- 1:20
df1 <- data.frame(id)
# Draw yr1 from 2023 to 2025, matching the range described above
df1$yr1 <- sample(2023:2025, nrow(df1), replace = TRUE)
df1$yr1_time <- rnorm(n = 20, mean = 0, sd = 0.6)
df1$yr2 <- df1$yr1 + 1
df1$yr2_time <- rnorm(n = 20, mean = 0, sd = 0.6)
df1$yr3 <- df1$yr1 + 2
df1$yr3_time <- rnorm(n = 20, mean = 0, sd = 0.6)

# Show the first 9 rows
head(df1, 9)

#   id  yr1    yr1_time  yr2    yr2_time  yr3    yr3_time
# 1   1 2023 -0.18649844 2024  1.41458053 2025 -1.12031610
# 2   2 2025 -0.01977439 2026  0.68985414 2027 -0.69038076
# 3   3 2023 -0.08855173 2024  0.76039453 2025 -0.36913641
# 4   4 2023  0.28576478 2024 -0.35622031 2025  0.89810598
# 5   5 2024 -0.42831014 2025 -1.28914071 2026  0.44912268
# 6   6 2023 -1.02487195 2024 -0.27391726 2025 -0.62189347
# 7   7 2024  0.16888122 2025 -0.10572896 2026 -0.43966363
# 8   8 2025  0.80350550 2026  0.41403554 2027 -1.41913317
# 9   9 2023  0.59990953 2024 -0.42688373 2025 -0.73899889

How can I reshape the data from wide format to long format? Here is my expected output:

   id    yr      yr_time
    1  2023  -0.18649844
    1  2024   1.41458053
    1  2025  -1.12031610
    2  2025  -0.01977439
    2  2026   0.68985414
    2  2027  -0.69038076

Thanks!

Answer 1

Score: 1

You can do this with pivot_longer() from the tidyr package to convert the data to long format, followed by a few functions from the dplyr package to get to the final format you want.

library(dplyr)
library(tidyr)

df1 |> 
  # Convert data to long format
  pivot_longer(cols = starts_with("yr")) |> 
  mutate(
    # If a row represents a year, assign its `value` to the `year` column
    year = if_else(name %in% c("yr1", "yr2", "yr3"), value, NA_real_),
    # Move all the values up one row (this moves the `yr_time` for each year 
    # into the same row as the corresponding year). This relies on each yrN 
    # column being immediately followed by its yrN_time column in df1.
    yr_time = lead(value)
  ) |>
  # Remove the rows that don't contain a year value, since those rows aren't 
  # needed (and now contain the wrong `yr_time` values)
  filter(!is.na(year)) |> 
  # Arrange columns in the order you asked for, renaming `year` to `yr`
  select(id, yr = year, yr_time)

If any step is unclear, try running the pipeline one step at a time and you should be able to see how it works.
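As a possible alternative to the pipeline above (not the answerer's method), the same reshape can be sketched without relying on row order by using the `.value` sentinel of pivot_longer(). The data frame `toy` below is a hypothetical stand-in built from the first two rows of the question's printout, and the `_yr` suffix and the `slot` column are names introduced only for this illustration.

```r
library(dplyr)
library(tidyr)

# Hypothetical two-row stand-in for df1, copied from the question's printout
toy <- data.frame(
  id = 1:2,
  yr1 = c(2023, 2025), yr1_time = c(-0.18649844, -0.01977439),
  yr2 = c(2024, 2026), yr2_time = c(1.41458053, 0.68985414),
  yr3 = c(2025, 2027), yr3_time = c(-1.12031610, -0.69038076)
)

long <- toy |>
  # Give the bare year columns a suffix so every name looks like yr<k>_<field>
  rename_with(~ paste0(.x, "_yr"), .cols = matches("^yr[0-9]+$")) |>
  pivot_longer(
    cols = -id,
    # First capture group goes to `slot`; second becomes the output column name
    names_to = c("slot", ".value"),
    names_pattern = "^yr([0-9]+)_(.*)$"
  ) |>
  # Keep the requested columns and restore the `yr_time` name
  select(id, yr, yr_time = time)

print(long)
```

Because each year is paired with its duration by column name rather than by position, this version should keep working if the columns of the input are reordered.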

huangapple
  • Posted on March 7, 2023, 07:06:15
  • Link to this post: https://go.coder-hub.com/75656635.html