March 7, 2023 07:06:15

How to reshape wide format data to long format?

Question

I have a wide-format dataset that contains year variables (yr1, yr2, yr3) and duration variables (yr1_time, yr2_time, yr3_time). yr1 ranges from 2023 to 2025, and yr2 and yr3 equal yr1 + 1 and yr1 + 2, respectively.

# No seed is set, so your random values will differ from the output below
id <- 1:20
df1 <- data.frame(id)
# yr1 is drawn from 2023-2025, as described above
df1$yr1 <- sample(2023:2025, length(df1$id), replace = TRUE)
df1$yr1_time <- rnorm(n = 20, mean = 0, sd = 0.6)
df1$yr2 <- df1$yr1 + 1
df1$yr2_time <- rnorm(n = 20, mean = 0, sd = 0.6)
df1$yr3 <- df1$yr1 + 2
df1$yr3_time <- rnorm(n = 20, mean = 0, sd = 0.6)
print(df1)
#   id  yr1    yr1_time  yr2    yr2_time  yr3    yr3_time
# 1   1 2023 -0.18649844 2024  1.41458053 2025 -1.12031610
# 2   2 2025 -0.01977439 2026  0.68985414 2027 -0.69038076
# 3   3 2023 -0.08855173 2024  0.76039453 2025 -0.36913641
# 4   4 2023  0.28576478 2024 -0.35622031 2025  0.89810598
# 5   5 2024 -0.42831014 2025 -1.28914071 2026  0.44912268
# 6   6 2023 -1.02487195 2024 -0.27391726 2025 -0.62189347
# 7   7 2024  0.16888122 2025 -0.10572896 2026 -0.43966363
# 8   8 2025  0.80350550 2026  0.41403554 2027 -1.41913317
# 9   9 2023  0.59990953 2024 -0.42688373 2025 -0.73899889
# ... (rows 10-20 not shown)

How can I reshape this wide-format data to long format? Here is my expected output (first two ids shown):

   id    yr     yr_time
    1  2023 -0.18649844
    1  2024  1.41458053
    1  2025 -1.12031610
    2  2025 -0.01977439
    2  2026  0.68985414
    2  2027 -0.69038076

Thanks!

Answer 1

Score: 1

You can do this by using pivot_longer() from the tidyr package to convert the data to long format, then several functions from the dplyr package to get to the final format you want.

library(dplyr)
library(tidyr)
df1 |> 
  # Convert data to long format; within each id the columns keep their
  # original order (yr1, yr1_time, yr2, yr2_time, yr3, yr3_time)
  pivot_longer(cols = starts_with("yr")) |> 
  mutate(
    # If a row represents a year, copy its `value` into the `yr` column
    yr = if_else(name %in% c("yr1", "yr2", "yr3"), value, NA_real_),
    # Shift all values up one row (this moves each `*_time` value
    # into the same row as the corresponding year)
    yr_time = lead(value)
  ) |>
  # Remove the rows that don't contain a year value, since those rows aren't 
  # needed (and now contain the wrong `yr_time` values)
  filter(!is.na(yr)) |> 
  # Keep the columns in the order you asked for
  select(id, yr, yr_time)

If I haven't explained each step well, try running each step separately and you should be able to see how it works.
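
To make running each step separately concrete, here is a minimal sketch that applies the same pipeline to a hypothetical one-row data frame `toy` (made-up values, same column order as `df1`) and keeps each intermediate result. Note that `lead()` is not grouped by `id`. That is safe here only because `pivot_longer()` preserves the interleaved column order, so the last row of each id is a `yr3_time` row, which `filter()` drops anyway.

```r
library(dplyr)
library(tidyr)

# Hypothetical one-row version of df1 (values made up for illustration),
# with the same interleaved column order: yr1, yr1_time, yr2, ...
toy <- data.frame(
  id = 1,
  yr1 = 2023, yr1_time = 0.1,
  yr2 = 2024, yr2_time = 0.2,
  yr3 = 2025, yr3_time = 0.3
)

# Step 1: stack the columns; each year row is directly followed by its *_time row
step1 <- toy |> pivot_longer(cols = starts_with("yr"))
print(step1)

# Step 2: mark year rows and pull the next row's value (the *_time) up beside them
step2 <- step1 |>
  mutate(
    yr = if_else(name %in% c("yr1", "yr2", "yr3"), value, NA_real_),
    yr_time = lead(value)
  )
print(step2)

# Step 3: keep only the year rows and the requested columns
step3 <- step2 |>
  filter(!is.na(yr)) |>
  select(id, yr, yr_time)
print(step3)
```

In `step2`, the `yr_time` column on the `yr1` row holds 0.1, the value that was originally one row below it.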

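The `lead()` approach depends on the year and `_time` columns being interleaved. A sketch that does not depend on column order uses the `.value` sentinel of `pivot_longer()`, which pairs each year column with its `_time` column by name. The `_yr` suffix, the helper column `set`, and the example data `wide` are assumptions made for this illustration and are not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Hypothetical data with the same shape as df1 (values made up for illustration)
wide <- data.frame(
  id = c(1, 2),
  yr1 = c(2023, 2025), yr1_time = c(-0.19, -0.02),
  yr2 = c(2024, 2026), yr2_time = c(1.41, 0.69),
  yr3 = c(2025, 2027), yr3_time = c(-1.12, -0.69)
)

long <- wide |>
  # Give the bare year columns a suffix so every name has the form
  # "<set>_<measure>": yr1 -> yr1_yr, while yr1_time is unchanged
  rename_with(~ paste0(.x, "_yr"), matches("^yr\\d+$")) |>
  # ".value" turns the second part of each name ("yr" or "time") into an
  # output column; the first part (yr1/yr2/yr3) goes into `set`
  pivot_longer(
    cols = -id,
    names_to = c("set", ".value"),
    names_sep = "_"
  ) |>
  # Match the column names of the expected output (drops `set`)
  select(id, yr, yr_time = time)

print(long)
```

To apply this to the question's data, replace `wide` with `df1`. If you need to know which wave each row came from, keep `set` in the final `select()`.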