如何在R中按年重新排列我的数据框,同时带有条件和计算?

huangapple go评论94阅读模式
英文:

How to rearrange my data frame per year, with conditions and a calculation at the same time in R?

问题

Sure, here's a snippet of R code that achieves the transformation you described:

  1. library(dplyr)
  2. library(tidyr)
  3. # Filter the data to keep only IDs that appear 3 or more times
  4. df_filtered <- df %>%
  5. group_by(ID) %>%
  6. filter(n() >= 3)
  7. # Create a sequence column for each year
  8. df_filtered <- df_filtered %>%
  9. group_by(ID) %>%
  10. mutate(year_seq = row_number())
  11. # Pivot the data to wide format
  12. df_pivoted <- df_filtered %>%
  13. pivot_wider(names_from = year_seq,
  14. names_glue = "Year{.value}",
  15. values_from = c(year, value, status))
  16. # Calculate the differences between values
  17. df_pivoted <- df_pivoted %>%
  18. mutate(DiffValue1 = Value2 - Value1,
  19. DiffValue2 = Value3 - Value2)
  20. # Rename the columns
  21. colnames(df_pivoted) <- gsub("status_", "Status", colnames(df_pivoted))
  22. colnames(df_pivoted) <- gsub("value_", "Value", colnames(df_pivoted))
  23. colnames(df_pivoted) <- gsub("year_", "Year", colnames(df_pivoted))
  24. # Remove row names
  25. rownames(df_pivoted) <- NULL
  26. # View the final data frame
  27. df_pivoted

This code will filter the data, pivot it into the desired wide format, calculate the differences between values, and rename the columns as shown in your example.

英文:

So I have a data frame that looks like this:

  1. df &lt;- data.frame (ID = c(&quot;A1&quot;,&quot;A1&quot;,&quot;A1&quot;,&quot;A2&quot;,&quot;A2&quot;,&quot;A3&quot;,&quot;A3&quot;,&quot;A3&quot;,&quot;A3&quot;,&quot;A4&quot;,&quot;A4&quot;,&quot;A4&quot;,&quot;A4&quot;),
  2. status = c(1,1,0,1,0,1,1,1,0,1,1,1,1),
  3. value = c( 10,12,0,40,42,30,31,34,0,32,34,36,37),
  4. year = c(2000,2005,2010,2005,2010,2000,2005,2010,2015,2000,2005,2010,2015
  5. ))

I want to transform this df to keep only rows for values that appear 3 or more times in the ID column. I want to arrange it in a way to have per row values that reappear in 3 different years, with the status in each year and the differences between the value in second and first, and third and second year. The final df should look like this:

  1. df &lt;- data.frame (ID = c(&quot;A1&quot;,&quot;A3&quot;,&quot;A3&quot;,&quot;A4&quot;,&quot;A4&quot;),
  2. Year1= c(2000,200,2005,2000,2005),
  3. Year2 = c(2005,2005,2010,2005,2010),
  4. Year3 = c(2010,2010,2015,2010,2015),
  5. Value1 = c(10,30,31,32,34),
  6. Value2 = c(12,31,34,34,36),
  7. Value3 = c(0,34,0,36,37),
  8. DiffValue1 = c(2,1,3,2,2),
  9. DiffValue2 = c(0,3,0,2,1),
  10. Status1 = c(1,1,1,1,1),
  11. Status2 = c(1,1,1,1,1),
  12. Status3 = c( 0,1,0,1,1)
  13. )

I know I could start doing this step by step, first subsetting the data to keep only IDs that repeat 3 or more times, then rearrange rows to columns and calculating the differences in values, but is there a way to combine all of this into one snippet of code?

答案1

得分: 1

以下是您要翻译的代码部分:

  1. library(dplyr)
  2. df %>%
  3. relocate(year, value, status, .after = ID) %>%
  4. group_by(ID) %>%
  5. filter(n() > 2) %>%
  6. mutate(diff = c(0, diff(value)) * status) %>%
  7. reframe(across(everything(), ~ data.frame(embed(rev(.x), 3), check.names = FALSE), .unpack = TRUE)) %>%
  8. arrange(ID, year_1) %>%
  9. select(-diff_1, diff_1 = diff_2, diff_2 = diff_3) %>%
  10. relocate(starts_with("status"), .after = last_col())

希望这对您有所帮助。

英文:

You can try the following:

  1. library(dplyr)
  2. df %&gt;%
  3. relocate(year, value, status, .after = ID) %&gt;%
  4. group_by(ID) %&gt;%
  5. filter(n() &gt; 2) %&gt;%
  6. mutate(diff = c(0, diff(value)) * status) %&gt;%
  7. reframe(across(everything(), ~ data.frame(embed(rev(.x), 3), check.names = FALSE), .unpack = TRUE)) %&gt;%
  8. arrange(ID, year_1) %&gt;%
  9. select(-diff_1, diff_1 = diff_2, diff_2 = diff_3) %&gt;%
  10. relocate(starts_with(&quot;status&quot;), .after = last_col())
  11. # A tibble: 5 &#215; 12
  12. ID year_1 year_2 year_3 value_1 value_2 value_3 diff_1 diff_2 status_1 status_2 status_3
  13. &lt;chr&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt; &lt;dbl&gt;
  14. 1 A1 2000 2005 2010 10 12 0 2 0 1 1 0
  15. 2 A3 2000 2005 2010 30 31 34 1 3 1 1 1
  16. 3 A3 2005 2010 2015 31 34 0 3 0 1 1 0
  17. 4 A4 2000 2005 2010 32 34 36 2 2 1 1 1
  18. 5 A4 2005 2010 2015 34 36 37 2 1 1 1 1

huangapple
  • 本文由 发表于 2023年6月8日 21:05:13
  • 转载请务必保留本文链接:https://go.coder-hub.com/76432162.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定