如何在R中按年重新排列我的数据框,同时带有条件和计算?

huangapple go评论52阅读模式
英文:

How to rearrange my data frame per year, with conditions and a calculation at the same time in R?

问题

Sure, here's a snippet of R code that achieves the transformation you described:

library(dplyr)
library(tidyr)

# Filter the data to keep only IDs that appear 3 or more times
df_filtered <- df %>%
  group_by(ID) %>%
  filter(n() >= 3)

# Create a sequence column for each year
df_filtered <- df_filtered %>%
  group_by(ID) %>%
  mutate(year_seq = row_number())

# Pivot the data to wide format
df_pivoted <- df_filtered %>%
  pivot_wider(names_from = year_seq,
              names_glue = "Year{.value}",
              values_from = c(year, value, status))

# Calculate the differences between values
df_pivoted <- df_pivoted %>%
  mutate(DiffValue1 = Value2 - Value1,
         DiffValue2 = Value3 - Value2)

# Rename the columns
colnames(df_pivoted) <- gsub("status_", "Status", colnames(df_pivoted))
colnames(df_pivoted) <- gsub("value_", "Value", colnames(df_pivoted))
colnames(df_pivoted) <- gsub("year_", "Year", colnames(df_pivoted))

# Remove row names
rownames(df_pivoted) <- NULL

# View the final data frame
df_pivoted

This code will filter the data, pivot it into the desired wide format, calculate the differences between values, and rename the columns as shown in your example.

英文:

So I have a data frame that looks like this:

df &lt;- data.frame (ID  = c(&quot;A1&quot;,&quot;A1&quot;,&quot;A1&quot;,&quot;A2&quot;,&quot;A2&quot;,&quot;A3&quot;,&quot;A3&quot;,&quot;A3&quot;,&quot;A3&quot;,&quot;A4&quot;,&quot;A4&quot;,&quot;A4&quot;,&quot;A4&quot;),
                  status = c(1,1,0,1,0,1,1,1,0,1,1,1,1),
                  value = c( 10,12,0,40,42,30,31,34,0,32,34,36,37),
                  year = c(2000,2005,2010,2005,2010,2000,2005,2010,2015,2000,2005,2010,2015
                  ))

I want to transform this df to keep only rows for values that appear 3 or more times in the ID column. I want to arrange it in a way to have per row values that reappear in 3 different years, with the status in each year and the differences between the value in second and first, and third and second year. The final df should look like this:

df &lt;- data.frame (ID  = c(&quot;A1&quot;,&quot;A3&quot;,&quot;A3&quot;,&quot;A4&quot;,&quot;A4&quot;),
                  Year1= c(2000,200,2005,2000,2005),
                  Year2 = c(2005,2005,2010,2005,2010),
                  Year3 = c(2010,2010,2015,2010,2015),
                  Value1 = c(10,30,31,32,34),
                  Value2 = c(12,31,34,34,36),
                  Value3 = c(0,34,0,36,37),
                  DiffValue1 = c(2,1,3,2,2),
                  DiffValue2 = c(0,3,0,2,1),
                  Status1 = c(1,1,1,1,1),
                  Status2 = c(1,1,1,1,1),
                  Status3 = c( 0,1,0,1,1)
)

I know I could start doing this step by step, first subsetting the data to keep only IDs that repeat 3 or more times, then rearrange rows to columns and calculating the differences in values, but is there a way to combine all of this into one snippet of code?

答案1

得分: 1

以下是您要翻译的代码部分:

library(dplyr)

df %>%
  relocate(year, value, status, .after = ID) %>%
  group_by(ID) %>%
  filter(n() > 2) %>%
  mutate(diff = c(0, diff(value)) * status) %>%
  reframe(across(everything(), ~ data.frame(embed(rev(.x), 3), check.names = FALSE), .unpack = TRUE)) %>%
  arrange(ID, year_1) %>%
  select(-diff_1, diff_1 = diff_2, diff_2 = diff_3) %>%
  relocate(starts_with("status"), .after = last_col())

希望这对您有所帮助。

英文:

You can try the following:

library(dplyr)
    
df %&gt;%
   relocate(year, value, status, .after = ID) %&gt;%
   group_by(ID) %&gt;%
   filter(n() &gt; 2) %&gt;%
   mutate(diff = c(0, diff(value)) * status) %&gt;%
   reframe(across(everything(), ~ data.frame(embed(rev(.x), 3), check.names = FALSE), .unpack = TRUE)) %&gt;%
   arrange(ID, year_1) %&gt;%
   select(-diff_1, diff_1 = diff_2, diff_2 = diff_3) %&gt;%
   relocate(starts_with(&quot;status&quot;), .after = last_col())

# A tibble: 5 &#215; 12
  ID    year_1 year_2 year_3 value_1 value_2 value_3 diff_1 diff_2 status_1 status_2 status_3
  &lt;chr&gt;  &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;   &lt;dbl&gt;  &lt;dbl&gt;  &lt;dbl&gt;    &lt;dbl&gt;    &lt;dbl&gt;    &lt;dbl&gt;
1 A1      2000   2005   2010      10      12       0      2      0        1        1        0
2 A3      2000   2005   2010      30      31      34      1      3        1        1        1
3 A3      2005   2010   2015      31      34       0      3      0        1        1        0
4 A4      2000   2005   2010      32      34      36      2      2        1        1        1
5 A4      2005   2010   2015      34      36      37      2      1        1        1        1

huangapple
  • 本文由 发表于 2023年6月8日 21:05:13
  • 转载请务必保留本文链接:https://go.coder-hub.com/76432162.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定