英文:
How to rearrange my data frame per year, with conditions and a calculation at the same time in R?
问题
Sure, here's a snippet of R code that achieves the transformation you described:
library(dplyr)
library(tidyr)
# Filter the data to keep only IDs that appear 3 or more times
df_filtered <- df %>%
group_by(ID) %>%
filter(n() >= 3)
# Create a sequence column for each year
df_filtered <- df_filtered %>%
group_by(ID) %>%
mutate(year_seq = row_number())
# Pivot the data to wide format
df_pivoted <- df_filtered %>%
pivot_wider(names_from = year_seq,
names_glue = "Year{.value}",
values_from = c(year, value, status))
# Calculate the differences between values
df_pivoted <- df_pivoted %>%
mutate(DiffValue1 = Value2 - Value1,
DiffValue2 = Value3 - Value2)
# Rename the columns
colnames(df_pivoted) <- gsub("status_", "Status", colnames(df_pivoted))
colnames(df_pivoted) <- gsub("value_", "Value", colnames(df_pivoted))
colnames(df_pivoted) <- gsub("year_", "Year", colnames(df_pivoted))
# Remove row names
rownames(df_pivoted) <- NULL
# View the final data frame
df_pivoted
This code will filter the data, pivot it into the desired wide format, calculate the differences between values, and rename the columns as shown in your example.
英文:
So I have a data frame that looks like this:
df <- data.frame (ID = c("A1","A1","A1","A2","A2","A3","A3","A3","A3","A4","A4","A4","A4"),
status = c(1,1,0,1,0,1,1,1,0,1,1,1,1),
value = c( 10,12,0,40,42,30,31,34,0,32,34,36,37),
year = c(2000,2005,2010,2005,2010,2000,2005,2010,2015,2000,2005,2010,2015
))
I want to transform this df
to keep only rows for values that appear 3 or more times in the ID
column. I want to arrange it in a way to have per row values that reappear in 3 different years, with the status
in each year and the differences between the value
in second and first, and third and second year. The final df
should look like this:
df <- data.frame (ID = c("A1","A3","A3","A4","A4"),
Year1= c(2000,200,2005,2000,2005),
Year2 = c(2005,2005,2010,2005,2010),
Year3 = c(2010,2010,2015,2010,2015),
Value1 = c(10,30,31,32,34),
Value2 = c(12,31,34,34,36),
Value3 = c(0,34,0,36,37),
DiffValue1 = c(2,1,3,2,2),
DiffValue2 = c(0,3,0,2,1),
Status1 = c(1,1,1,1,1),
Status2 = c(1,1,1,1,1),
Status3 = c( 0,1,0,1,1)
)
I know I could start doing this step by step, first subsetting the data to keep only ID
s that repeat 3 or more times, then rearrange rows to columns and calculating the differences in values, but is there a way to combine all of this into one snippet of code?
答案1
得分: 1
以下是您要翻译的代码部分:
library(dplyr)
df %>%
relocate(year, value, status, .after = ID) %>%
group_by(ID) %>%
filter(n() > 2) %>%
mutate(diff = c(0, diff(value)) * status) %>%
reframe(across(everything(), ~ data.frame(embed(rev(.x), 3), check.names = FALSE), .unpack = TRUE)) %>%
arrange(ID, year_1) %>%
select(-diff_1, diff_1 = diff_2, diff_2 = diff_3) %>%
relocate(starts_with("status"), .after = last_col())
希望这对您有所帮助。
英文:
You can try the following:
library(dplyr)
df %>%
relocate(year, value, status, .after = ID) %>%
group_by(ID) %>%
filter(n() > 2) %>%
mutate(diff = c(0, diff(value)) * status) %>%
reframe(across(everything(), ~ data.frame(embed(rev(.x), 3), check.names = FALSE), .unpack = TRUE)) %>%
arrange(ID, year_1) %>%
select(-diff_1, diff_1 = diff_2, diff_2 = diff_3) %>%
relocate(starts_with("status"), .after = last_col())
# A tibble: 5 × 12
ID year_1 year_2 year_3 value_1 value_2 value_3 diff_1 diff_2 status_1 status_2 status_3
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A1 2000 2005 2010 10 12 0 2 0 1 1 0
2 A3 2000 2005 2010 30 31 34 1 3 1 1 1
3 A3 2005 2010 2015 31 34 0 3 0 1 1 0
4 A4 2000 2005 2010 32 34 36 2 2 1 1 1
5 A4 2005 2010 2015 34 36 37 2 1 1 1 1
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论