June 8, 2023, 21:05:13

How to rearrange my data frame per year, with conditions and a calculation at the same time in R?

Question

One way to do the transformation described in the question below is the following R snippet:

library(dplyr)
# Keep only IDs that appear 3 or more times, sorted by year within each ID
df_filtered <- df %>%
  group_by(ID) %>%
  filter(n() >= 3) %>%
  arrange(year, .by_group = TRUE)
# Line up each row with the next two rows of the same ID,
# so that every row describes one window of three consecutive years
df_windows <- df_filtered %>%
  mutate(Year1 = year, Year2 = lead(year), Year3 = lead(year, 2),
         Value1 = value, Value2 = lead(value), Value3 = lead(value, 2),
         Status1 = status, Status2 = lead(status), Status3 = lead(status, 2)) %>%
  ungroup() %>%
  # Drop incomplete windows (the last two rows of each ID)
  filter(!is.na(Year3))
# Calculate the differences between values; each difference is multiplied by
# the later year's status so that a status of 0 gives 0, as in the example
df_result <- df_windows %>%
  mutate(DiffValue1 = (Value2 - Value1) * Status2,
         DiffValue2 = (Value3 - Value2) * Status3) %>%
  select(ID, Year1, Year2, Year3, Value1, Value2, Value3,
         DiffValue1, DiffValue2, Status1, Status2, Status3)
# View the final data frame
df_result

This code filters the data, uses lead() to place each row next to the following two rows of the same ID, calculates the differences between values, and names the columns as in the example. Multiplying the differences by the status of the later year is an assumption, made so that a status of 0 gives a difference of 0, as in the expected output.

So I have a data frame that looks like this:

df <- data.frame (ID  = c("A1","A1","A1","A2","A2","A3","A3","A3","A3","A4","A4","A4","A4"),
                  status = c(1,1,0,1,0,1,1,1,0,1,1,1,1),
                  value = c( 10,12,0,40,42,30,31,34,0,32,34,36,37),
                  year = c(2000,2005,2010,2005,2010,2000,2005,2010,2015,2000,2005,2010,2015
                  ))

I want to transform this df to keep only rows for IDs that appear 3 or more times in the ID column. I want each row of the result to hold one ID across 3 consecutive years, with the status in each year and the differences between the values of the second and first years and of the third and second years. The final df should look like this:

df <- data.frame (ID  = c("A1","A3","A3","A4","A4"),
                  Year1= c(2000,2000,2005,2000,2005),
                  Year2 = c(2005,2005,2010,2005,2010),
                  Year3 = c(2010,2010,2015,2010,2015),
                  Value1 = c(10,30,31,32,34),
                  Value2 = c(12,31,34,34,36),
                  Value3 = c(0,34,0,36,37),
                  DiffValue1 = c(2,1,3,2,2),
                  DiffValue2 = c(0,3,0,2,1),
                  Status1 = c(1,1,1,1,1),
                  Status2 = c(1,1,1,1,1),
                  Status3 = c( 0,1,0,1,1)
)

I know I could start doing this step by step, first subsetting the data to keep only IDs that repeat 3 or more times, then rearrange rows to columns and calculating the differences in values, but is there a way to combine all of this into one snippet of code?

Answer 1

Score: 1

You can try the following:

library(dplyr)

df %>%
   relocate(year, value, status, .after = ID) %>%
   group_by(ID) %>%
   filter(n() > 2) %>%
   mutate(diff = c(0, diff(value)) * status) %>%
   reframe(across(everything(), ~ data.frame(embed(rev(.x), 3), check.names = FALSE), .unpack = TRUE)) %>%
   arrange(ID, year_1) %>%
   select(-diff_1, diff_1 = diff_2, diff_2 = diff_3) %>%
   relocate(starts_with("status"), .after = last_col())
# A tibble: 5 × 12
  ID    year_1 year_2 year_3 value_1 value_2 value_3 diff_1 diff_2 status_1 status_2 status_3
  <chr>  <dbl>  <dbl>  <dbl>   <dbl>   <dbl>   <dbl>  <dbl>  <dbl>    <dbl>    <dbl>    <dbl>
1 A1      2000   2005   2010      10      12       0      2      0        1        1        0
2 A3      2000   2005   2010      30      31      34      1      3        1        1        1
3 A3      2005   2010   2015      31      34       0      3      0        1        1        0
4 A4      2000   2005   2010      32      34      36      2      2        1        1        1
5 A4      2005   2010   2015      34      36      37      2      1        1        1        1
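
The answer gives no explanation of its two less obvious steps, so here is a small base R sketch of what they appear to do, using the rows of ID "A3" from the question. The variable names (years, windows, step_diff) are only for illustration and do not appear in the answer.

```r
# Rows of ID "A3" from the question, in ascending year order
years  <- c(2000, 2005, 2010, 2015)
value  <- c(30, 31, 34, 0)
status <- c(1, 1, 1, 0)

# embed(x, 3) returns a matrix whose row i is (x[i + 2], x[i + 1], x[i]),
# so on the original vector every window of three comes out newest-first
embed(years, 3)

# Reversing the input first makes each row read oldest-to-newest.
# The rows themselves then run from the latest window to the earliest,
# which is presumably why the pipeline calls arrange(ID, year_1) afterwards
windows <- embed(rev(years), 3)
windows

# diff(value) has one element fewer than value, so a leading 0 pads it;
# multiplying by status zeroes the difference wherever status is 0
step_diff <- c(0, diff(value)) * status
step_diff
```

Because the padded first difference of a window is never needed, the pipeline drops it with select(-diff_1, ...) and shifts the remaining two columns down by one.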
