Problem in Date/Time while calculating daily average from hourly average using group_by in R

Question

I calculated hourly averages in R with the code below; the result is stored in an object named 'avghr'. Because of the preprocessing done to obtain 'merged' and then 'avghr', I want to use 'avghr' to calculate the daily average. Doing so fails with this error:

Error in group_by():
ℹ In argument: Time_sp = lubridate::floor_date(Time_sp, "1 day").
Caused by error in lubridate::floor_date():
! object 'Time_sp' not found

Could anyone please help me resolve it?

Code:

 merged <- rbindlist(dt.tidied, fill = TRUE, use.names = TRUE)
 
 #Required Columns
 cn <- c('Date/Time', 'C1 (C1)', 'C2 (C2)', 'C5 (C5)', 'C8 (C8)', 'C11 (C11)', 'C14 (C14)')                                      

 #Calc hourly average
 avghr <- merged %>%
   select(any_of(cn)) %>%
   as_tibble() %>%
   # I want the Date/Time column to be written as Time_sp in the output.
   group_by(Time_sp = lubridate::floor_date(`Date/Time`, "60 mins")) %>%
   # I want the Time_sp column to be written in the specified format.
   #mutate(Time_sp = format(Date_Time, "%Y-%m-%d %H:%M:%S+00")) %>%
   summarise(across(where(is.numeric), ~ if(mean(is.na(.x)) > 0.5) NA else mean(.x, na.rm = TRUE)))

write.csv(avghr, paste0(dirlist[idx],"_hr.csv"), row.names = FALSE)


 #Calc daily average
 avgdl <- avghr %>%
   select(any_of(cn)) %>%
   as_tibble() %>%
   group_by(Time_sp = lubridate::floor_date(`Time_sp`, "1 day")) %>%
   #mutate(Time_sp = format(Date_Time, "%Y-%m-%d %H:%M:%S+00")) %>%
   summarise(across(where(is.numeric), ~ if(sum(is.na(.x)) > 1) NA else mean(.x, na.rm = TRUE)))

write.csv(avgdl, paste0(dirlist[idx],"_dly.csv"), row.names = FALSE)

}

I understand that the mutated Time_sp column is no longer in date-time format. But even if I remove the mutate(Time_sp...) line when calculating avghr, I still get the same error.

I can calculate the hourly or the daily average individually with the same code. The problem only occurs when I use the hourly average object to calculate the daily average. Can anyone please help?

Output of dput(head(avghr)):

structure(list(
  Time_sp = structure(c(788918400, 788922000, 788925600, 788929200, 788932800, 788936400),
    tzone = "UTC", class = c("POSIXct", "POSIXt")),
  `C1 (C1)` = c(84.6666666666667, 90.95, 94.7333333333333, 95.95, 95.4666666666667, 90.3833333333333),
  `C2 (C2)` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
  `C5 (C5)` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
  `C8 (C8)` = c(911.983333333333, 1062.41666666667, 1147.88333333333, 1156.15, 1089.73333333333, 956.233333333333),
  `C11 (C11)` = c(NA_real_, NA_real_, NA_real_, NA_real_, NA_real_, NA_real_),
  `C14 (C14)` = c(1139.86233333333, 1304.17933333333, 1394.967, 1406.02683333333, 1336.59633333333, 1191.40616666667)
), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"))

Answer 1

Score: 1

#Calc hourly average
avghr <- merged %>%
  select(any_of(cn)) %>%
  as_tibble() %>%
  group_by(Time_sp = lubridate::floor_date(`Date/Time`, "60 mins")) %>% 
  summarise(across(where(is.numeric), ~ if(mean(is.na(.x)) > 0.5) NA else mean(.x, na.rm = TRUE)))

# Format Time_sp as text only for the CSV; avghr itself keeps Time_sp as POSIXct
mutate(avghr, Time_sp = format(Time_sp, "%Y-%m-%d %H:%M:%S+00")) %>%
  write.csv(paste0(dirlist[idx],"_hr.csv"), row.names = FALSE)

#Calc daily average
# No select(any_of(cn)) here: cn lists 'Date/Time' but not 'Time_sp',
# so that select would drop Time_sp and floor_date() could not find it
avgdl <- avghr %>%
  group_by(Time_sp = lubridate::floor_date(`Time_sp`, "1 day")) %>%
  summarise(across(where(is.numeric), ~ if(sum(is.na(.x)) > 1) NA else mean(.x, na.rm = TRUE))) %>% 
  mutate(Time_sp = format(Time_sp, "%Y-%m-%d %H:%M:%S+00"))

write.csv(avgdl, paste0(dirlist[idx],"_dly.csv"), row.names = FALSE)
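
To see why the original daily step failed, here is a minimal sketch on made-up data. The names avghr_demo, avgdl_demo and kept are hypothetical, cn is shortened to two entries, and the NA rule from the question is left out for brevity. It shows that any_of() silently skips names that are missing and keeps only the names it is given, so Time_sp is gone before group_by() runs.

```r
library(dplyr)

# Hypothetical stand-in for avghr: 48 hourly values covering two days (UTC)
avghr_demo <- tibble(
  Time_sp = as.POSIXct("1995-01-01 00:00:00", tz = "UTC") + 3600 * (0:47),
  `C1 (C1)` = as.numeric(1:48)
)

# Shortened version of the question's column list: it has 'Date/Time' but no 'Time_sp'
cn <- c("Date/Time", "C1 (C1)")

# any_of() ignores the absent 'Date/Time' and keeps only listed columns,
# so Time_sp is dropped and a later floor_date(Time_sp, ...) cannot find it
kept <- names(select(avghr_demo, any_of(cn)))
print(kept)

# Either skip the select() or add 'Time_sp' to the selection
avgdl_demo <- avghr_demo %>%
  select(any_of(c("Time_sp", cn))) %>%
  group_by(Time_sp = lubridate::floor_date(Time_sp, "1 day")) %>%
  summarise(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))

print(avgdl_demo)
```

If the select() step is wanted in the daily calculation, adding "Time_sp" to the selection as above should have the same effect as removing the select() entirely.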

huangapple, posted on 2023-05-29 20:25:22
Source: https://go.coder-hub.com/76357365.html