Loop operation on a list of data frames
Question
I have a large data frame with more than 100,000 rows, consisting of several sampling events at different stations and on different dates. Here is a simulated data set similar to mine, although the real one has more columns that I also need to keep. Note that the real data has multiple dates per station, which this example does not show (there are 500 date*station combinations in total).
df <- data.frame("station" = rep(c("A", "B", "C", "D"), each = 4),
                 "date" = rep(c("2011-01-20", "2011-06-05", "2015-07-15", "2017-08-09"), each = 4),
                 "depth" = rep(c(1, 2, 3, 4), 4),
                 "temp" = runif(16))
df
I need to calculate the temperature difference between consecutive depths within each date and station. So what I am expecting is the column d_temp here:
# d_temp values are illustrative only; temp is random here.
# NA (not the string "NA") keeps the d_temp column numeric.
df_expected <- data.frame("station" = rep(c("A", "B"), each = 4),
                          "date" = rep(c("2011-01-20", "2011-06-05"), each = 4),
                          "depth" = rep(c(1, 2, 3, 4), 2),
                          "temp" = runif(8),
                          "d_temp" = c(NA, 0.69-0.9, 0.63-0.69, 0.94-0.63, NA, 0.72-0.55, 0.33-0.72, 0.81-0.33))
df_expected
I tried splitting it all into a list, but once there I am stuck. I tried using a for loop and lapply on the list, but I cannot find the solution to something that ought to be simple.
Thank you for your help.
Answer 1
Score: 1
Here's a dplyr solution (the .by argument requires dplyr 1.1.0 or later):
set.seed(234)
df <- data.frame("station" = rep(c("A", "B", "C", "D"), each = 4),
                 "date" = rep(c("2011-01-20", "2011-06-05", "2015-07-15", "2017-08-09"), each = 4),
                 "depth" = rep(c(1, 2, 3, 4), 4),
                 "temp" = runif(16))
library(dplyr)
# Within each station/date group, subtract the temperature at the previous depth
df %>%
  mutate(d_temp = temp - lag(temp, order_by = depth),
         .by = c(station, date))
#> station date depth temp d_temp
#> 1 A 2011-01-20 1 0.745619998 NA
#> 2 A 2011-01-20 2 0.781712425 0.0360924273
#> 3 A 2011-01-20 3 0.020037114 -0.7616753110
#> 4 A 2011-01-20 4 0.776085387 0.7560482735
#> 5 B 2011-06-05 1 0.066910093 NA
#> 6 B 2011-06-05 2 0.644795124 0.5778850310
#> 7 B 2011-06-05 3 0.929385959 0.2845908350
#> 8 B 2011-06-05 4 0.717642189 -0.2117437709
#> 9 C 2015-07-15 1 0.927736510 NA
#> 10 C 2015-07-15 2 0.284230120 -0.6435063903
#> 11 C 2015-07-15 3 0.555724930 0.2714948107
#> 12 C 2015-07-15 4 0.547701653 -0.0080232776
#> 13 D 2017-08-09 1 0.582847855 NA
#> 14 D 2017-08-09 2 0.582989913 0.0001420584
#> 15 D 2017-08-09 3 0.001198341 -0.5817915718
#> 16 D 2017-08-09 4 0.441117854 0.4399195127
Created on 2023-08-03 with reprex v2.0.2
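For comparison, the split-then-lapply route mentioned in the question can also be made to work in base R. The following is only a minimal sketch on the same toy data, not part of the answer above; the names pieces and result are made up for illustration, and it assumes each station/date group should be sorted by depth before differencing.

```r
# Sketch only: same toy data as in the answer above
set.seed(234)
df <- data.frame(
  station = rep(c("A", "B", "C", "D"), each = 4),
  date    = rep(c("2011-01-20", "2011-06-05", "2015-07-15", "2017-08-09"), each = 4),
  depth   = rep(c(1, 2, 3, 4), 4),
  temp    = runif(16)
)

# One data frame per station/date combination;
# drop = TRUE discards combinations that never occur in the data
pieces <- split(df, list(df$station, df$date), drop = TRUE)

# Within each piece, sort by depth and difference consecutive temperatures
pieces <- lapply(pieces, function(g) {
  g <- g[order(g$depth), ]
  g$d_temp <- c(NA, diff(g$temp))  # first depth has no predecessor
  g
})

# Stack the pieces back into a single data frame
result <- do.call(rbind, pieces)
rownames(result) <- NULL
result
```

All other columns are carried along unchanged, since each piece is a full subset of the original rows. On a data frame with 100,000+ rows and 500 groups this should still be workable, though the dplyr version is likely the more convenient one.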