Loop operation on a list of data frames

Question

I have a large data frame (more than 100,000 rows) consisting of several sampling events at different stations and on different dates. Here is a simulated data set similar to mine; the real one has more columns that I also need to keep. Note that the real data has multiple dates per station, which the example does not show (about 500 date*station combinations in total).

    df <- data.frame("station" = rep(c("A", "B", "C", "D"), each = 4),
                     "date" = rep(c("2011-01-20", "2011-06-05", "2015-07-15", "2017-08-09"), each = 4),
                     "depth" = rep(c(1, 2, 3, 4), 4),
                     "temp" = runif(16))
    df

I need to calculate the temperature difference between consecutive depths, within each date and station. What I am expecting is the column d_temp here:

    # temp values are written out so that they match the differences in d_temp;
    # NA is left unquoted so that d_temp stays numeric
    df_expected <- data.frame("station" = rep(c("A", "B"), each = 4),
                     "date" = rep(c("2011-01-20", "2011-06-05"), each = 4),
                     "depth" = rep(c(1, 2, 3, 4), 2),
                     "temp" = c(0.9, 0.69, 0.63, 0.94, 0.55, 0.72, 0.33, 0.81),
                     "d_temp" = c(NA, 0.69-0.9, 0.63-0.69, 0.94-0.63, NA, 0.72-0.55, 0.33-0.72, 0.81-0.33))
    df_expected

I tried splitting everything into a list, but once there I am stuck. I tried a for loop over the list and lapply, but I cannot find the solution to something that has to be simple.

Thank you for your help.

Answer 1

Score: 1

Here's a dplyr solution:

    set.seed(234)
    df <- data.frame("station" = rep(c("A", "B", "C", "D"), each = 4),
                     "date" = rep(c("2011-01-20", "2011-06-05", "2015-07-15", "2017-08-09"), each = 4),
                     "depth" = rep(c(1, 2, 3, 4), 4),
                     "temp" = runif(16))

    library(dplyr)

    # lag(temp, order_by = depth) is the temp at the previous depth;
    # .by groups per station and date (needs dplyr 1.1.0 or later)
    df %>%
      mutate(d_temp = temp - lag(temp, order_by = depth),
             .by = c(station, date))
    #>    station       date depth        temp        d_temp
    #> 1        A 2011-01-20     1 0.745619998            NA
    #> 2        A 2011-01-20     2 0.781712425  0.0360924273
    #> 3        A 2011-01-20     3 0.020037114 -0.7616753110
    #> 4        A 2011-01-20     4 0.776085387  0.7560482735
    #> 5        B 2011-06-05     1 0.066910093            NA
    #> 6        B 2011-06-05     2 0.644795124  0.5778850310
    #> 7        B 2011-06-05     3 0.929385959  0.2845908350
    #> 8        B 2011-06-05     4 0.717642189 -0.2117437709
    #> 9        C 2015-07-15     1 0.927736510            NA
    #> 10       C 2015-07-15     2 0.284230120 -0.6435063903
    #> 11       C 2015-07-15     3 0.555724930  0.2714948107
    #> 12       C 2015-07-15     4 0.547701653 -0.0080232776
    #> 13       D 2017-08-09     1 0.582847855            NA
    #> 14       D 2017-08-09     2 0.582989913  0.0001420584
    #> 15       D 2017-08-09     3 0.001198341 -0.5817915718
    #> 16       D 2017-08-09     4 0.441117854  0.4399195127

Created on 2023-08-03 with reprex v2.0.2
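For comparison, here is a rough base R sketch of the split-and-lapply route the question describes, plus a shorter variant with ave(). It is not part of the answer above. The helper name add_d_temp and the result name res_list are made up for illustration, and the ave() variant assumes the rows are already sorted by depth within each station and date.

```r
# Same simulated data as above, with a fixed seed for reproducibility
set.seed(234)
df <- data.frame(station = rep(c("A", "B", "C", "D"), each = 4),
                 date = rep(c("2011-01-20", "2011-06-05", "2015-07-15", "2017-08-09"), each = 4),
                 depth = rep(c(1, 2, 3, 4), 4),
                 temp = runif(16))

# Hypothetical helper: sorts one station/date group by depth and adds
# d_temp, the difference between each temp and the one at the previous depth
add_d_temp <- function(g) {
  g <- g[order(g$depth), ]
  g$d_temp <- c(NA_real_, diff(g$temp))
  g
}

# One data frame per station/date combination that actually occurs
groups <- split(df, list(df$station, df$date), drop = TRUE)

# Apply the helper to every group, then stack the pieces back together
res_list <- do.call(rbind, lapply(groups, add_d_temp))
rownames(res_list) <- NULL

# Variant without an explicit list: ave() applies FUN within each group.
# Assumption: rows are already ordered by depth inside every group.
df$d_temp <- ave(df$temp, df$station, df$date,
                 FUN = function(x) c(NA_real_, diff(x)))
```

All other columns are carried along in both variants, since the first works on whole data frame pieces and the second only adds a column. On 100,000+ rows the ave() variant is likely the lighter of the two because it never rebinds data frames, though that is an expectation, not something measured here.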

huangapple
  • Posted on 2023-08-04 03:43:05
  • When reposting, please keep this link: https://go.coder-hub.com/76831197.html