Loop operation on a list of data frames

Question

I have a large data frame with more than 100,000 rows consisting of several sampling events at different stations and on different dates. Here is a simulated data set similar to mine, although my real data has more columns that I also need to keep. Note that I do have multiple dates per station, just not in this example (the real data has 500 date*station combinations).

    df <- data.frame("station" = rep(c("A", "B", "C", "D"), each = 4),
                     "date" = rep(c("2011-01-20", "2011-06-05", "2015-07-15", "2017-08-09"), each = 4),
                     "depth" = rep(c(1, 2, 3, 4), 4),
                     "temp" = runif(16))
    df

I need to calculate the temperature difference between consecutive depths, grouped by date and station. The result I am expecting is the d_temp column here:

    df_expected <- data.frame("station" = rep(c("A", "B"), each = 4),
                              "date" = rep(c("2011-01-20", "2011-06-05"), each = 4),
                              "depth" = rep(c(1, 2, 3, 4), 2),
                              # temperatures implied by the differences written out below
                              "temp" = c(0.9, 0.69, 0.63, 0.94, 0.55, 0.72, 0.33, 0.81),
                              # NA (not the string "NA") keeps the column numeric
                              "d_temp" = c(NA, 0.69-0.9, 0.63-0.69, 0.94-0.63, NA, 0.72-0.55, 0.33-0.72, 0.81-0.33))
    df_expected
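
For a single station and date, the calculation described above reduces to differencing consecutive values. The following is a minimal sketch, assuming the rows are already sorted by depth; the temperatures are the ones implied by the differences written out in d_temp above, and the variable names are only illustrative.

```r
# Temperatures at depths 1 to 4 for one station/date group
# (values implied by the differences written out in d_temp above)
temp <- c(0.90, 0.69, 0.63, 0.94)

# diff() returns temp[i] - temp[i - 1]; the shallowest depth has no
# previous value, so an NA is prepended to keep the length equal to temp
d_temp <- c(NA, diff(temp))
d_temp
```

The first element is NA and the remaining three are approximately -0.21, -0.06 and 0.31, up to floating-point rounding.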

I tried splitting everything into a list, but once there I am stuck. I tried a for loop over the list, and lapply, but I cannot find the solution to something that ought to be simple.

Thank you for your help.

Answer 1

Score: 1

Here's a dplyr solution:

    set.seed(234)  # make the random temperatures reproducible
    df <- data.frame("station" = rep(c("A", "B", "C", "D"), each = 4),
                     "date" = rep(c("2011-01-20", "2011-06-05", "2015-07-15", "2017-08-09"), each = 4),
                     "depth" = rep(c(1, 2, 3, 4), 4),
                     "temp" = runif(16))
    library(dplyr)  # the .by argument requires dplyr >= 1.1.0

    # Within each station/date group, subtract the temperature at the previous depth
    df %>%
      mutate(d_temp = temp - lag(temp, order_by = depth),
             .by = c(station, date))
    #>    station       date depth        temp        d_temp
    #> 1        A 2011-01-20     1 0.745619998            NA
    #> 2        A 2011-01-20     2 0.781712425  0.0360924273
    #> 3        A 2011-01-20     3 0.020037114 -0.7616753110
    #> 4        A 2011-01-20     4 0.776085387  0.7560482735
    #> 5        B 2011-06-05     1 0.066910093            NA
    #> 6        B 2011-06-05     2 0.644795124  0.5778850310
    #> 7        B 2011-06-05     3 0.929385959  0.2845908350
    #> 8        B 2011-06-05     4 0.717642189 -0.2117437709
    #> 9        C 2015-07-15     1 0.927736510            NA
    #> 10       C 2015-07-15     2 0.284230120 -0.6435063903
    #> 11       C 2015-07-15     3 0.555724930  0.2714948107
    #> 12       C 2015-07-15     4 0.547701653 -0.0080232776
    #> 13       D 2017-08-09     1 0.582847855            NA
    #> 14       D 2017-08-09     2 0.582989913  0.0001420584
    #> 15       D 2017-08-09     3 0.001198341 -0.5817915718
    #> 16       D 2017-08-09     4 0.441117854  0.4399195127

Created on 2023-08-03 with reprex v2.0.2
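
The question mentions getting stuck after splitting the data into a list. For comparison, here is a base R sketch of that split/lapply route. It is not part of the original answer: the helper name add_d_temp and the object names groups and result are hypothetical, and it assumes each station/date group should be ordered by depth before differencing.

```r
set.seed(234)
df <- data.frame(station = rep(c("A", "B", "C", "D"), each = 4),
                 date = rep(c("2011-01-20", "2011-06-05", "2015-07-15", "2017-08-09"), each = 4),
                 depth = rep(c(1, 2, 3, 4), 4),
                 temp = runif(16))

# One data frame per station/date combination; drop = TRUE discards
# combinations that never occur in the data
groups <- split(df, list(df$station, df$date), drop = TRUE)

# Hypothetical helper: sort one group by depth, then take the difference
# between each temperature and the one at the previous depth
add_d_temp <- function(g) {
  g <- g[order(g$depth), ]
  g$d_temp <- c(NA, diff(g$temp))
  g
}

# Apply the helper to every group and stack the pieces back together
result <- do.call(rbind, lapply(groups, add_d_temp))
rownames(result) <- NULL
result
```

All other columns are carried along unchanged because each group stays a data frame throughout. On a data frame with many rows and groups this is likely to be slower than the grouped mutate, so it is mainly useful when dplyr is not available.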

huangapple
  • Published on August 4, 2023, 03:43:05
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76831197.html