August 4, 2023, 03:43:05

Loop operation on a list of data frames

Question

I have an extensive data frame with more than 100,000 rows, consisting of several sampling events at different stations and on different dates. Here is a simulated data set similar to what I have, although my real data has more columns that I also need to keep. Note that the real data has multiple dates per station, which is not shown here (there are 500 date*station combinations).

    df <- data.frame("station" = rep(c("A", "B", "C", "D"), each = 4),
                     "date" = rep(c("2011-01-20", "2011-06-05", "2015-07-15", "2017-08-09"), each = 4),
                     "depth" = rep(c(1, 2, 3, 4), 4),
                     "temp" = runif(16))
    df

I need to calculate the temperature difference between consecutive depths within each date and station. What I am expecting is the column d_temp shown here:

    df_expected <- data.frame("station" = rep(c("A", "B"), each = 4),
                         "date" = rep(c("2011-01-20", "2011-06-05"), each = 4),
                         "depth" = rep(c(1, 2, 3, 4), 2),
                         # example temperatures matching the differences below
                         "temp" = c(0.9, 0.69, 0.63, 0.94, 0.55, 0.72, 0.33, 0.81),
                         # unquoted NA keeps d_temp numeric; "NA" would make it character
                         "d_temp" = c(NA, 0.69-0.9, 0.63-0.69, 0.94-0.63, NA, 0.72-0.55, 0.33-0.72, 0.81-0.33))
    df_expected

I tried splitting everything into a list, but once there I got stuck. I tried a for loop over the list and lapply, but I cannot find the solution to something that must be simple.

Thank you for your help.

Answer 1

Score: 1

Here's a dplyr solution:

    set.seed(234)
    df <- data.frame("station" = rep(c("A", "B", "C", "D"), each = 4),
                     "date" = rep(c("2011-01-20", "2011-06-05", "2015-07-15", "2017-08-09"), each = 4),
                     "depth" = rep(c(1, 2, 3, 4), 4),
                     "temp" = runif(16))
    library(dplyr)
    df %>%
      mutate(d_temp = temp - lag(temp, order_by = depth),
             .by = c(station, date))
    #>    station       date depth        temp        d_temp
    #> 1        A 2011-01-20     1 0.745619998            NA
    #> 2        A 2011-01-20     2 0.781712425  0.0360924273
    #> 3        A 2011-01-20     3 0.020037114 -0.7616753110
    #> 4        A 2011-01-20     4 0.776085387  0.7560482735
    #> 5        B 2011-06-05     1 0.066910093            NA
    #> 6        B 2011-06-05     2 0.644795124  0.5778850310
    #> 7        B 2011-06-05     3 0.929385959  0.2845908350
    #> 8        B 2011-06-05     4 0.717642189 -0.2117437709
    #> 9        C 2015-07-15     1 0.927736510            NA
    #> 10       C 2015-07-15     2 0.284230120 -0.6435063903
    #> 11       C 2015-07-15     3 0.555724930  0.2714948107
    #> 12       C 2015-07-15     4 0.547701653 -0.0080232776
    #> 13       D 2017-08-09     1 0.582847855            NA
    #> 14       D 2017-08-09     2 0.582989913  0.0001420584
    #> 15       D 2017-08-09     3 0.001198341 -0.5817915718
    #> 16       D 2017-08-09     4 0.441117854  0.4399195127

Created on 2023-08-03 with reprex v2.0.2
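
For comparison, the split-and-lapply route attempted in the question can also be made to work in base R. The following is only a minimal sketch, not part of the original answer. It assumes the same simulated `df` as above, and the names `groups` and `result` are illustrative. Each station/date group gets `NA` for its first depth and `diff(temp)` for the rest, after sorting so that depths are consecutive within each group.

```r
set.seed(234)
df <- data.frame("station" = rep(c("A", "B", "C", "D"), each = 4),
                 "date" = rep(c("2011-01-20", "2011-06-05", "2015-07-15", "2017-08-09"), each = 4),
                 "depth" = rep(c(1, 2, 3, 4), 4),
                 "temp" = runif(16))

# Sort so that depths are in increasing order within each station/date group
df <- df[order(df$station, df$date, df$depth), ]

# One data frame per station/date combination; drop = TRUE skips
# combinations that do not occur in the data
groups <- split(df, list(df$station, df$date), drop = TRUE)

# Add d_temp inside each group: NA for the first depth, then the
# difference from the previous depth
groups <- lapply(groups, function(g) {
  g$d_temp <- c(NA, diff(g$temp))
  g
})

# Stack the groups back into a single data frame; all other columns are kept
result <- do.call(rbind, groups)
rownames(result) <- NULL
result
```

With around 500 groups this should be fast enough, but the dplyr version is likely to be more convenient when many other columns have to be carried along. Note that the `.by` argument of `mutate()` is only available from dplyr 1.1.0 onward; with older versions, `group_by(station, date)` before `mutate()` plays the same role.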
