May 25, 2023, 19:22:31

Loop function on timeseries works on small df, but not in large df - Error: C stack usage...too close to the limit

Question

I have a dataframe with dates/times (time series), site (grouping var) and value. I have identified the start times of different 'surges' - defined as changes in values of >=2 in 15 mins. For each surge time, I am trying to find the date/time where the value falls back down to (or below) the start of the surge (i.e., the end of the surge).

I can achieve this by using a recursive loop function ('find.next.smaller' from this question - https://stackoverflow.com/questions/38207584/in-a-dataframe-find-the-index-of-the-next-smaller-value-for-each-element-of-a-c). This works perfectly on a smaller dataframe but not on a large one. I get the error message "Error: C stack usage 15925584 is too close to the limit." Having seen other similar questions (e.g., https://stackoverflow.com/questions/14719349/error-c-stack-usage-is-too-close-to-the-limit), I do not think it's a problem of an infinite recursive function but a memory issue. But I do not know how to use shell (or powershell) to do this. I wondered whether there was any other way? Either through adapting my memory or the function below?

Some example code:

###df formatting    
library(dplyr)
df <- data.frame("Date_time" = seq(from=as.POSIXct("2022-01-01 00:00"), by= 15*60, to=as.POSIXct("2022-01-01 07:00")), 
             "Site" = rep(c("Site A", "Site B"), each = 29),
             "Value" = c(10,10.1,10.2,10.3,12.5,14.8,12.4,11.3,10.3,10.1,10.2,10.5,10.4,10.3,14.7,10.1,
                         16.7,16.3,16.4,14.2,10.2,10.1,10.3,10.2,11.7,13.2,13.2,11.1,11.4,
                         rep(10.3, times=29)))
df <- df %>% group_by(Site) %>% mutate(Lead_Value = lead(Value))
df$Surge_start <- NA
df[which(df$Lead_Value - df$Value >= 2),"Surge_start"] <- 
 paste("Surge", seq(1,length(which(df$Lead_Value - df$Value >= 2)),1), sep="")
 
###Applying the 'find.next.smaller' function
find.next.smaller <- function(ini = 1, vec) {
  # The recursive function goes element by element through the vector
  # and finds the index of the next smaller value.
  if (length(vec) == 1) NA
  else c(ini + min(which(vec[1] >= vec[-1])),
         find.next.smaller(ini + 1, vec[-1]))
}
df$Date_time <- as.character(df$Date_time)
Output <- df %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
###This works fine
df2 <- do.call("rbind", replicate(1000, df, simplify = FALSE))
Output2 <- df2 %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
####This does not work

Answer 1

Score: 1

I suggest you don't need recursion.

```r
find_nearest_value <- function(surge, time1, val1, times, vals) {
  if (!grepl("Surge", surge)) NA else times[times > time1 & vals <= val1][1]
}
Output %>%
  group_by(Site) %>%
  mutate(end2 = if_else(grepl("Surge", Surge_start), mapply(find_nearest_value, Surge_start, Date_time, Value, list(Date_time), list(Value)), NA)) %>%
  print(n=99)
# # A tibble: 58 × 7
# # Groups:   Site [2]
#    Date_time           Site   Value Lead_Value Surge_start Surge_end           end2
#    <chr>               <chr>  <dbl>      <dbl> <chr>       <chr>               <chr>
#  1 2022-01-01 00:00:00 Site A  10         10.1 NA          NA                  NA
#  2 2022-01-01 00:15:00 Site A  10.1       10.2 NA          NA                  NA
#  3 2022-01-01 00:30:00 Site A  10.2       10.3 NA          NA                  NA
#  4 2022-01-01 00:45:00 Site A  10.3       12.5 Surge1      2022-01-01 02:00:00 2022-01-01 02:00:00
#  5 2022-01-01 01:00:00 Site A  12.5       14.8 Surge2      2022-01-01 01:30:00 2022-01-01 01:30:00
#  6 2022-01-01 01:15:00 Site A  14.8       12.4 NA          NA                  NA
#  7 2022-01-01 01:30:00 Site A  12.4       11.3 NA          NA                  NA
#  8 2022-01-01 01:45:00 Site A  11.3       10.3 NA          NA                  NA
#  9 2022-01-01 02:00:00 Site A  10.3       10.1 NA          NA                  NA
# 10 2022-01-01 02:15:00 Site A  10.1       10.2 NA          NA                  NA
# 11 2022-01-01 02:30:00 Site A  10.2       10.5 NA          NA                  NA
# 12 2022-01-01 02:45:00 Site A  10.5       10.4 NA          NA                  NA
# 13 2022-01-01 03:00:00 Site A  10.4       10.3 NA          NA                  NA
# 14 2022-01-01 03:15:00 Site A  10.3       14.7 Surge3      2022-01-01 03:45:00 2022-01-01 03:45:00
# 15 2022-01-01 03:30:00 Site A  14.7       10.1 NA          NA                  NA
# 16 2022-01-01 03:45:00 Site A  10.1       16.7 Surge4      2022-01-01 05:15:00 2022-01-01 05:15:00
# 17 2022-01-01 04:00:00 Site A  16.7       16.3 NA          NA                  NA
# 18 2022-01-01 04:15:00 Site A  16.3       16.4 NA          NA                  NA
# 19 2022-01-01 04:30:00 Site A  16.4       14.2 NA          NA                  NA
# 20 2022-01-01 04:45:00 Site A  14.2       10.2 NA          NA                  NA
# 21 2022-01-01 05:00:00 Site A  10.2       10.1 NA          NA                  NA
# 22 2022-01-01 05:15:00 Site A  10.1       10.3 NA          NA                  NA
# 23 2022-01-01 05:30:00 Site A  10.3       10.2 NA          NA                  NA
# 24 2022-01-01 05:45:00 Site A  10.2       11.7 NA          NA                  NA
# 25 2022-01-01 06:00:00 Site A  11.7       13.2 NA          NA                  NA
# 26 2022-01-01 06:15:00 Site A  13.2       13.2 NA          NA                  NA
# 27 2022-01-01 06:30:00 Site A  13.2       11.1 NA          NA                  NA
# 28 2022-01-01 06:45:00 Site A  11.1       11.4 NA          NA                  NA
# 29 2022-01-01 07:00:00 Site A  11.4       NA   NA          NA                  NA
# 30 2022-01-01 00:00:00 Site B  10.3       10.3 NA          NA                  NA
# 31 2022-01-01 00:15:00 Site B  10.3       10.3 NA          NA                  NA
# 32 2022-01-01 00:30:00 Site B  10.3       10.3 NA          NA                  NA
# 33 2022-01-01 00:45:00 Site B  10.3       10.3 NA          NA                  NA
# 34 2022-01-01 01:00:00 Site B  10.3       10.3 NA          NA                  NA
# 35 2022-01-01 01:15:00 Site B  10.3       10.3 NA          NA                  NA
# 36 2022-01-01 01:30:00 Site B  10.3       10.3 NA          NA                  NA
# 37 2022-01-01 01:45:00 Site B  10.3       10.3 NA          NA                  NA
# 38 2022-01-01 02:00:00 Site B  10.3       10.3 NA          NA                  NA
# 39 2022-01-01 02:15:00 Site B  10.3       10.3 NA          NA                  NA
# 40 2022-01-01 02:30:00 Site B  10.3       10.3 NA          NA                  NA
# 41 2022-01-01 02:45:00 Site B  10.3       10.3 NA          NA                  NA
# 42 2022-01-01 03:00:00 Site B  10.3       10.3 NA          NA                  NA
# 43 2022-01-01 03:15:00 Site B  10.3       10.3 NA          NA                  NA
# 44 2022-01-01 03:30:00 Site B  10.3       10.3 NA          NA                  NA
# 45 2022-01-01 03:45:00 Site B  10.3       10.3 NA          NA                  NA
# 46 2022-01-01 04:00:00 Site B  10.3       10.3 NA          NA                  NA
# 47 2022-01-01 04:15:00 Site B  10.3       10.3 NA          NA                  NA
# 48 2022-01-01 04:30:00 Site B  10.3       10.3 NA          NA                  NA
# 49 2022-01-01 04:45:00 Site B  10.3       10.3 NA          NA                  NA
# 50 2022-01-01 05:00:00 Site B  10.3       10.3 NA          NA                  NA
# 51 2022-01-01 05:15:00 Site B  10.3       10.3 NA          NA                  NA
# 52 2022-01-01 05:30:00 Site B  10.3       10.3 NA          NA                  NA
# 53 2022-01-01 05:45:00 Site B  10.3       10.3 NA          NA                  NA
# 54 2022-01-01 06:00:00 Site B  10.3       10.3 NA          NA                  NA
# 55 2022-01-01 06:15:00 Site B  10.3       10.3 NA          NA                  NA
# 56 2022-01-01 06:30:00 Site B  10.3       10.3 NA          NA                  NA
# 57 2022-01-01 06:45:00 Site B  10.3       10.3 NA          NA                  NA
# 58 2022-01-01 07:00:00 Site B  10.3       NA   NA          NA                  NA
```
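To make the lookup inside `find_nearest_value` easier to follow, here is a minimal base-R sketch on a toy series. The vectors `toy_times` and `toy_vals` are made up for illustration and are not part of the original data; the function body is the one from the answer.

```r
# Same function as in the answer above.
find_nearest_value <- function(surge, time1, val1, times, vals) {
  if (!grepl("Surge", surge)) NA else times[times > time1 & vals <= val1][1]
}

# Hypothetical toy data: timestamps stored as character strings in
# "YYYY-MM-DD HH:MM:SS" form, so plain string comparison orders them correctly.
toy_times <- c("2022-01-01 00:00:00", "2022-01-01 00:15:00",
               "2022-01-01 00:30:00", "2022-01-01 00:45:00")
toy_vals  <- c(10.3, 12.5, 11.0, 10.1)

# A surge starts at the first row (10.3 -> 12.5). The first later time at
# which the value is back at or below 10.3 is the fourth row.
surge_end <- find_nearest_value("Surge1", toy_times[1], toy_vals[1],
                                toy_times, toy_vals)
surge_end  # "2022-01-01 00:45:00"

# Rows that are not surge starts (Surge_start is NA) simply return NA.
no_surge <- find_nearest_value(NA, toy_times[2], toy_vals[2],
                               toy_times, toy_vals)
no_surge   # NA
```

The comparison `times > time1` relies on the timestamps being character strings in a sortable format, which is the case after the `as.character(df$Date_time)` step in the question.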

Answer 2

Score: 1

Possibly the recursion uses too much memory, and you're probably better off with a vectorized/looped approach, even if it takes a bit longer. Below I altered your function and created some options.

Some options

Original:

find.next.smaller_rec <- function(ini = 1, vec) {
  if (length(vec) == 1) NA
  else c(ini + min(which(vec[1] >= vec[-1])),
         find.next.smaller_rec(ini + 1, vec[-1]))
}

The building block for the vectorized ones:

find.next.smaller <- function(val, vec) {
  if (val == length(vec)) NA else val + min(which(vec[val] >= vec[-(1:val)]))
}
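As a quick illustration of what the building block returns, here is a small sketch on a made-up vector `v` (not from the original data). It also shows one edge case worth knowing about: when no later value is small enough, `min()` is called on an empty set and returns `Inf` with a warning.

```r
# Same building block as above: for position `val`, return the index of the
# first later element that is less than or equal to vec[val].
find.next.smaller <- function(val, vec) {
  if (val == length(vec)) NA else val + min(which(vec[val] >= vec[-(1:val)]))
}

v <- c(3, 5, 4, 2, 6)  # hypothetical example values

first  <- find.next.smaller(1, v)  # 4: the value 2 at index 4 is the first one <= 3
second <- find.next.smaller(2, v)  # 3: the value 4 at index 3 is the first one <= 5
last   <- find.next.smaller(5, v)  # NA: the last element has nothing after it

# Index 4 holds the value 2 and nothing after it is <= 2, so which() is empty
# and min() returns Inf (with a warning, suppressed here).
none <- suppressWarnings(find.next.smaller(4, v))  # Inf
```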

With a for loop:

find.next.smaller_for <- function(x, vec) {
  result <- numeric(x)
  for (val in 1:x) {
    result[val] <- find.next.smaller(val, vec)
  }
  result
}

With Vectorize():

find.next.smaller_vec <- Vectorize(find.next.smaller, "val")
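The second argument to `Vectorize()` matters here: naming only `"val"` means the wrapper iterates over the positions while passing the whole of `vec` to every call. A minimal sketch, using a made-up vector `v`:

```r
find.next.smaller <- function(val, vec) {
  if (val == length(vec)) NA else val + min(which(vec[val] >= vec[-(1:val)]))
}

# Only `val` is vectorized; `vec` is handed over unchanged on each call.
find.next.smaller_vec <- Vectorize(find.next.smaller, "val")

v <- c(3, 5, 4, 2, 1)  # hypothetical example values
res <- find.next.smaller_vec(1:5, v)
res  # 4 3 4 5 NA
```

Without the `"val"` restriction, `Vectorize()` would also iterate over `vec` element by element, which is not what the building block expects.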

With purrr::map:

library(purrr)  # provides map_dbl()
find.next.smaller_map <- function(x, vec) {
  map_dbl(1:x, ~ find.next.smaller(val = .x, vec = vec))
}

Comparison:

bench <- bench::mark(find.next.smaller_rec(1, df$Value),
                     find.next.smaller_for(nrow(df), df$Value),
                     find.next.smaller_vec(1:nrow(df), df$Value),
                     find.next.smaller_map(nrow(df), df$Value),
                     min_time = 2)
bench %>% select(c(median, mem_alloc, n_gc, `gc/sec`))
#     median mem_alloc  n_gc `gc/sec`
#   <bch:tm> <bch:byt> <dbl>    <dbl>
# 1    496µs    92.4KB    13     7.30
# 2    582µs    77.1KB    10     5.46
# 3    612µs    78.7KB    10     5.97
# 4    681µs    77.1KB    10     5.40

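We can see that, even though it is faster, the recursive version allocates more memory, and this might be the reason for your error. Each recursive call also adds another level of nesting to the call stack, so the recursion depth grows with the length of the vector.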
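The "C stack usage" in the error message refers to the stack used by nested function calls rather than to the heap allocations that `mem_alloc` reports. A minimal sketch of why the recursive version nests so deeply, using a hypothetical helper `depth_of()` that mirrors the shape of the recursion (one call per element, each on `vec[-1]`) and simply counts the levels:

```r
# Hypothetical helper, not part of the original answer: it recurses exactly
# like find.next.smaller_rec (dropping the first element each time) and
# reports how many nested calls were needed to reach the base case.
depth_of <- function(vec, depth = 1) {
  if (length(vec) == 1) return(depth)
  depth_of(vec[-1], depth + 1)
}

depth_small  <- depth_of(1:29)          # 29: one nested call per element
depth_larger <- depth_of(seq_len(100))  # 100
```

In the question, `df2` has 29,000 rows per site, so the same pattern would need roughly 29,000 nested calls per group, whereas the loop-based variants keep the nesting depth constant.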

There are probably even better options; I just wanted to present ones that were similar to your original one.
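One such option, sketched here as an assumption rather than as something from the answers above, is a single pass with an explicit stack of indices still waiting for their "next smaller or equal" value. The function name `next_smaller_or_equal` is hypothetical. It assumes `vec` contains no `NA`s, and unlike the versions above it returns `NA` (not `Inf`) when no later value qualifies.

```r
# For each position i, return the index of the first later element j with
# vec[j] <= vec[i], or NA if there is none. Each index is pushed and popped
# at most once, so the work is linear in length(vec) and nothing recurses.
next_smaller_or_equal <- function(vec) {
  n <- length(vec)
  result <- rep(NA_integer_, n)
  stack <- integer(n)  # indices whose answer has not been found yet
  top <- 0L            # number of indices currently on the stack
  for (j in seq_len(n)) {
    # vec[j] answers every waiting index whose value is >= vec[j].
    while (top > 0L && vec[stack[top]] >= vec[j]) {
      result[stack[top]] <- j
      top <- top - 1L
    }
    top <- top + 1L
    stack[top] <- j
  }
  result
}

small <- next_smaller_or_equal(c(3, 5, 4, 2, 6))
small  # 4 3 4 NA NA

# A longer made-up series, comparable in length to one site in df2.
long_vec <- rep(c(10.3, 12.5, 10.1), times = 10000)
long_res <- next_smaller_or_equal(long_vec)
length(long_res)  # 30000
```

Inside the grouped `mutate()`, this could be used as `Date_time[next_smaller_or_equal(Value)]`, in the same place as the other variants.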

Applying them to the problem

Output <- df %>%
  group_by(Site) %>%
  mutate(Surge_end = ifelse(grepl("Surge", Surge_start),
                            Date_time[find.next.smaller_for(n(), Value)],
                            NA_character_))

You can also use Date_time[find.next.smaller_map(n(), Value)] or Date_time[find.next.smaller_vec(1:n(), Value)] instead.
