Loop function on timeseries works on small df, but not in large df – Error: C stack usage…too close to the limit

Question

I have a dataframe with dates/times (time series), site (grouping var) and value. I have identified the start times of different 'surges' - defined as changes in values of >=2 in 15 mins. For each surge time, I am trying to find the date/time where the value falls back down to (or below) the start of the surge (i.e., the end of the surge).

I can achieve this with a recursive function ('find.next.smaller' from this question - https://stackoverflow.com/questions/38207584/in-a-dataframe-find-the-index-of-the-next-smaller-value-for-each-element-of-a-c). This works perfectly on a smaller dataframe but not on a large one, where I get the error message "Error: C stack usage 15925584 is too close to the limit". Having seen other similar questions (e.g., https://stackoverflow.com/questions/14719349/error-c-stack-usage-is-too-close-to-the-limit), I do not think the problem is an infinite recursion; it looks like a memory issue. However, I do not know how to use the shell (or PowerShell) to fix this. Is there another way, either by adjusting my memory settings or by changing the function below?

Some example code:

###df formatting    
library(dplyr)
df <- data.frame("Date_time" = seq(from=as.POSIXct("2022-01-01 00:00"), by= 15*60, to=as.POSIXct("2022-01-01 07:00")), 
             "Site" = rep(c("Site A", "Site B"), each = 29),
             "Value" = c(10,10.1,10.2,10.3,12.5,14.8,12.4,11.3,10.3,10.1,10.2,10.5,10.4,10.3,14.7,10.1,
                         16.7,16.3,16.4,14.2,10.2,10.1,10.3,10.2,11.7,13.2,13.2,11.1,11.4,
                         rep(10.3, times=29)))
df <- df %>% group_by(Site) %>% mutate(Lead_Value = lead(Value))
df$Surge_start <- NA
df[which(df$Lead_Value - df$Value >= 2),"Surge_start"] <- 
 paste("Surge", seq(1,length(which(df$Lead_Value - df$Value >= 2)),1), sep="")
 
###Applying the 'find.next.smaller' function

find.next.smaller <- function(ini = 1, vec) {
if(length(vec) == 1) NA 
else c(ini + min(which(vec[1] >= vec[-1])), 
     find.next.smaller(ini + 1, vec[-1]))
}       # the recursive function will go element by element through the vector and find out 
# the index of the next smaller value.
df$Date_time <- as.character(df$Date_time)
Output <- df %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
###This works fine

df2 <- do.call("rbind", replicate(1000, df, simplify = FALSE))
Output2 <- df2 %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
####This does not work

Answer 1

Score: 1

I suggest you don't need recursion.

```r
find_nearest_value <- function(surge, time1, val1, times, vals) {
  if (!grepl("Surge", surge)) NA else times[times > time1 & vals <= val1][1]
}

Output %>%
  group_by(Site) %>%
  mutate(end2 = if_else(grepl("Surge", Surge_start), mapply(find_nearest_value, Surge_start, Date_time, Value, list(Date_time), list(Value)), NA)) %>%
  print(n=99)
# # A tibble: 58 × 7
# # Groups:   Site [2]
#    Date_time           Site   Value Lead_Value Surge_start Surge_end           end2               
#    <chr>               <chr>  <dbl>      <dbl> <chr>       <chr>               <chr>              
#  1 2022-01-01 00:00:00 Site A  10         10.1 NA          NA                  NA                 
#  2 2022-01-01 00:15:00 Site A  10.1       10.2 NA          NA                  NA                 
#  3 2022-01-01 00:30:00 Site A  10.2       10.3 NA          NA                  NA                 
#  4 2022-01-01 00:45:00 Site A  10.3       12.5 Surge1      2022-01-01 02:00:00 2022-01-01 02:00:00
#  5 2022-01-01 01:00:00 Site A  12.5       14.8 Surge2      2022-01-01 01:30:00 2022-01-01 01:30:00
#  6 2022-01-01 01:15:00 Site A  14.8       12.4 NA          NA                  NA                 
#  7 2022-01-01 01:30:00 Site A  12.4       11.3 NA          NA                  NA                 
#  8 2022-01-01 01:45:00 Site A  11.3       10.3 NA          NA                  NA                 
#  9 2022-01-01 02:00:00 Site A  10.3       10.1 NA          NA                  NA                 
# 10 2022-01-01 02:15:00 Site A  10.1       10.2 NA          NA                  NA                 
# 11 2022-01-01 02:30:00 Site A  10.2       10.5 NA          NA                  NA                 
# 12 2022-01-01 02:45:00 Site A  10.5       10.4 NA          NA                  NA                 
# 13 2022-01-01 03:00:00 Site A  10.4       10.3 NA          NA                  NA                 
# 14 2022-01-01 03:15:00 Site A  10.3       14.7 Surge3      2022-01-01 03:45:00 2022-01-01 03:45:00
# 15 2022-01-01 03:30:00 Site A  14.7       10.1 NA          NA                  NA                 
# 16 2022-01-01 03:45:00 Site A  10.1       16.7 Surge4      2022-01-01 05:15:00 2022-01-01 05:15:00
# 17 2022-01-01 04:00:00 Site A  16.7       16.3 NA          NA                  NA                 
# 18 2022-01-01 04:15:00 Site A  16.3       16.4 NA          NA                  NA                 
# 19 2022-01-01 04:30:00 Site A  16.4       14.2 NA          NA                  NA                 
# 20 2022-01-01 04:45:00 Site A  14.2       10.2 NA          NA                  NA                 
# 21 2022-01-01 05:00:00 Site A  10.2       10.1 NA          NA                  NA                 
# 22 2022-01-01 05:15:00 Site A  10.1       10.3 NA          NA                  NA                 
# 23 2022-01-01 05:30:00 Site A  10.3       10.2 NA          NA                  NA                 
# 24 2022-01-01 05:45:00 Site A  10.2       11.7 NA          NA                  NA                 
# 25 2022-01-01 06:00:00 Site A  11.7       13.2 NA          NA                  NA                 
# 26 2022-01-01 06:15:00 Site A  13.2       13.2 NA          NA                  NA                 
# 27 2022-01-01 06:30:00 Site A  13.2       11.1 NA          NA                  NA                 
# 28 2022-01-01 06:45:00 Site A  11.1       11.4 NA          NA                  NA                 
# 29 2022-01-01 07:00:00 Site A  11.4       NA   NA          NA                  NA                 
# 30 2022-01-01 00:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 31 2022-01-01 00:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 32 2022-01-01 00:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 33 2022-01-01 00:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 34 2022-01-01 01:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 35 2022-01-01 01:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 36 2022-01-01 01:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 37 2022-01-01 01:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 38 2022-01-01 02:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 39 2022-01-01 02:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 40 2022-01-01 02:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 41 2022-01-01 02:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 42 2022-01-01 03:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 43 2022-01-01 03:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 44 2022-01-01 03:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 45 2022-01-01 03:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 46 2022-01-01 04:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 47 2022-01-01 04:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 48 2022-01-01 04:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 49 2022-01-01 04:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 50 2022-01-01 05:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 51 2022-01-01 05:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 52 2022-01-01 05:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 53 2022-01-01 05:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 54 2022-01-01 06:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 55 2022-01-01 06:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 56 2022-01-01 06:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 57 2022-01-01 06:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 58 2022-01-01 07:00:00 Site B  10.3       NA   NA          NA                  NA
```
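
As a rough cross-check of the non-recursive idea, here is a minimal base R sketch on the Site A values from the question. The helper name `surge_end_index` is hypothetical (it is not part of the answer), and the timestamps are kept as POSIXct in UTC rather than converted to character, which is an assumption made here for clarity.

```r
# Site A values from the question, sampled every 15 minutes.
vals <- c(10, 10.1, 10.2, 10.3, 12.5, 14.8, 12.4, 11.3, 10.3, 10.1, 10.2,
          10.5, 10.4, 10.3, 14.7, 10.1, 16.7, 16.3, 16.4, 14.2, 10.2, 10.1,
          10.3, 10.2, 11.7, 13.2, 13.2, 11.1, 11.4)
times <- seq(as.POSIXct("2022-01-01 00:00", tz = "UTC"),
             by = 15 * 60, length.out = length(vals))

# Hypothetical helper: index of the first later observation whose value is
# back at or below the value at position i; NA if that never happens.
surge_end_index <- function(i, times, vals) {
  hit <- which(times > times[i] & vals <= vals[i])
  if (length(hit) == 0) NA_integer_ else hit[1]
}

# A surge starts where the next value is at least 2 higher.
surge_idx <- which(diff(vals) >= 2)
end_idx <- vapply(surge_idx, surge_end_index, integer(1),
                  times = times, vals = vals)

format(times[end_idx], "%H:%M", tz = "UTC")
# "02:00" "01:30" "03:45" "05:15"
```

These end times agree with the `Surge_end` and `end2` columns printed above for Surge1 to Surge4.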

Answer 2

Score: 1

Possibly the recursion uses too much memory, and you're probably better off with a vectorized/looped approach, even if it takes a bit longer. Below I have altered your function and created some options.

Some options

Original:

find.next.smaller_rec <- function(ini = 1, vec) {
  if(length(vec) == 1) NA 
  else c(ini + min(which(vec[1] >= vec[-1])), 
         find.next.smaller_rec(ini + 1, vec[-1]))
}

The building block for the vectorized ones:

find.next.smaller <- function(val, vec) {
  if(val == length(vec)) NA else val + min(which(vec[val] >= vec[-(1:val)]))
}
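
One detail of this building block: when no later value is at or below `vec[val]`, `which()` returns an empty vector, so `min()` warns and returns `Inf`. Indexing with `Inf` still yields `NA`, so the final result is unaffected, but the warnings can be noisy. The sketch below is a variant that returns `NA` directly; the name `find_next_smaller_safe` is hypothetical and not part of the answer.

```r
# Variant of the building block that returns NA (instead of Inf plus a
# warning) when no later value is at or below vec[val].
find_next_smaller_safe <- function(val, vec) {
  if (val >= length(vec)) return(NA_integer_)
  later <- which(vec[-seq_len(val)] <= vec[val])
  if (length(later) == 0) NA_integer_ else val + later[1]
}

x <- c(3, 5, 4, 1, 2)   # illustrative values, not from the question
vapply(seq_along(x), find_next_smaller_safe, integer(1), vec = x)
# 4 3 4 NA NA
```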

With a for loop:

find.next.smaller_for <- function(x, vec){
  result <- numeric(x)
  for(val in 1:x){
    result[val] <- find.next.smaller(val, vec)
  }
  result
}

With Vectorize():

find.next.smaller_vec <- Vectorize(find.next.smaller, "val")

With purrr::map:

find.next.smaller_map <- function(x, vec){
  purrr::map_dbl(1:x, ~ find.next.smaller(val = .x, vec = vec))
}

Comparison:

bench <- bench::mark(find.next.smaller_rec(1, df$Value),
                     find.next.smaller_for(nrow(df), df$Value),
                     find.next.smaller_vec(1:nrow(df), df$Value),
                     find.next.smaller_map(nrow(df), df$Value),
                     min_time = 2)
bench %>% select(c(median, mem_alloc, n_gc, `gc/sec`))
#     median mem_alloc  n_gc `gc/sec`
#   <bch:tm> <bch:byt> <dbl>    <dbl>
# 1    496µs    92.4KB    13     7.30
# 2    582µs    77.1KB    10     5.46
# 3    612µs    78.7KB    10     5.97
# 4    681µs    77.1KB    10     5.40

We can see that, even if it's faster, the recursion uses more memory, and this might be the reason for your error.
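
To illustrate why depth matters, here is a toy sketch (not the original function) with one nested call per element, as in the recursive helper. `count_rec` and `count_loop` are hypothetical names; `Cstack_info()` is a base R function that reports the C stack size and current usage. Which limit is hit first (expression nesting or the C stack) depends on the R session, so the example only checks that an error occurs.

```r
# Toy recursion: one nested call per element.
count_rec <- function(n) if (n == 0) 0 else 1 + count_rec(n - 1)

# Same result with a loop: the call depth stays constant.
count_loop <- function(n) {
  total <- 0
  for (i in seq_len(n)) total <- total + 1
  total
}

count_rec(100)                                 # shallow recursion is fine
deep <- tryCatch(count_rec(1e6), error = function(e) e)
inherits(deep, "error")                        # deep recursion hits a limit
count_loop(1e6)                                # the loop handles the same size

Cstack_info()                                  # C stack size and current usage
```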

There are probably even better options; I just wanted to present ones that are similar to your original one.
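
One such option, sketched here as an assumption rather than something taken from either answer, is a single pass with a stack of indices (the classic "next smaller or equal element" technique). Each index is pushed and popped at most once, so the work grows linearly with the vector length. The name `next_leq_index` is hypothetical, and the sketch assumes a numeric vector without `NA` values.

```r
# Single pass with a stack of indices that are still waiting for a later
# value at or below their own. Returns NA where no such value exists.
next_leq_index <- function(vec) {
  n <- length(vec)
  out <- rep(NA_integer_, n)
  stack <- integer(n)   # preallocated index stack
  top <- 0L             # number of indices currently on the stack
  for (i in seq_len(n)) {
    # Every waiting index whose value is >= vec[i] finds its answer at i.
    while (top > 0L && vec[stack[top]] >= vec[i]) {
      out[stack[top]] <- i
      top <- top - 1L
    }
    top <- top + 1L
    stack[top] <- i
  }
  out
}

next_leq_index(c(3, 5, 4, 1, 2))
# 4 3 4 NA NA
```

Within a grouped `mutate()`, this could be used as `Date_time[next_leq_index(Value)]`, since indexing with `NA` gives `NA`.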

Applying them to the problem

Output <- df %>%
  group_by(Site) %>%
  mutate(Surge_end = ifelse(grepl("Surge", Surge_start),
                            Date_time[find.next.smaller_for(n(), Value)],
                            NA_character_))

Where you can also use Date_time[find.next.smaller_map(n(), Value)] or Date_time[find.next.smaller_vec(1:n(), Value)].
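
A quick way to convince yourself that the variants are interchangeable is to compare them on a small vector. The sketch below repeats the definitions so it runs on its own and uses only base R (the purrr variant is left out); the vector `x` is illustrative and not from the question.

```r
# Building block and two wrappers, repeated from the answer.
find.next.smaller <- function(val, vec) {
  if (val == length(vec)) NA else val + min(which(vec[val] >= vec[-(1:val)]))
}
find.next.smaller_for <- function(x, vec) {
  result <- numeric(x)
  for (val in 1:x) result[val] <- find.next.smaller(val, vec)
  result
}
find.next.smaller_vec <- Vectorize(find.next.smaller, "val")

x <- c(3, 5, 4, 1, 2)   # illustrative values
# min() warns and returns Inf where no later value qualifies; silence that.
res_for <- suppressWarnings(find.next.smaller_for(length(x), x))
res_vec <- suppressWarnings(find.next.smaller_vec(seq_along(x), x))
identical(res_for, res_vec)
# TRUE
```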

huangapple
  • Published on May 25, 2023, 19:22:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76331735.html