Loop function on timeseries works on small df, but not in large df – Error: C stack usage…too close to the limit




I have a dataframe with dates/times (time series), site (grouping var) and value. I have identified the start times of different 'surges' - defined as changes in values of >=2 in 15 mins. For each surge time, I am trying to find the date/time where the value falls back down to (or below) the start of the surge (i.e., the end of the surge).

I can achieve this with a recursive function ('find.next.smaller' from this question - https://stackoverflow.com/questions/38207584/in-a-dataframe-find-the-index-of-the-next-smaller-value-for-each-element-of-a-c). This works perfectly on a small dataframe but not on a large one, where I get the error message "Error: C stack usage 15925584 is too close to the limit". Having seen other similar questions (e.g., https://stackoverflow.com/questions/14719349/error-c-stack-usage-is-too-close-to-the-limit), I do not think it is a problem of infinite recursion but a memory issue. The fixes suggested there involve the shell (or PowerShell), which I do not know how to use. Is there another way, either by adjusting the memory available or by adapting the function below?

Some example code:

###df formatting
library(dplyr)  # provides %>%, group_by(), mutate(), lead()

df <- data.frame("Date_time" = seq(from=as.POSIXct("2022-01-01 00:00"), by= 15*60, to=as.POSIXct("2022-01-01 07:00")), 
             "Site" = rep(c("Site A", "Site B"), each = 29),
             "Value" = c(10,10.1,10.2,10.3,12.5,14.8,12.4,11.3,10.3,10.1,10.2,10.5,10.4,10.3,14.7,10.1,
                         16.7,16.3,16.4,14.2,10.2,10.1,10.3,10.2,11.7,13.2,13.2,11.1,11.4,  # rows 17-29, as shown in the answer's output table
                         rep(10.3, times=29)))
df <- df %>% group_by(Site) %>% mutate(Lead_Value = lead(Value))
df$Surge_start <- NA
df[which(df$Lead_Value - df$Value >= 2),"Surge_start"] <- 
 paste("Surge", seq(1,length(which(df$Lead_Value - df$Value >= 2)),1), sep="")
###Applying the 'find.next.smaller' function

find.next.smaller <- function(ini = 1, vec) {
if(length(vec) == 1) NA 
else c(ini + min(which(vec[1] >= vec[-1])), 
     find.next.smaller(ini + 1, vec[-1]))
}       # the recursive function will go element by element through the vector and find out 
# the index of the next smaller value.
df$Date_time <- as.character(df$Date_time)
Output <- df %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
###This works fine

df2 <- do.call("rbind", replicate(1000, df, simplify = FALSE))
Output2 <- df2 %>% group_by(Site) %>% mutate(Surge_end = ifelse(grepl("Surge",Surge_start),Date_time[find.next.smaller(1, Value)],NA))
####This does not work



Score: 1

I suggest you don't need recursion.

find_nearest_value <- function(surge, time1, val1, times, vals) {
  # First later time whose value is at or below the surge's starting value
  if (!grepl("Surge", surge)) NA else times[times > time1 & vals <= val1][1]
}

Output %>%
  group_by(Site) %>%
  mutate(end2 = if_else(grepl("Surge", Surge_start), mapply(find_nearest_value, Surge_start, Date_time, Value, list(Date_time), list(Value)), NA)) %>%
  print(n = Inf)
# # A tibble: 58 × 7
# # Groups:   Site [2]
#    Date_time           Site   Value Lead_Value Surge_start Surge_end           end2               
#    <chr>               <chr>  <dbl>      <dbl> <chr>       <chr>               <chr>              
#  1 2022-01-01 00:00:00 Site A  10         10.1 NA          NA                  NA                 
#  2 2022-01-01 00:15:00 Site A  10.1       10.2 NA          NA                  NA                 
#  3 2022-01-01 00:30:00 Site A  10.2       10.3 NA          NA                  NA                 
#  4 2022-01-01 00:45:00 Site A  10.3       12.5 Surge1      2022-01-01 02:00:00 2022-01-01 02:00:00
#  5 2022-01-01 01:00:00 Site A  12.5       14.8 Surge2      2022-01-01 01:30:00 2022-01-01 01:30:00
#  6 2022-01-01 01:15:00 Site A  14.8       12.4 NA          NA                  NA                 
#  7 2022-01-01 01:30:00 Site A  12.4       11.3 NA          NA                  NA                 
#  8 2022-01-01 01:45:00 Site A  11.3       10.3 NA          NA                  NA                 
#  9 2022-01-01 02:00:00 Site A  10.3       10.1 NA          NA                  NA                 
# 10 2022-01-01 02:15:00 Site A  10.1       10.2 NA          NA                  NA                 
# 11 2022-01-01 02:30:00 Site A  10.2       10.5 NA          NA                  NA                 
# 12 2022-01-01 02:45:00 Site A  10.5       10.4 NA          NA                  NA                 
# 13 2022-01-01 03:00:00 Site A  10.4       10.3 NA          NA                  NA                 
# 14 2022-01-01 03:15:00 Site A  10.3       14.7 Surge3      2022-01-01 03:45:00 2022-01-01 03:45:00
# 15 2022-01-01 03:30:00 Site A  14.7       10.1 NA          NA                  NA                 
# 16 2022-01-01 03:45:00 Site A  10.1       16.7 Surge4      2022-01-01 05:15:00 2022-01-01 05:15:00
# 17 2022-01-01 04:00:00 Site A  16.7       16.3 NA          NA                  NA                 
# 18 2022-01-01 04:15:00 Site A  16.3       16.4 NA          NA                  NA                 
# 19 2022-01-01 04:30:00 Site A  16.4       14.2 NA          NA                  NA                 
# 20 2022-01-01 04:45:00 Site A  14.2       10.2 NA          NA                  NA                 
# 21 2022-01-01 05:00:00 Site A  10.2       10.1 NA          NA                  NA                 
# 22 2022-01-01 05:15:00 Site A  10.1       10.3 NA          NA                  NA                 
# 23 2022-01-01 05:30:00 Site A  10.3       10.2 NA          NA                  NA                 
# 24 2022-01-01 05:45:00 Site A  10.2       11.7 NA          NA                  NA                 
# 25 2022-01-01 06:00:00 Site A  11.7       13.2 NA          NA                  NA                 
# 26 2022-01-01 06:15:00 Site A  13.2       13.2 NA          NA                  NA                 
# 27 2022-01-01 06:30:00 Site A  13.2       11.1 NA          NA                  NA                 
# 28 2022-01-01 06:45:00 Site A  11.1       11.4 NA          NA                  NA                 
# 29 2022-01-01 07:00:00 Site A  11.4       NA   NA          NA                  NA                 
# 30 2022-01-01 00:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 31 2022-01-01 00:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 32 2022-01-01 00:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 33 2022-01-01 00:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 34 2022-01-01 01:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 35 2022-01-01 01:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 36 2022-01-01 01:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 37 2022-01-01 01:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 38 2022-01-01 02:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 39 2022-01-01 02:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 40 2022-01-01 02:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 41 2022-01-01 02:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 42 2022-01-01 03:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 43 2022-01-01 03:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 44 2022-01-01 03:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 45 2022-01-01 03:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 46 2022-01-01 04:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 47 2022-01-01 04:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 48 2022-01-01 04:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 49 2022-01-01 04:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 50 2022-01-01 05:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 51 2022-01-01 05:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 52 2022-01-01 05:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 53 2022-01-01 05:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 54 2022-01-01 06:00:00 Site B  10.3       10.3 NA          NA                  NA                 
# 55 2022-01-01 06:15:00 Site B  10.3       10.3 NA          NA                  NA                 
# 56 2022-01-01 06:30:00 Site B  10.3       10.3 NA          NA                  NA                 
# 57 2022-01-01 06:45:00 Site B  10.3       10.3 NA          NA                  NA                 
# 58 2022-01-01 07:00:00 Site B  10.3       NA   NA          NA                  NA                 
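
The core of this approach is a plain filter: among the later rows, take the first one whose value is at or below the starting value. A minimal base-R sketch on bare vectors shows the idea without grouping or recursion. The helper name `first_at_or_below` and the toy values are assumptions for illustration only, not part of the answer.

```r
# Hypothetical helper: index of the first later element that is <= vals[i],
# or NA if the value never falls back to that level.
first_at_or_below <- function(i, vals) {
  later <- seq_along(vals) > i
  which(later & vals <= vals[i])[1]
}

vals <- c(10.3, 12.5, 14.8, 12.4, 11.3, 10.3, 10.1)
first_at_or_below(1, vals)  # 6: 10.3 is reached again at position 6
first_at_or_below(2, vals)  # 4: 12.4 is the first value <= 12.5
first_at_or_below(7, vals)  # NA: nothing comes after the last element
```

Each call scans the whole vector once, so calling it only for the rows that actually start a surge should keep the total work small.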


Score: 1







Possibly the recursion uses too much memory, and you're probably better off with a vectorized or looped approach, even if it takes a bit longer. Below I altered your function and created some options.
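
To see why the input size matters, here is a small probe that mirrors the shape of the recursive function (drop the first element, recurse on the rest) and only counts the nested calls. The name `recursion_depth` is hypothetical and not part of the original code. It suggests one nested call per element, so a group of 29,000 rows (29 rows per site, replicated 1000 times) would need roughly 29,000 nested calls.

```r
# Hypothetical probe: same recursion pattern as find.next.smaller,
# but it returns the number of nested calls instead of indices.
recursion_depth <- function(vec, depth = 1L) {
  if (length(vec) == 1) depth else recursion_depth(vec[-1], depth + 1L)
}

recursion_depth(seq_len(29))   # 29: one site in the small example
recursion_depth(seq_len(200))  # 200: depth grows linearly with length
```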

Some options


find.next.smaller_rec <- function(ini = 1, vec) {
  if(length(vec) == 1) NA 
  else c(ini + min(which(vec[1] >= vec[-1])), 
         find.next.smaller_rec(ini + 1, vec[-1]))
}

The building block for the vectorized ones:

find.next.smaller <- function(val, vec) {
  # val is a position in vec; returns the position of the next value <= vec[val]
  if(val == length(vec)) NA  else val + min(which(vec[val] >= vec[-(1:val)]))
}

With a for loop:

find.next.smaller_for <- function(x, vec){
  result <- numeric(x)
  for(val in 1:x){
    result[val] <- find.next.smaller(val, vec)
  }
  result
}

With Vectorize():

find.next.smaller_vec <- Vectorize(find.next.smaller, "val")

With purrr::map:

find.next.smaller_map <- function(x, vec){
  # requires library(purrr)
  map_dbl(1:x, ~ find.next.smaller(val = .x, vec = vec))
}


bench <- bench::mark(find.next.smaller_rec(1, df$Value),
                     find.next.smaller_for(nrow(df), df$Value),
                     find.next.smaller_vec(1:nrow(df), df$Value),
                     find.next.smaller_map(nrow(df), df$Value),
                     min_time = 2)
bench %>% select(c(median, mem_alloc, n_gc, `gc/sec`))
#     median mem_alloc  n_gc `gc/sec`
#   <bch:tm> <bch:byt> <dbl>    <dbl>
# 1    496µs    92.4KB    13     7.30
# 2    582µs    77.1KB    10     5.46
# 3    612µs    78.7KB    10     5.97
# 4    681µs    77.1KB    10     5.40

We can see that, even if it's faster, the recursion uses more memory, and this might be the reason for your error.

There probably are even better options, I just wanted to present ones that were similar to your original one.
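
One such option, which is not from the answer above, is a single pass that keeps a stack of positions still waiting for their next smaller-or-equal value. This is only a sketch: the function name `next_smaller_or_equal` is made up here, and it assumes the vector contains no `NA` values.

```r
# Sketch (assumption: vec has no NA values). For each position, returns the
# index of the next element that is <= the current one, or NA if none exists.
next_smaller_or_equal <- function(vec) {
  n <- length(vec)
  result <- rep(NA_integer_, n)
  stack <- integer(n)  # positions still waiting for a smaller-or-equal value
  top <- 0L            # number of positions currently on the stack
  for (i in seq_len(n)) {
    # The current value closes every waiting position whose value is >= it.
    while (top > 0L && vec[stack[top]] >= vec[i]) {
      result[stack[top]] <- i
      top <- top - 1L
    }
    top <- top + 1L
    stack[top] <- i
  }
  result
}

vals <- c(10, 10.1, 10.2, 10.3, 12.5, 14.8, 12.4, 11.3, 10.3, 10.1)
next_smaller_or_equal(vals)
# NA 10 10  9  7  7  8  9 10 NA
```

Because it returns positions within the vector, `Date_time[next_smaller_or_equal(Value)]` inside a grouped `mutate()` should be usable in the same place as the recursive call, and every position is pushed and popped at most once, so there is no nesting that grows with the data.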

Applying them to the problem

Output <- df %>%
  group_by(Site) %>%
  mutate(Surge_end = ifelse(grepl("Surge",Surge_start),
                            Date_time[find.next.smaller_for(n(), Value)],
                            NA))

Where you can also use Date_time[find.next.smaller_map(n(), Value)] or Date_time[find.next.smaller_vec(1:n(), Value)].
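
As a quick sanity check, the sketch below compares the loop version with the recursive one on a short vector and then runs the loop version on a longer one. The functions are repeated from above so the block runs on its own. The toy values are assumptions, chosen so that every element except the last has a later value at or below it.

```r
find.next.smaller_rec <- function(ini = 1, vec) {
  if (length(vec) == 1) NA
  else c(ini + min(which(vec[1] >= vec[-1])),
         find.next.smaller_rec(ini + 1, vec[-1]))
}

find.next.smaller <- function(val, vec) {
  if (val == length(vec)) NA else val + min(which(vec[val] >= vec[-(1:val)]))
}

find.next.smaller_for <- function(x, vec) {
  result <- numeric(x)
  for (val in 1:x) {
    result[val] <- find.next.smaller(val, vec)
  }
  result
}

vals <- c(10.3, 12.5, 14.8, 12.4, 11.3, 10.3, 10.1)

# Both versions agree on the small input.
small_rec <- find.next.smaller_rec(1, vals)
small_for <- find.next.smaller_for(length(vals), vals)
all.equal(small_rec, small_for)  # TRUE

# The loop version handles a much longer vector without nested calls.
big <- rep(vals, 1000)           # 7,000 elements
big_for <- find.next.smaller_for(length(big), big)
head(big_for, 7)                 # 6 4 4 5 6 7 8
```

The recursive version is deliberately not run on `big` here, since it would need about 7,000 nested calls.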

  • Posted on May 25, 2023 at 19:22:31
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76331735.html



