Check the previous row and find when the number changes to zero in R

Question

I have data like this:

d <- data.frame(
  ab = c(3, 4, 2, 6),
  dur_1 = c(32, 1, 3, 4),
  dur_2 = c(27, 9, 26, 5),
  dur_3 = c(25, 8, 21, 48),
  dur_5 = c(0, 4, 0, 42),
  dur_6 = c(0, 0, 0, 0),
  dur_7 = c(0, 0, 0, 0),
  cd = c(45, 67, 34, 78)
)

What I want to do is create a new column, pre_dur.
pre_dur is the last positive (nonzero) value in the previous row among the variables whose names start with dur_.
So, to get pre_dur, scan through the dur_ variables in the previous row and find where the values switch to zero. The last positive number before that point is the value I want.

My expected output should be like this:

d1 <- data.frame(ab=c(3,4,2,6),
               dur_1=c(32,1,3,4),
               dur_2=c(27,9,26,5),
               dur_3=c(25,8,21,48),
               dur_5=c(0,4,0,42),
               dur_6=c(0,0,0,0),
               dur_7=c(0,0,0,0),
               cd=c(45,67,34,78),
               pre_dur=c(NA,25,4,21))

Actually the code below works:

library(dplyr)

d1 <- d %>%
  mutate(pre_dur = case_when(lag(dur_7) > 0 ~ lag(dur_7),
                             lag(dur_6) > 0 ~ lag(dur_6),
                             lag(dur_5) > 0 ~ lag(dur_5),
                             lag(dur_3) > 0 ~ lag(dur_3),
                             lag(dur_2) > 0 ~ lag(dur_2),
                             lag(dur_1) > 0 ~ lag(dur_1),
                             TRUE ~ NA_real_))

However, in my actual data the numeric suffixes after dur_ can vary, so I need generalized code. How can I do that?

Answer 1

Score: 2

How about this solution?

library(tidyverse)
temp <- d %>%
  pivot_longer(-ab) %>%
  group_by(ab) %>%
  mutate(out = ifelse(value == 0, lag(value), 0)) %>%
  filter(out != 0)
  
# bind_cols(d,  pre_dur = c(NA, temp$out[2:nrow(temp)]))

bind_cols(d,  pre_dur = lag(temp$out))

This assumes that once the values in a row drop to 0, they never become positive again. If that isn't the case, the ifelse condition needs to be adjusted to handle it.
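If values can become positive again after a zero, one option is to take the last positive dur_ value in each row directly instead of looking for the first zero. The following is only a sketch under that assumption, not the answerer's code; row_id and last_pos are hypothetical names introduced for illustration, and the data frame is the one from the question.

```r
library(dplyr)
library(tidyr)

# Example data from the question
d <- data.frame(
  ab = c(3, 4, 2, 6),
  dur_1 = c(32, 1, 3, 4),
  dur_2 = c(27, 9, 26, 5),
  dur_3 = c(25, 8, 21, 48),
  dur_5 = c(0, 4, 0, 42),
  dur_6 = c(0, 0, 0, 0),
  dur_7 = c(0, 0, 0, 0),
  cd = c(45, 67, 34, 78)
)

# Last positive dur_ value per row; NA when a row has no positive value.
last_pos <- d %>%
  mutate(row_id = row_number()) %>%        # hypothetical row key
  pivot_longer(starts_with("dur_")) %>%
  group_by(row_id) %>%
  summarise(out = if (any(value > 0)) last(value[value > 0]) else NA_real_)

# Shift by one row so each row sees the value from the previous row.
d1 <- mutate(d, pre_dur = lag(last_pos$out))
```

Grouping by a generated row number rather than ab avoids relying on ab being unique, and every row yields exactly one value, so the result stays aligned with d even when a row contains no zero.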

Answer 2

Score: 1

First, find the last positive value in each row, then lag it by one row. We could do this with rowwise(), but it is probably faster and more readable to reshape the data to long format and back again. Note that this uses ab as the row identifier, so it assumes ab is unique per row:

library(dplyr)
library(tidyr)

d |>
    pivot_longer(starts_with("dur")) |>
    mutate(pre_dur = last(value[value > 0]), .by = "ab") |>
    pivot_wider() |>
    mutate(pre_dur = lag(pre_dur))
# A tibble: 4 × 9
     ab    cd pre_dur dur_1 dur_2 dur_3 dur_5 dur_6 dur_7
  <dbl> <dbl>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     3    45      NA    32    27    25     0     0     0
2     4    67      25     1     9     8     4     0     0
3     2    34       4     3    26    21     0     0     0
4     6    78      21     4     5    48    42     0     0
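For comparison, the rowwise() alternative mentioned above might look roughly like the sketch below. It is an illustration, not part of the original answer; it assumes the dur_ columns are numeric and uses c_across() to collect them for each row.

```r
library(dplyr)

# Example data from the question
d <- data.frame(
  ab = c(3, 4, 2, 6),
  dur_1 = c(32, 1, 3, 4),
  dur_2 = c(27, 9, 26, 5),
  dur_3 = c(25, 8, 21, 48),
  dur_5 = c(0, 4, 0, 42),
  dur_6 = c(0, 0, 0, 0),
  dur_7 = c(0, 0, 0, 0),
  cd = c(45, 67, 34, 78)
)

d1 <- d %>%
  rowwise() %>%
  mutate(pre_dur = {
    v <- c_across(starts_with("dur_"))     # dur_ values of the current row
    if (any(v > 0)) v[max(which(v > 0))] else NA_real_
  }) %>%
  ungroup() %>%                            # leave row-wise mode before lagging
  mutate(pre_dur = lag(pre_dur))           # take the value from the previous row
```

This keeps the original column order and does not depend on ab, at the cost of evaluating the expression once per row.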

Answer 3

Score: 1

A base R approach using max.col with ties.method = "last".

i <- which(startsWith(names(d), "dur"))
j <- max.col(d[i] > 0, "last")
cbind(d, pre_dur = c(NA, d[cbind(seq_len(nrow(d)), i[j])][-nrow(d)]))
#  ab dur_1 dur_2 dur_3 dur_5 dur_6 dur_7 cd pre_dur
#1  3    32    27    25     0     0     0 45      NA
#2  4     1     9     8     4     0     0 67      25
#3  2     3    26    21     0     0     0 34       4
#4  6     4     5    48    42     0     0 78      21
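One edge case worth noting: when a row has no positive dur_ value, every entry of the logical matrix ties, so max.col with "last" picks the final dur_ column and the looked-up value is 0 rather than NA. The sketch below wraps the same idea in a helper that returns NA for such rows. last_positive_lagged and the small data frame d2 are hypothetical, introduced only to illustrate the point.

```r
# Hypothetical helper: lagged last positive value across columns sharing a prefix.
last_positive_lagged <- function(df, prefix = "dur_") {
  i <- which(startsWith(names(df), prefix))   # positions of the dur_ columns
  pos <- df[i] > 0                            # logical matrix of positive entries
  j <- max.col(pos, ties.method = "last")     # last TRUE per row (last column on all-FALSE rows)
  val <- df[cbind(seq_len(nrow(df)), i[j])]   # matrix indexing: one value per row
  val[rowSums(pos) == 0] <- NA                # rows with no positive value
  c(NA, head(val, -1))                        # shift down by one row
}

# Made-up data with an all-zero second row
d2 <- data.frame(id = 1:3, dur_1 = c(5, 0, 7), dur_2 = c(0, 0, 2))
d2$pre_dur <- last_positive_lagged(d2)
```

Here pre_dur comes out as NA, 5, NA: the third row receives NA because the second row has no positive value.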

huangapple
  • Published on May 22, 2023, 16:47:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76304435.html