在R中找到上一行的最后一个字符

huangapple go评论65阅读模式
英文:

Find the very last character of the previous row in R

问题

以下是翻译的内容:

让我们假设我有一个像这样的数据框:

d <- data.frame(order = c("101", "01", "10", "01", "101"),
                dur_1 = c(50, 20, 40, 25, 45),
                dur_2 = c(40, 30, 45, 34, 96),
                dur_3 = c(20, 0, 0, 0, 125))

我想要做的是创建新的列 pre_orderpre_dur
pre_order 是前一个 order 的最后一个字符。所以第一行应该是 NA,因为我们不知道前一个 order 的值。

pre_dur 是前一个 dur_ 列的最后一个非负值。我想要表示为 dur_,因为在我的实际数据中,我必须处理一个通用版本。所以我必须将其表示为 dur_。我的预期输出应该如下所示:

d1<-data.frame(order=c("101","01","10","01","101"),
              dur_1=c(50,20,40,25,45),
              dur_2=c(40,30,45,34,96),
              dur_3=c(20,0,0,0,125),
              pre_order=c(NA,1,1,0,1),
              pre_dur=c(NA,20,30,45,34))

我不知道如何在R中实现这一点。

英文:

Let's say I have a dataframe like this:

d &lt;- data.frame(order = c(&quot;101&quot;, &quot;01&quot;, &quot;10&quot;, &quot;01&quot;, &quot;101&quot;),
                dur_1 = c(50, 20, 40, 25, 45),
                dur_2 = c(40, 30, 45, 34, 96),
                dur_3 = c(20, 0, 0, 0, 125))

What I want to do is make new columns pre_order and pre_dur.
pre_order is the very last character of the previous order. So the first row should be NA. Because we don't know the previous value of order

pre_dur is the very last nonnegative value of the previous dur_ column. I want to express as dur_ beacause, in my actual data, I have to deal with a generalized version. So I have to express it as dur_. My expected output should look like this:

d1&lt;-data.frame(order=c(&quot;101&quot;,&quot;01&quot;,&quot;10&quot;,&quot;01&quot;,&quot;101&quot;),
              dur_1=c(50,20,40,25,45),
              dur_2=c(40,30,45,34,96),
              dur_3=c(20,0,0,0,125),
              pre_order=c(NA,1,1,0,1),
              pre_dur=c(NA,20,30,45,34))

I don't know how to do that in R

答案1

得分: 3

使用 max.colsubstring 的基本变体。

i <- which(startsWith(names(d), "dur"))
j <- max.col(d[i] > 0, "last")
cbind(d, rbind(NA, data.frame(pre_order = substring(d$order, nchar(d$order)),
pre_dur = d[cbind(seq_len(nrow(d)), i[j])])[-nrow(d),]))
#  order dur_1 dur_2 dur_3 pre_order pre_dur
#1   101    50    40    20      <NA>    <NA>
#2    01    20    30     0         1      20
#3    10    40    45     0         1      30
#4    01    25    34     0         0      45
#5   101    45    96   125         1      34
英文:

A base variant using max.col and substring.

i &lt;- which(startsWith(names(d), &quot;dur&quot;))
j &lt;- max.col(d[i] &gt; 0, &quot;last&quot;)
cbind(d, rbind(NA,data.frame(pre_order = substring(d$order, nchar(d$order)),
pre_dur = d[cbind(seq_len(nrow(d)), i[j])])[-nrow(d),]))
#  order dur_1 dur_2 dur_3 pre_order pre_dur
#1   101    50    40    20      &lt;NA&gt;    &lt;NA&gt;
#2    01    20    30     0         1      20
#3    10    40    45     0         1      30
#4    01    25    34     0         0      45
#5   101    45    96   125         1      34

答案2

得分: 1

以下是您要翻译的内容:

Edit:

This approach solves the first part of your question but doesn't account for the "what if I have an unknown number of 'dur_' columns?" part of the question. I tried a few different solutions using purrr::map() but I wasn't able to get the correct output with your criteria.

GKi's solution successfully answers both parts of your question.


Original Answer:

If I've understood correctly, this is one potential solution:

library(tidyverse)

d <- data.frame(order = c("101", "01", "10", "01", "101"),
                dur_1 = c(50, 20, 40, 25, 45),
                dur_2 = c(40, 30, 45, 34, 96),
                dur_3 = c(20, 0, 0, 0, 125))

d1<-data.frame(order=c("101","01","10","01","101"),
               dur_1=c(50,20,40,25,45),
               dur_2=c(40,30,45,34,96),
               dur_3=c(20,0,0,0,125),
               pre_order=c(NA,1,1,0,1),
               pre_dur=c(NA,20,30,45,34))
d1
#>   order dur_1 dur_2 dur_3 pre_order pre_dur
#> 1   101    50    40    20        NA      NA
#> 2    01    20    30     0         1      20
#> 3    10    40    45     0         1      30
#> 4    01    25    34     0         0      45
#> 5   101    45    96   125         1      34

d %>%
  mutate(pre_order = lag(str_sub(order, -1)),
         pre_dur = case_when(lag(dur_3) > 0 ~ lag(dur_3),
                             lag(dur_2) > 0 ~ lag(dur_2),
                             lag(dur_1) > 0 ~ lag(dur_1),
                             TRUE ~ NA_real_))
#>   order dur_1 dur_2 dur_3 pre_order pre_dur
#> 1   101    50    40    20      <NA>      NA
#> 2    01    20    30     0         1      20
#> 3    10    40    45     0         1      30
#> 4    01    25    34     0         0      45
#> 5   101    45    96   125         1      34

Created on 2023-05-22 with reprex v2.0.2

英文:

Edit:

This approach solves the first part of your question but doesn't account for the "what if I have an unknown number of 'dur_' columns?" part of the question. I tried a few different solutions using purrr::map() but I wasn't able to get the correct output with your criteria.

GKi's solution successfully answers both parts of your question.


Original Answer:

If I've understood correctly, this is one potential solution:

library(tidyverse)

d &lt;- data.frame(order = c(&quot;101&quot;, &quot;01&quot;, &quot;10&quot;, &quot;01&quot;, &quot;101&quot;),
                dur_1 = c(50, 20, 40, 25, 45),
                dur_2 = c(40, 30, 45, 34, 96),
                dur_3 = c(20, 0, 0, 0, 125))

d1&lt;-data.frame(order=c(&quot;101&quot;,&quot;01&quot;,&quot;10&quot;,&quot;01&quot;,&quot;101&quot;),
               dur_1=c(50,20,40,25,45),
               dur_2=c(40,30,45,34,96),
               dur_3=c(20,0,0,0,125),
               pre_order=c(NA,1,1,0,1),
               pre_dur=c(NA,20,30,45,34))
d1
#&gt;   order dur_1 dur_2 dur_3 pre_order pre_dur
#&gt; 1   101    50    40    20        NA      NA
#&gt; 2    01    20    30     0         1      20
#&gt; 3    10    40    45     0         1      30
#&gt; 4    01    25    34     0         0      45
#&gt; 5   101    45    96   125         1      34

d %&gt;%
  mutate(pre_order = lag(str_sub(order, -1)),
         pre_dur = case_when(lag(dur_3) &gt; 0 ~ lag(dur_3),
                             lag(dur_2) &gt; 0 ~ lag(dur_2),
                             lag(dur_1) &gt; 0 ~ lag(dur_1),
                             TRUE ~ NA_real_))
#&gt;   order dur_1 dur_2 dur_3 pre_order pre_dur
#&gt; 1   101    50    40    20      &lt;NA&gt;      NA
#&gt; 2    01    20    30     0         1      20
#&gt; 3    10    40    45     0         1      30
#&gt; 4    01    25    34     0         0      45
#&gt; 5   101    45    96   125         1      34

<sup>Created on 2023-05-22 with reprex v2.0.2</sup>

huangapple
  • 本文由 发表于 2023年5月22日 11:19:21
  • 转载请务必保留本文链接:https://go.coder-hub.com/76302847.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定