在R中找到上一行的最后一个字符

huangapple go评论113阅读模式
英文:

Find the very last character of the previous row in R

问题

以下是翻译的内容:

让我们假设我有一个像这样的数据框:

  1. d <- data.frame(order = c("101", "01", "10", "01", "101"),
  2. dur_1 = c(50, 20, 40, 25, 45),
  3. dur_2 = c(40, 30, 45, 34, 96),
  4. dur_3 = c(20, 0, 0, 0, 125))

我想要做的是创建新的列 pre_orderpre_dur
pre_order 是前一个 order 的最后一个字符。所以第一行应该是 NA,因为我们不知道前一个 order 的值。

pre_dur 是前一个 dur_ 列的最后一个非负值。我想要表示为 dur_,因为在我的实际数据中,我必须处理一个通用版本。所以我必须将其表示为 dur_。我的预期输出应该如下所示:

  1. d1<-data.frame(order=c("101","01","10","01","101"),
  2. dur_1=c(50,20,40,25,45),
  3. dur_2=c(40,30,45,34,96),
  4. dur_3=c(20,0,0,0,125),
  5. pre_order=c(NA,1,1,0,1),
  6. pre_dur=c(NA,20,30,45,34))

我不知道如何在R中实现这一点。

英文:

Let's say I have a dataframe like this:

  1. d &lt;- data.frame(order = c(&quot;101&quot;, &quot;01&quot;, &quot;10&quot;, &quot;01&quot;, &quot;101&quot;),
  2. dur_1 = c(50, 20, 40, 25, 45),
  3. dur_2 = c(40, 30, 45, 34, 96),
  4. dur_3 = c(20, 0, 0, 0, 125))

What I want to do is make new columns pre_order and pre_dur.
pre_order is the very last character of the previous order. So the first row should be NA. Because we don't know the previous value of order

pre_dur is the very last nonnegative value of the previous dur_ column. I want to express as dur_ beacause, in my actual data, I have to deal with a generalized version. So I have to express it as dur_. My expected output should look like this:

  1. d1&lt;-data.frame(order=c(&quot;101&quot;,&quot;01&quot;,&quot;10&quot;,&quot;01&quot;,&quot;101&quot;),
  2. dur_1=c(50,20,40,25,45),
  3. dur_2=c(40,30,45,34,96),
  4. dur_3=c(20,0,0,0,125),
  5. pre_order=c(NA,1,1,0,1),
  6. pre_dur=c(NA,20,30,45,34))

I don't know how to do that in R

答案1

得分: 3

使用 max.colsubstring 的基本变体。

  1. i <- which(startsWith(names(d), "dur"))
  2. j <- max.col(d[i] > 0, "last")
  3. cbind(d, rbind(NA, data.frame(pre_order = substring(d$order, nchar(d$order)),
  4. pre_dur = d[cbind(seq_len(nrow(d)), i[j])])[-nrow(d),]))
  5. # order dur_1 dur_2 dur_3 pre_order pre_dur
  6. #1 101 50 40 20 <NA> <NA>
  7. #2 01 20 30 0 1 20
  8. #3 10 40 45 0 1 30
  9. #4 01 25 34 0 0 45
  10. #5 101 45 96 125 1 34
英文:

A base variant using max.col and substring.

  1. i &lt;- which(startsWith(names(d), &quot;dur&quot;))
  2. j &lt;- max.col(d[i] &gt; 0, &quot;last&quot;)
  3. cbind(d, rbind(NA,data.frame(pre_order = substring(d$order, nchar(d$order)),
  4. pre_dur = d[cbind(seq_len(nrow(d)), i[j])])[-nrow(d),]))
  5. # order dur_1 dur_2 dur_3 pre_order pre_dur
  6. #1 101 50 40 20 &lt;NA&gt; &lt;NA&gt;
  7. #2 01 20 30 0 1 20
  8. #3 10 40 45 0 1 30
  9. #4 01 25 34 0 0 45
  10. #5 101 45 96 125 1 34

答案2

得分: 1

以下是您要翻译的内容:

Edit:

This approach solves the first part of your question but doesn't account for the "what if I have an unknown number of 'dur_' columns?" part of the question. I tried a few different solutions using purrr::map() but I wasn't able to get the correct output with your criteria.

GKi's solution successfully answers both parts of your question.


Original Answer:

If I've understood correctly, this is one potential solution:

  1. library(tidyverse)
  2. d <- data.frame(order = c("101", "01", "10", "01", "101"),
  3. dur_1 = c(50, 20, 40, 25, 45),
  4. dur_2 = c(40, 30, 45, 34, 96),
  5. dur_3 = c(20, 0, 0, 0, 125))
  6. d1<-data.frame(order=c("101","01","10","01","101"),
  7. dur_1=c(50,20,40,25,45),
  8. dur_2=c(40,30,45,34,96),
  9. dur_3=c(20,0,0,0,125),
  10. pre_order=c(NA,1,1,0,1),
  11. pre_dur=c(NA,20,30,45,34))
  12. d1
  13. #> order dur_1 dur_2 dur_3 pre_order pre_dur
  14. #> 1 101 50 40 20 NA NA
  15. #> 2 01 20 30 0 1 20
  16. #> 3 10 40 45 0 1 30
  17. #> 4 01 25 34 0 0 45
  18. #> 5 101 45 96 125 1 34
  19. d %>%
  20. mutate(pre_order = lag(str_sub(order, -1)),
  21. pre_dur = case_when(lag(dur_3) > 0 ~ lag(dur_3),
  22. lag(dur_2) > 0 ~ lag(dur_2),
  23. lag(dur_1) > 0 ~ lag(dur_1),
  24. TRUE ~ NA_real_))
  25. #> order dur_1 dur_2 dur_3 pre_order pre_dur
  26. #> 1 101 50 40 20 <NA> NA
  27. #> 2 01 20 30 0 1 20
  28. #> 3 10 40 45 0 1 30
  29. #> 4 01 25 34 0 0 45
  30. #> 5 101 45 96 125 1 34

Created on 2023-05-22 with reprex v2.0.2

英文:

Edit:

This approach solves the first part of your question but doesn't account for the "what if I have an unknown number of 'dur_' columns?" part of the question. I tried a few different solutions using purrr::map() but I wasn't able to get the correct output with your criteria.

GKi's solution successfully answers both parts of your question.


Original Answer:

If I've understood correctly, this is one potential solution:

  1. library(tidyverse)
  2. d &lt;- data.frame(order = c(&quot;101&quot;, &quot;01&quot;, &quot;10&quot;, &quot;01&quot;, &quot;101&quot;),
  3. dur_1 = c(50, 20, 40, 25, 45),
  4. dur_2 = c(40, 30, 45, 34, 96),
  5. dur_3 = c(20, 0, 0, 0, 125))
  6. d1&lt;-data.frame(order=c(&quot;101&quot;,&quot;01&quot;,&quot;10&quot;,&quot;01&quot;,&quot;101&quot;),
  7. dur_1=c(50,20,40,25,45),
  8. dur_2=c(40,30,45,34,96),
  9. dur_3=c(20,0,0,0,125),
  10. pre_order=c(NA,1,1,0,1),
  11. pre_dur=c(NA,20,30,45,34))
  12. d1
  13. #&gt; order dur_1 dur_2 dur_3 pre_order pre_dur
  14. #&gt; 1 101 50 40 20 NA NA
  15. #&gt; 2 01 20 30 0 1 20
  16. #&gt; 3 10 40 45 0 1 30
  17. #&gt; 4 01 25 34 0 0 45
  18. #&gt; 5 101 45 96 125 1 34
  19. d %&gt;%
  20. mutate(pre_order = lag(str_sub(order, -1)),
  21. pre_dur = case_when(lag(dur_3) &gt; 0 ~ lag(dur_3),
  22. lag(dur_2) &gt; 0 ~ lag(dur_2),
  23. lag(dur_1) &gt; 0 ~ lag(dur_1),
  24. TRUE ~ NA_real_))
  25. #&gt; order dur_1 dur_2 dur_3 pre_order pre_dur
  26. #&gt; 1 101 50 40 20 &lt;NA&gt; NA
  27. #&gt; 2 01 20 30 0 1 20
  28. #&gt; 3 10 40 45 0 1 30
  29. #&gt; 4 01 25 34 0 0 45
  30. #&gt; 5 101 45 96 125 1 34

<sup>Created on 2023-05-22 with reprex v2.0.2</sup>

huangapple
  • 本文由 发表于 2023年5月22日 11:19:21
  • 转载请务必保留本文链接:https://go.coder-hub.com/76302847.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定