Count rows after value 1 where rest of rows are NAs in R

Question

I have this data:

    library(tibble)
    df <- tibble(a = c(NA, NA, 1, NA, NA, NA, 1, NA, NA))

I want to fill the NAs after each occurrence of 1 with running counts, like this:

    df2 <- tibble(a = c(NA, NA, 1, 2, 3, 4, 1, 2, 3))

I've no solution so far. Any ideas?

Answer 1

Score: 2

One possible way to solve this within the tidyverse:

    library(dplyr)
    df %>%
      # !is.na(a) is TRUE or FALSE (1 or 0); its cumulative sum goes up at each non-NA value and so identifies the groups
      dplyr::mutate(grp = cumsum(!is.na(a))) %>%
      # build the grouping
      dplyr::group_by(grp) %>%
      # give a row number per group, except for group 0 (the rows before a = 1 for the first time)
      dplyr::transmute(a = ifelse(grp != 0, dplyr::row_number(), NA)) %>%
      # release the grouping to prevent unwanted behaviour downstream
      dplyr::ungroup() %>%
      # drop grp if you do not need it further on in your calculations
      dplyr::select(-grp)

    # A tibble: 9 x 1
          a
      <int>
    1    NA
    2    NA
    3     1
    4     2
    5     3
    6     4
    7     1
    8     2
    9     3
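The key step above is the grouping column: `cumsum(!is.na(a))` goes up by one at every non-NA value, so each 1 opens a new group and the NAs before the first 1 land in group 0. As a rough base-R sketch of the same idea (assuming `a` is a plain vector and no dplyr is loaded; not part of the original answer), `ave()` can number the rows within each group, after which group 0 is blanked out:

```r
# Input from the question, as a plain vector
a <- c(NA, NA, 1, NA, NA, NA, 1, NA, NA)

# Group key: increases at each non-NA value, so every 1 opens a new group
grp <- cumsum(!is.na(a))
print(grp)  # -> 0 0 1 1 1 1 2 2 2

# Row number within each group (FUN must be passed by name to ave())
counts <- ave(seq_along(a), grp, FUN = seq_along)

# Rows before the first non-NA value (group 0) stay NA
counts[grp == 0] <- NA
print(counts)  # -> NA NA 1 2 3 4 1 2 3
```

Like the dplyr version, this treats any non-NA value as the start of a new run, not only 1.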

Answer 2

Score: 1

Using `data.table`:

    library(data.table)
    setDT(df)[, a2 := seq_len(.N) * NA^(all(is.na(a))), by = cumsum(!is.na(a))]

Output:

    > df
        a a2
    1: NA NA
    2: NA NA
    3:  1  1
    4: NA  2
    5: NA  3
    6: NA  4
    7:  1  1
    8: NA  2
    9: NA  3
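The `NA^(all(is.na(a)))` factor is what keeps the rows before the first 1 empty: `all(is.na(a))` is `TRUE` only for that leading group, `NA^TRUE` is `NA`, and `NA^FALSE` (that is, `NA^0`) is `1`. A quick base-R check of that arithmetic, not part of the original answer:

```r
# In the exponent, TRUE is coerced to 1 and FALSE to 0
NA^TRUE   # NA: the all-NA leading group gets blanked
NA^FALSE  # 1:  R defines x^0 as 1, even when x is NA

# Multiplying a group's row sequence by that factor either keeps it or blanks it
seq_len(4) * NA^FALSE  # 1 2 3 4
seq_len(2) * NA^TRUE   # NA NA
```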

Answer 3

Score: 0

With `sequence()`:

    idx <- which(df$a == 1)
    df$a[idx[1]:length(df$a)] <- sequence(diff(c(idx, length(df$a) + 1)))
    #df$a
    #[1] NA NA 1 2 3 4 1 2 3
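Printing the intermediate values shows why this works: `which(a == 1)` gives the positions of the 1s (comparisons with NA yield NA, which `which()` drops), `diff(c(idx, length + 1))` gives the length of each run from one 1 up to the next (or to the end), and `sequence()` turns those run lengths into concatenated counts. One caveat: if the column contains no 1 at all, `idx[1]` is `NA` and the `NA:length(...)` subset fails with an error. A minimal sketch wrapping the same logic with a guard for that case; the helper name `fill_counts` is hypothetical and not from the original answer:

```r
# Hypothetical helper wrapping the sequence() approach from this answer
fill_counts <- function(a) {
  idx <- which(a == 1)             # positions of the 1s (NA comparisons are dropped)
  if (length(idx) == 0) return(a)  # no 1 present: nothing to fill, avoid NA:n error
  run_lengths <- diff(c(idx, length(a) + 1))  # rows from each 1 up to the next 1 or the end
  a[idx[1]:length(a)] <- sequence(run_lengths)
  a
}

a <- c(NA, NA, 1, NA, NA, NA, 1, NA, NA)
which(a == 1)                          # -> 3 7
diff(c(which(a == 1), length(a) + 1))  # -> 4 3
fill_counts(a)                         # -> NA NA 1 2 3 4 1 2 3
fill_counts(c(NA, NA))                 # -> NA NA (returned unchanged)
```

Note that everything from the first 1 onward is overwritten, so this assumes the rows between the 1s are all NA, as in the question.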

huangapple
  • Published on 2023-03-10 01:13:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75687913.html