Count rows after value 1 where the remaining rows are NA in R
Question
I have this data:
library(tibble)
df <- tibble(a = c(NA, NA, 1, NA, NA, NA, 1, NA, NA))
I want to fill the NAs after each occurrence of 1 with a running count, like this:
df2 <- tibble(a = c(NA, NA, 1, 2, 3, 4, 1, 2, 3))
I haven't found a solution so far. Any ideas?
Answer 1
Score: 2
One possible way to solve this within the tidyverse:
library(dplyr)
df %>%
  # !is.na(a) is TRUE (1) for non-NA values and FALSE (0) otherwise; its cumulative sum identifies the groups
  dplyr::mutate(grp = cumsum(!is.na(a))) %>%
  # group by the running group id
  dplyr::group_by(grp) %>%
  # number the rows within each group; group 0 (the rows before the first 1) stays NA
  dplyr::transmute(a = ifelse(grp != 0, dplyr::row_number(), NA)) %>%
  # drop the grouping to prevent unwanted behaviour downstream
  dplyr::ungroup() %>%
  # drop grp if you do not need it in later calculations
  dplyr::select(-grp)
# A tibble: 9 x 1
      a
  <int>
1    NA
2    NA
3     1
4     2
5     3
6     4
7     1
8     2
9     3
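To see why the grouping step works, it can help to look at the intermediate group ids. The following is a minimal base R sketch of the same idea, not the answer's dplyr pipeline; the names grp and filled are illustrative only, and ave() stands in for group_by() plus row_number().

```r
a <- c(NA, NA, 1, NA, NA, NA, 1, NA, NA)

# Every non-NA value starts a new group; the leading NAs fall into group 0
grp <- cumsum(!is.na(a))
grp
# [1] 0 0 1 1 1 1 2 2 2

# Number the rows within each group, then blank out group 0
filled <- ave(seq_along(a), grp, FUN = seq_along)
filled[grp == 0] <- NA
filled
# [1] NA NA  1  2  3  4  1  2  3
```

Because cumsum() only increases when a non-NA value appears, each 1 and the NAs that follow it share one group id.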
Answer 2
Score: 1
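Using data.table:
library(data.table)
# Group by cumsum(!is.na(a)) and number the rows within each group; the multiplier is
# NA^TRUE (NA) for the all-NA leading group and NA^FALSE (1) for every other group
setDT(df)[, a2 := seq_len(.N) * NA^(all(is.na(a))), by = cumsum(!is.na(a))]
Output: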
> df
    a a2
1: NA NA
2: NA NA
3:  1  1
4: NA  2
5: NA  3
6: NA  4
7:  1  1
8: NA  2
9: NA  3
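The NA^(...) multiplier is the least obvious part of this one-liner. It relies on R's rule that anything raised to the power 0 is 1, so NA^FALSE is 1 while NA^TRUE stays NA. The sketch below reproduces the trick in base R, without data.table, assuming the same input vector; grp and a2 are illustrative names.

```r
# The two cases of the multiplier
na_case   <- NA^TRUE    # NA^1 is NA
keep_case <- NA^FALSE   # NA^0 is 1

a <- c(NA, NA, 1, NA, NA, NA, 1, NA, NA)
grp <- cumsum(!is.na(a))

# Per group: row numbers, wiped to NA if the whole group is NA
a2 <- unlist(
  lapply(split(a, grp), function(x) seq_along(x) * NA^all(is.na(x))),
  use.names = FALSE
)
a2
# [1] NA NA  1  2  3  4  1  2  3
```

Only the leading group, which has no 1, is entirely NA, so it is the only one whose counts get wiped out.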
Answer 3
Score: 0
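With sequence:
# Positions of the 1s
idx <- which(df$a == 1)
# Run lengths from each 1 to the next (or to the end), expanded into counts 1..n
df$a[idx[1]:length(df$a)] <- sequence(diff(c(idx, length(df$a) + 1)))
df$a
# [1] NA NA  1  2  3  4  1  2  3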
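Breaking the one-liner into steps may make it easier to follow. This sketch works on a plain vector instead of df$a, and the intermediate names run_lengths and counts are illustrative. It assumes the vector contains at least one 1; otherwise idx is empty and the idx[1]:length(a) range fails.

```r
a <- c(NA, NA, 1, NA, NA, NA, 1, NA, NA)

# which() drops the NA comparisons and keeps the positions of the 1s
idx <- which(a == 1)
idx
# [1] 3 7

# Distance from each 1 to the next one, with length(a) + 1 as the end marker
run_lengths <- diff(c(idx, length(a) + 1))
run_lengths
# [1] 4 3

# sequence() expands each run length n into 1..n and concatenates the results
counts <- sequence(run_lengths)
counts
# [1] 1 2 3 4 1 2 3

# Overwrite everything from the first 1 onward
a[idx[1]:length(a)] <- counts
a
# [1] NA NA  1  2  3  4  1  2  3
```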