Creating a counter that increments along changes in a column
Question
I am working with data structured somewhat like the following:

```r
testdat <- data.frame(MyDat = c("a","a","b","b","b","a","a","a","a","b"))
```

I would like to create a counter variable `MyCount` that iterates along `MyDat` and adds 1 to the count whenever the factor level of `MyDat` changes from one row to the next. Ideally, the result would look something like this:
| MyCount | MyDat |
|---------|-------|
| 1 | a |
| 1 | a |
| 2 | b |
| 2 | b |
| 2 | b |
| 3 | a |
| 3 | a |
| 3 | a |
| 3 | a |
| 4 | b |
I am struggling to set up a loop that checks whether each row's value equals the value in the previous row and, if it does not, adds one to the counter. It also appears I need to start iterating from the second row onward. Something like:
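```r
testdat <- data.frame(MyDat = c("a","a","b","b","b","a","a","a","a","b"))
v <- integer(nrow(testdat))   # one slot per row (length(testdat) counts columns, not rows)
v[1] <- 1L                    # the first row always belongs to run 1
counter <- 1L
for (i in seq_along(v)[-1]) { # rows 2..n
  # start a new run whenever this row differs from the previous one
  if (testdat$MyDat[i] != testdat$MyDat[i - 1]) {
    counter <- counter + 1L
  }
  v[i] <- counter
}
both <- cbind(MyCount = v, testdat)
```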
Answer 1
Score: 3
Use `consecutive_id` (added in dplyr 1.1.0):

```r
library(dplyr)
testdat %>%
  mutate(MyCount = consecutive_id(MyDat), .before = 1)
```

Output:

```
   MyCount MyDat
1        1     a
2        1     a
3        2     b
4        2     b
5        2     b
6        3     a
7        3     a
8        3     a
9        3     a
10       4     b
```

Or in base R with `rle`:

```r
with(rle(testdat$MyDat), rep(seq_along(values), lengths))
# [1] 1 1 2 2 2 3 3 3 3 4
```
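To see why the `rle` one-liner works, it can be unpacked into its two steps. This is a sketch. The `as.character()` call is an addition that does not appear in the answer. It is included in case `MyDat` is a factor, which is what `data.frame()` produced by default before R 4.0.0, because `rle()` raises an error on factor input.

```r
testdat <- data.frame(MyDat = c("a","a","b","b","b","a","a","a","a","b"))

# rle() collapses consecutive repeats into parallel `values` and `lengths` vectors.
# as.character() is an added safeguard: rle() refuses factor input.
runs <- rle(as.character(testdat$MyDat))
runs$values   # "a" "b" "a" "b"
runs$lengths  # 2 3 4 1

# Number the runs 1..k, then repeat each run number once per row in that run
testdat$MyCount <- rep(seq_along(runs$values), runs$lengths)
testdat$MyCount
# [1] 1 1 2 2 2 3 3 3 3 4
```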
Answer 2
Score: 3
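Two base R options:

```r
# (1) flag rows that differ from the previous row, then accumulate the flags
cumsum(c(1, tail(testdat$MyDat, -1) != head(testdat$MyDat, -1)))
# [1] 1 1 2 2 2 3 3 3 3 4
# (2) same idea on integer factor codes: a nonzero difference marks a change
cumsum(c(1, diff(as.integer(factor(testdat$MyDat))) != 0))
# [1] 1 1 2 2 2 3 3 3 3 4
```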
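Both options use the same idea. Each row whose value differs from the row before it is flagged, a 1 is prepended for the first row, and a cumulative sum turns every flag into a step up in the counter. The sketch below wraps that idea in a reusable function.

The function name `run_id` is hypothetical. Its `NA` handling is an assumption of this sketch: consecutive `NA`s form one run, and an `NA` next to a value counts as a change. With plain `!=`, a single `NA` would make every later count `NA`, because `cumsum()` propagates it.

```r
testdat <- data.frame(MyDat = c("a","a","b","b","b","a","a","a","a","b"))

# Hypothetical helper: run counter for any atomic vector (character, factor, numeric)
run_id <- function(x) {
  n <- length(x)
  if (n == 0L) return(integer(0))
  x <- as.character(x)          # compare factors by label, not by level order
  prev <- x[-n]                 # rows 1..n-1
  curr <- x[-1L]                # rows 2..n
  # A new run starts when the value differs from the previous one.
  # Two NAs in a row stay in the same run; NA next to a value is a change.
  changed <- ifelse(is.na(prev) | is.na(curr),
                    is.na(prev) != is.na(curr),
                    prev != curr)
  cumsum(c(1L, changed))        # first row is run 1; each change adds 1
}

run_id(testdat$MyDat)
# [1] 1 1 2 2 2 3 3 3 3 4
run_id(c("a", NA, NA, "a"))
# [1] 1 2 2 3
```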
Answer 3
Score: 2
This is what `data.table::rleid` does:

```r
library(data.table)
setDT(testdat)[, MyCount := rleid(MyDat)]
testdat  # := updates by reference and returns invisibly, so print explicitly
#>     MyDat MyCount
#>  1:     a       1
#>  2:     a       1
#>  3:     b       2
#>  4:     b       2
#>  5:     b       2
#>  6:     a       3
#>  7:     a       3
#>  8:     a       3
#>  9:     a       3
#> 10:     b       4
```
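`rleid()` also accepts several vectors and starts a new run whenever any of them changes. If data.table is not available, that behaviour can be approximated in base R.

This is a sketch only. The helper name `run_id_multi` is hypothetical, and it assumes the inputs contain no `NA` values.

```r
# Hypothetical helper: run counter that advances when ANY of the supplied
# vectors changes from the previous position (assumes no NA values).
run_id_multi <- function(...) {
  cols <- lapply(list(...), as.character)  # compare factors by their labels
  n <- length(cols[[1L]])
  stopifnot(all(lengths(cols) == n))       # all inputs must be the same length
  if (n == 0L) return(integer(0))
  changed <- logical(n - 1L)               # FALSE = same as previous row
  for (x in cols) {
    changed <- changed | (x[-1L] != x[-n])
  }
  cumsum(c(1L, changed))
}

run_id_multi(c("a", "a", "a", "b"), c(1, 1, 2, 2))
# [1] 1 1 2 3
```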