Looking for a more efficient way of doing this for-loop in R

Question

My data set has over 3 million records, so this loop is taking forever.

I want to create a bout_len_tracker variable that counts how many consecutive rows for the same SUBJECT have VECTORMAGNITUDECOUNTS >= 1853.

    sub_study$bout_len_tracker <- 0
    for (i in 2:nrow(sub_study)) {
      # extend the count only when the row is above the threshold and
      # belongs to the same subject as the previous row
      if (sub_study$VECTORMAGNITUDECOUNTS[i] >= 1853 &&
          sub_study$SUBJECT[i] == sub_study$SUBJECT[i - 1]) {
        sub_study$bout_len_tracker[i] <- sub_study$bout_len_tracker[i - 1] + 1
      }
    }

Answer 1

Score: 1

You could put the cumsum of VECTORMAGNITUDECOUNTS >= 1853 in ave. Here is an example, using a column n and a threshold of 10:

    dat$len <- with(dat, ave(n >= 10, id, FUN=cumsum))
    dat
    #    id t  n len
    # 1   1 1 17   1
    # 2   1 2 18   2
    # 3   1 3  5   2
    # 4   2 1  5   0
    # 5   2 2 17   1
    # 6   2 3 14   2
    # 7   3 1  1   0
    # 8   3 2 15   1
    # 9   3 3 20   2
    # 10  4 1 10   1
    # 11  4 2  7   1
    # 12  4 3 18   2
    # 13  5 1  4   0
    # 14  5 2  4   0
    # 15  5 3 15   1

If there are NAs in n,

    dat$n[sample.int(nrow(dat), nrow(dat)*.2)] <- NA

you can expand this to:

    dat$len <- with(dat, ave(n >= 10, id, FUN=\(x) cumsum(replace(x, is.na(x), 0))))
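To see why the replace step matters, here is a small sketch on a made-up vector (the values are illustrative and not taken from the data set). A comparison against NA yields NA, and cumsum carries that NA through every later position.

```r
# Illustrative values only; the NA stands in for a missing count
n <- c(17L, NA, 18L, 5L)
flag <- n >= 10                      # TRUE NA TRUE FALSE

# Without handling, every value from the first NA onward is NA
naive <- cumsum(flag)                # 1 NA NA NA

# Replacing NA flags with 0 treats missing rows as below the threshold
patched <- cumsum(replace(flag, is.na(flag), 0))   # 1 1 2 2

print(naive)
print(patched)
```

Whether a missing count should be treated as below the threshold is a judgment call about the data, not something the code can decide.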

Data:

    dat <- structure(list(id = c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 4L,
    4L, 4L, 5L, 5L, 5L), t = c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L,
    1L, 2L, 3L, 1L, 2L, 3L), n = c(17L, 18L, 5L, 5L, 17L, 14L, 1L,
    15L, 20L, 10L, 7L, 18L, 4L, 4L, 15L)), out.attrs = list(dim = c(id = 5L,
    t = 3L), dimnames = list(id = c("id=1", "id=2", "id=3", "id=4",
    "id=5"), t = c("t=1", "t=2", "t=3"))), row.names = c(NA, -15L
    ), class = "data.frame")
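One thing worth noting about the output above: the cumsum keeps accumulating within an id, so row 3 (n = 5) still shows len = 2 instead of dropping back to 0. The loop in the question resets on a below-threshold row and also never counts the first row of a subject. The following is a rough base R sketch of a counter that resets, plus a variant meant to mirror the loop. The helper names run_length and loop_like are made up for illustration, and the sketch assumes that all rows of an id sit next to each other.

```r
# Same values as the answer's sample data
dat <- data.frame(
  id = rep(1:5, each = 3),
  t  = rep(1:3, times = 5),
  n  = c(17L, 18L, 5L, 5L, 17L, 14L, 1L, 15L, 20L, 10L, 7L, 18L, 4L, 4L, 15L)
)

# Hypothetical helper: length of the current run of TRUE values, 0 on FALSE
run_length <- function(x) {
  x <- !is.na(x) & x             # treat NA as below the threshold
  cs <- cumsum(x)
  cs - cummax(cs * (!x))         # subtract the running total at the last FALSE
}

# Hypothetical helper: same, but a group's first row is never counted,
# which is what the question's loop does
loop_like <- function(x) {
  if (length(x)) x[1] <- FALSE
  run_length(x)
}

dat$run       <- ave(dat$n >= 10, dat$id, FUN = run_length)
dat$loop_like <- ave(dat$n >= 10, dat$id, FUN = loop_like)

# Reference: the question's loop, applied to the sample columns
ref <- integer(nrow(dat))
for (i in 2:nrow(dat)) {
  if (dat$n[i] >= 10 && dat$id[i] == dat$id[i - 1]) ref[i] <- ref[i - 1] + 1L
}
stopifnot(identical(as.integer(dat$loop_like), ref))
print(dat)
```

On the real data this would presumably be called with VECTORMAGNITUDECOUNTS >= 1853 and SUBJECT in place of n >= 10 and id. If a subject's rows are not contiguous, ave still groups them together while the loop would not, so the two can disagree in that case.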

Answer 2

Score: 1

    library(data.table)
    setDT(dat)[, tracker := (n >= 10) * rowid(n >= 10), id][]
    #     id t  n tracker
    #  1:  1 1 17       1
    #  2:  1 2 18       2
    #  3:  1 3  5       0
    #  4:  2 1  5       0
    #  5:  2 2 17       1
    #  6:  2 3 14       2
    #  7:  3 1  1       0
    #  8:  3 2 15       1
    #  9:  3 3 20       2
    # 10:  4 1 10       1
    # 11:  4 2  7       0
    # 12:  4 3 18       2
    # 13:  5 1  4       0
    # 14:  5 2  4       0
    # 15:  5 3 15       1
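In the output above, row 12 gets tracker = 2 even though row 11 is below the threshold, because rowid(n >= 10) numbers every above-threshold row within an id regardless of gaps. If strictly consecutive runs are wanted, one possible variation is to number rows within each run using rleid. This is only a sketch: the column names flag and run are assumptions, and NA counts are treated as below the threshold.

```r
library(data.table)

# Same values as the sample data used in the answers
dat <- data.table(
  id = rep(1:5, each = 3),
  t  = rep(1:3, times = 5),
  n  = c(17L, 18L, 5L, 5L, 17L, 14L, 1L, 15L, 20L, 10L, 7L, 18L, 4L, 4L, 15L)
)

# TRUE where the count meets the threshold; NA counts become FALSE
dat[, flag := !is.na(n) & n >= 10]

# rleid() starts a new run whenever id or flag changes;
# rowid() numbers the rows inside each run; multiplying by flag zeroes
# the rows that are below the threshold
dat[, run := flag * rowid(rleid(id, flag))]

print(dat)
```

Unlike the loop in the question, this counts the first row of each id when it is above the threshold.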

Posted by huangapple on March 1, 2023 at 13:38:10
Source: https://go.coder-hub.com/75599942.html