Copy values to rows based on conditions

Question

I have a dataset in which I am trying to copy each case's index date to its matched controls. In this data, case = 1 and control = 0. Each matched pair shares a unique ID in the "matchid" column, and "time" is the timepoint. Here is a sample of the data:

      Study_ID  time index_date  case matchid
      <chr>    <dbl>      <dbl> <dbl>   <dbl>
    1 101          0          2     1       1
    2 101          1          2     1       1
    3 101          2          2     1       1
    4 101          3          2     1       1
    5 340          0         NA     0       1
    6 340          1         NA     0       1
    7 340          2         NA     0       1
    8 340          3         NA     0       1

I need index_date for rows 5-8 to be 2, because those rows have the same "matchid" as the case, so that the result looks like this:

      Study_ID  time index_date  case matchid
      <chr>    <dbl>      <dbl> <dbl>   <dbl>
    1 101          0          2     1       1
    2 101          1          2     1       1
    3 101          2          2     1       1
    4 101          3          2     1       1
    5 340          0          2     0       1
    6 340          1          2     0       1
    7 340          2          2     0       1
    8 340          3          2     0       1

Any help would be greatly appreciated as the solution for a similar question did not resolve my issue.

I have tried the below Stack Overflow solutions but I am getting errors.

https://stackoverflow.com/questions/67399813/copy-values-from-one-row-to-another-based-on-condition?newreg=c75a97bbb15f47fb87f7df5a19348948

https://stackoverflow.com/questions/33998856/r-copy-value-based-on-match-in-another-column

Answer 1

Score: 1

fill() from tidyr should do it:

    library(tidyverse)

    # Sample data from the question
    s = 'Study_ID time index_date case matchid
    1 101 0 2 1 1
    2 101 1 2 1 1
    3 101 2 2 1 1
    4 101 3 2 1 1
    5 340 0 NA 0 1
    6 340 1 NA 0 1
    7 340 2 NA 0 1
    8 340 3 NA 0 1'

    # The header line has one fewer field than the data rows, so read.table()
    # treats it as a header and uses the first column as row names.
    t = read.table(text = s)

    # Within each matched pair, carry the last non-missing index_date downward
    t %>%
      group_by(matchid) %>%
      fill(index_date, .direction = 'down')
    #   Study_ID  time index_date  case matchid
    #      <int> <int>      <int> <int>   <int>
    # 1      101     0          2     1       1
    # 2      101     1          2     1       1
    # 3      101     2          2     1       1
    # 4      101     3          2     1       1
    # 5      340     0          2     0       1
    # 6      340     1          2     0       1
    # 7      340     2          2     0       1
    # 8      340     3          2     0       1
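
One thing to watch with a downward fill is row order: it only helps when the case rows come before the control rows within a pair. The sketch below uses a small made-up data frame (the names d and res and the reversed row order are assumptions for illustration, not part of the question) to show .direction = "downup", which fills downward first and then upward. It also assumes each matchid has a single index date.

```r
library(dplyr)
library(tidyr)

# Hypothetical data: the control (Study_ID 340) is listed BEFORE its case.
d <- data.frame(
  Study_ID   = rep(c(340L, 101L), each = 2),
  time       = rep(0:1, times = 2),
  index_date = c(NA, NA, 2L, 2L),
  case       = rep(c(0L, 1L), each = 2),
  matchid    = 1L
)

# "down" alone would leave the leading NAs untouched here; "downup" fills
# downward and then upward, so the order of rows within a pair stops mattering.
res <- d %>%
  group_by(matchid) %>%
  fill(index_date, .direction = "downup") %>%
  ungroup()

print(res)
```

If a matched set could contain more than one distinct index date, a fill would propagate whichever value is nearest, so this approach is probably only safe when the date is constant within matchid.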

Answer 2

Score: 0

Perhaps this?

    library(dplyr)

    # Within each matchid/time group, replace a missing index_date
    # with the first non-missing value found in that group
    quux %>%
      mutate(
        index_date = if_else(is.na(index_date), na.omit(index_date)[1], index_date),
        .by = c(matchid, time)
      )
    #   Study_ID time index_date case matchid
    # 1      101    0          2    1       1
    # 2      101    1          2    1       1
    # 3      101    2          2    1       1
    # 4      101    3          2    1       1
    # 5      340    0          2    0       1
    # 6      340    1          2    0       1
    # 7      340    2          2    0       1
    # 8      340    3          2    0       1

(Note: .by= requires dplyr 1.1.0 or newer; with an older version, call group_by(matchid, time) before the mutate() instead.)
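
For an older dplyr, the grouped form mentioned in the note might look like the following sketch. The data frame is rebuilt inline so the block runs on its own, and res is just an illustrative name.

```r
library(dplyr)

# Sample data from the question, rebuilt inline
quux <- data.frame(
  Study_ID   = rep(c(101L, 340L), each = 4),
  time       = rep(0:3, times = 2),
  index_date = c(2L, 2L, 2L, 2L, NA, NA, NA, NA),
  case       = rep(c(1L, 0L), each = 4),
  matchid    = 1L
)

# group_by() + mutate() + ungroup() plays the role of mutate(.by = ...)
res <- quux %>%
  group_by(matchid, time) %>%
  mutate(
    index_date = if_else(is.na(index_date), na.omit(index_date)[1], index_date)
  ) %>%
  ungroup()

print(res)
```

Unlike .by=, group_by() leaves the result grouped, which is why the sketch ends with ungroup().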

I'm inferring that what we need to do is replace all NA values with the first non-NA found in index_date within each group defined by matchid and time.
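
To make the "first non-NA within each group" idea concrete, here is a rough base R sketch that needs no packages. The helper name first_non_na is made up for this illustration, and the grouping by matchid and time simply mirrors the inference above.

```r
# Sample data from the question, rebuilt inline
quux <- data.frame(
  Study_ID   = rep(c(101L, 340L), each = 4),
  time       = rep(0:3, times = 2),
  index_date = c(2L, 2L, 2L, 2L, NA, NA, NA, NA),
  case       = rep(c(1L, 0L), each = 4),
  matchid    = 1L
)

# Hypothetical helper: first non-missing element of x, or NA if there is none
first_non_na <- function(x) na.omit(x)[1]

first_non_na(c(NA, 2L, 3L))       # picks the first non-missing value
first_non_na(c(NA_integer_, NA))  # all missing, so the result stays NA

# ave() applies the function within each matchid/time group and returns
# a vector aligned with the original rows.
quux$index_date <- ave(
  quux$index_date, quux$matchid, quux$time,
  FUN = function(x) ifelse(is.na(x), first_non_na(x), x)
)

print(quux)
```

A group with no non-missing value is left as NA rather than raising an error, which seems like the safer default for unmatched controls.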


Data

    quux <- structure(list(
      Study_ID = c(101L, 101L, 101L, 101L, 340L, 340L, 340L, 340L),
      time = c(0L, 1L, 2L, 3L, 0L, 1L, 2L, 3L),
      index_date = c(2L, 2L, 2L, 2L, NA, NA, NA, NA),
      case = c(1L, 1L, 1L, 1L, 0L, 0L, 0L, 0L),
      matchid = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L)
    ), class = "data.frame", row.names = c("1", "2", "3", "4", "5", "6", "7", "8"))

huangapple
  • Posted on July 14, 2023, 02:35:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76682345.html