In R, how to replace values in specific rows with those from another row, based on a condition?
Question
I have data from a randomized controlled trial. The data are in long format (one row per participant per time point).
Some of the participants in my dataset required a special interim measurement in between the usual time 1 and time 2 measurements. Thus, like IDs 1 and 3 below, those individuals all have an extra row corresponding to that extra measurement (which I call t1.5 below).
For those participants, I need the t1.5 measurement to replace the t1 measurement. So, for the example dataset below, for ID #1, I would like t1 OUTCOME to be 48, and for ID #3, t1 OUTCOME should be 44. It's fine to fully overwrite and ignore the original t1 values for those we're replacing, and after that I can also remove all the t1.5 rows completely.
Example data:
df <- tibble::tribble(
  ~ID, ~TIME,  ~OUTCOME,
  1,   "t1",   50,
  1,   "t1.5", 48,
  1,   "t2",   30,
  2,   "t1",   31,
  2,   "t2",   20,
  3,   "t1",   45,
  3,   "t1.5", 44,
  3,   "t2",   33
)
ID TIME OUTCOME
1 t1 50
1 t1.5 48
1 t2 30
2 t1 31
2 t2 20
3 t1 45
3 t1.5 44
3 t2 33
I am using R 4.2. I can brute force it, but I'm sure there's a way to do it elegantly. I'd love a solution using tidyverse (dplyr, tidyr, or what have you) syntax.
I already have a numeric vector pesky_IDs which lists all IDs needing replacement. I thought it could be useful to utilize ID %in% pesky_IDs ...but in which function? (Or just ignore this last point and show me the better way!)
Thanks all - I am continuously impressed by this community!
I honestly am not sure where to start without doing something verbose and inelegant.
Answer 1
Score: 0
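library(dplyr)

# Within each ID, overwrite the t1 OUTCOME with the t1.5 OUTCOME when a t1.5 row exists
df %>%
  group_by(ID) %>%
  mutate(OUTCOME = replace(OUTCOME, TIME == 't1' & 't1.5' %in% TIME,
                           OUTCOME[TIME == 't1.5']))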
# A tibble: 8 × 3
# Groups: ID [3]
ID TIME OUTCOME
<dbl> <chr> <dbl>
1 1 t1 48
2 1 t1.5 48
3 1 t2 30
4 2 t1 31
5 2 t2 20
6 3 t1 44
7 3 t1.5 44
8 3 t2 33
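The output above still contains the t1.5 rows, which the question says can be dropped once their values have been copied over. A minimal sketch of one way to finish the job follows, assuming each flagged ID has at most one t1.5 row. The values given to pesky_IDs here are an assumption chosen to match the example data, and the name result is only illustrative.

```r
library(dplyr)

# Example data from the question
df <- tibble::tribble(
  ~ID, ~TIME,  ~OUTCOME,
  1,   "t1",   50,
  1,   "t1.5", 48,
  1,   "t2",   30,
  2,   "t1",   31,
  2,   "t2",   20,
  3,   "t1",   45,
  3,   "t1.5", 44,
  3,   "t2",   33
)

# Assumed contents of the asker's vector: the IDs that have an interim row
pesky_IDs <- c(1, 3)

result <- df %>%
  group_by(ID) %>%
  # Overwrite t1 only for flagged IDs that actually have a t1.5 row
  mutate(OUTCOME = replace(
    OUTCOME,
    TIME == "t1" & ID %in% pesky_IDs & "t1.5" %in% TIME,
    OUTCOME[TIME == "t1.5"]
  )) %>%
  ungroup() %>%
  # The interim rows are no longer needed once their values are copied
  filter(TIME != "t1.5")

print(result)
```

The ID %in% pesky_IDs test is optional here, since the 't1.5' %in% TIME check already skips participants without an interim row. Keeping it restricts the replacement to the IDs that were explicitly flagged.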