In R, how to replace values in specific rows with those from another row, based on a condition?

Question


I have data from a randomized controlled trial. The data is in long format, with one row per participant per time point.

Some of the participants in my dataset required a special interim measurement in between the usual time 1 and time 2 measurements. Thus, like IDs 1 and 3 below, those individuals all have an extra row corresponding to that extra measurement (which I call t1.5 below).

For those participants, I need the t1.5 measurement to replace the t1 measurement. So, for the example dataset below, for ID #1, I would like t1 OUTCOME to be 48, and for ID #3, t1 OUTCOME should be 44. It's fine to fully overwrite and ignore the original t1 values for those we're replacing, and after that I can also remove all the t1.5 rows completely.

Example data:

df <- tibble::tribble(~ID, ~TIME, ~OUTCOME, 1, "t1", 50, 1, "t1.5", 48, 1, "t2",
    30, 2, "t1", 31, 2, "t2", 20, 3, "t1", 45, 3, "t1.5", 44, 3, "t2", 33)

ID   TIME   OUTCOME
1    t1     50
1    t1.5   48
1    t2     30
2    t1     31
2    t2     20
3    t1     45
3    t1.5   44
3    t2     33

I am using R 4.2. I can brute force it, but I'm sure there's a way to do it elegantly. I'd love a solution using tidyverse (dplyr, tidyr, or what have you) syntax.

I already have a numeric vector pesky_IDs which lists all IDs needing replacement. I thought it could be useful to utilize ID %in% pesky_IDs ...but in which function? (Or just ignore this last point and show me the better way!)

Thanks all - I am continuously impressed by this community!

I honestly am not sure where to start without doing something verbose and inelegant.

Answer 1

Score: 0

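library(dplyr)

df %>%
  group_by(ID) %>%
  # Within each ID, overwrite the t1 value with the t1.5 value when a t1.5 row exists
  mutate(OUTCOME = replace(OUTCOME, TIME == 't1' & 't1.5' %in% TIME,
                           OUTCOME[TIME == 't1.5']))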

# A tibble: 8 × 3
# Groups:   ID [3]
     ID TIME  OUTCOME
  <dbl> <chr>   <dbl>
1     1 t1         48
2     1 t1.5       48
3     1 t2         30
4     2 t1         31
5     2 t2         20
6     3 t1         44
7     3 t1.5       44
8     3 t2         33
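
The output above still contains the t1.5 rows, which the question says can be dropped afterwards. A minimal sketch of the complete pipeline, assuming each ID has at most one t1.5 row and that `pesky_IDs` (the asker's vector, reconstructed here as `c(1, 3)` for the example data) lists exactly the IDs with an interim measurement. The names `result` and `result2` are only for illustration:

```r
library(dplyr)

# Example data from the question
df <- tibble::tribble(
  ~ID, ~TIME, ~OUTCOME,
  1, "t1",   50,
  1, "t1.5", 48,
  1, "t2",   30,
  2, "t1",   31,
  2, "t2",   20,
  3, "t1",   45,
  3, "t1.5", 44,
  3, "t2",   33
)

# Assumption: stands in for the asker's vector of IDs needing replacement
pesky_IDs <- c(1, 3)

# Approach 1: copy the t1.5 value into the t1 row, then drop the t1.5 rows
result <- df %>%
  group_by(ID) %>%
  mutate(OUTCOME = replace(OUTCOME,
                           TIME == "t1" & "t1.5" %in% TIME,
                           OUTCOME[TIME == "t1.5"])) %>%
  ungroup() %>%
  filter(TIME != "t1.5")

# Approach 2: drop the original t1 rows for the listed IDs,
# then relabel the remaining t1.5 rows as t1
result2 <- df %>%
  filter(!(ID %in% pesky_IDs & TIME == "t1")) %>%
  mutate(TIME = if_else(TIME == "t1.5", "t1", TIME))
```

The second approach is where `ID %in% pesky_IDs` fits most naturally: it goes inside `filter()`, and no grouping is needed. It relies on `pesky_IDs` being accurate, whereas the first approach detects the interim rows from the data itself.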

Posted by huangapple on 2023-02-14 08:54:17
Link to this article: https://go.coder-hub.com/75442535.html