Adding rows (changes in dosing) to my data frame in R

Question

I'm working with clinical data that I want to arrange in a format I can model with later.
My current dataset looks like this:

# create a data frame
df <- data.frame(ID = c(1,1,1,2,2,3,3,3),
                 DOSE = c(100, NA, NA, 200, NA, 300, NA, NA),
                 TIME = c(NA, 1, 2, NA, 3, NA, 1, 2),
                 Drug_concentration = c(NA, 5, 6.5, 3, 4, 8, 10, 12))

ID  DOSE  TIME  Drug concentration
1   100   .     .
1   .     1     5
1   .     2     6.5
1   .     3     8
2   200   .     .
2   .     1     3
2   .     3     4
3   300   .     .
3   .     1     8
3   .     2     10
3   .     3     12

As you can see, the first row for each patient ID contains only the dose; the other variables (TIME, drug concentration) are filled in with a full stop. After that initial row, the dose column gets a full stop and the other variables are filled in.

My question: certain patients have had dose changes during their treatment. I want to add those dose changes to my dataset but do not know how to do this efficiently in R. Let's say patient 1 had a dose change from 100 mg to 50 mg at TIME = 2 and patient 3 had a dose change from 300 mg to 500 mg at TIME = 2. I would want my dataset to look like this:

ID  DOSE  TIME  Drug concentration
1   100   .     .
1   .     1     5
1   50    .     .
1   .     2     6.5
1   .     3     8
2   200   .     .
2   .     1     3
2   .     3     4
3   300   .     .
3   .     1     8
3   500   .     .
3   .     2     10
3   .     3     12

I've tried using dplyr, but sadly I'm not that good at R.

Answer 1

Score: 1

Here's one approach using tidyverse:

Assuming we have:

dose_changes <- data.frame(ID = c(1, 3),
                           DOSE = c(50, 500),
                           TIME = c(2, 2))

I'll set TIME = 0 on the rows where it is missing, since I want those rows to sort first for each ID. I also shift the dose_changes TIME values slightly earlier, so that a dose change at TIME X sorts before any measurement at TIME X (the drug change implicitly happened at some unspecified time before that measurement). Then I bind in the dose_changes data, arrange by ID and TIME, fill the missing DOSE values downward (the default fill direction), and finally remove the rows without Drug_concentration observations.

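library(tidyverse)

df %>%
  # Give the initial dose rows TIME = 0 so they sort first within each ID
  mutate(TIME = if_else(is.na(TIME), 0, TIME)) %>%
  # Shift dose changes slightly earlier so they sort before the measurement at the same TIME
  bind_rows(dose_changes %>% mutate(TIME = TIME - 0.1)) %>%
  arrange(ID, TIME) %>%
  group_by(ID) %>%
  fill(DOSE) %>%
  filter(!is.na(Drug_concentration)) %>%
  ungroup()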


# A tibble: 7 × 4
     ID  DOSE  TIME Drug_concentration
  <dbl> <dbl> <dbl>              <dbl>
1     1   100     1                5  
2     1    50     2                6.5
3     2   200     0                3  
4     2   200     3                4  
5     3   300     0                8  
6     3   300     1               10  
7     3   500     2               12 
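
If the dose rows need to stay as separate records, as in the layout the question asks for, the same sort-then-combine idea can probably be adapted. The following is only a sketch under assumptions: `sort_time` and `is_obs` are hypothetical helper columns introduced here for ordering (they are not part of the answer above), and `df` is the 8-row data frame exactly as posted in the question.

```r
library(dplyr)

# Data exactly as posted in the question (8 rows).
df <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3),
                 DOSE = c(100, NA, NA, 200, NA, 300, NA, NA),
                 TIME = c(NA, 1, 2, NA, 3, NA, 1, 2),
                 Drug_concentration = c(NA, 5, 6.5, 3, 4, 8, 10, 12))

dose_changes <- data.frame(ID = c(1, 3),
                           DOSE = c(50, 500),
                           TIME = c(2, 2))

result <- df %>%
  # Hypothetical helpers: initial dose rows sort at time 0,
  # and dose rows (is_obs = FALSE) sort before observations.
  mutate(sort_time = coalesce(TIME, 0),
         is_obs = !is.na(TIME)) %>%
  bind_rows(
    dose_changes %>%
      # Remember when the change happens, then blank TIME so the
      # new row looks like the other dose rows.
      mutate(sort_time = TIME, is_obs = FALSE, TIME = NA_real_)
  ) %>%
  arrange(ID, sort_time, is_obs) %>%
  select(-sort_time, -is_obs)

result
```

Sorting on a logical column puts FALSE before TRUE, which is what places each dose change ahead of the measurement taken at the same TIME without having to subtract 0.1.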

Answer 2

Score: 1

Your desired output format is not right for modelling in R, but I assume you have a good reason for requesting it. I've certainly worked with command-line analysis packages in the past that ask for data in weird ways.

Anyway, here.

library(tidyr)
library(dplyr)


df <- data.frame(
                  ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
                DOSE = c(100L, NA, NA, NA, 200L, NA, NA, 300L, NA, NA, NA),
                TIME = c(NA, 1L, 2L, 3L, NA, 1L, 3L, NA, 1L, 2L, 3L),
  Drug.concentration = c(NA, 5, 6.5, 8, NA, 3, 4, NA, 8, 10, 12)
      )


good_table <- 
    df %>%
    # 1. Fill NAs
    group_by(ID) %>% 
    fill(everything(), .direction = "down") %>% 
    ungroup() %>% 
    # 2. Add a timepoint zero for sorting
    mutate(TIME = replace_na(TIME, 0)) %>% 
    # 3. Replace NAs in Drug.concentration with -99 so that they sort to the top.
    mutate(Drug.concentration = replace_na(Drug.concentration, -99)) %>% 
    # 4. Add your rows
    add_row(ID   = c(1, 3), 
            DOSE = c(50, 500), 
            TIME = c(2, 2), 
            Drug.concentration = c(-99, -99)) %>% 
    # 5. Arrange the output so that it is in the order you want.
    arrange(ID, TIME, Drug.concentration)

good_table
#> # A tibble: 13 × 4
#>       ID  DOSE  TIME Drug.concentration
#>    <dbl> <dbl> <dbl>              <dbl>
#>  1     1   100     0              -99  
#>  2     1   100     1                5  
#>  3     1    50     2              -99  
#>  4     1   100     2                6.5
#>  5     1   100     3                8  
#>  6     2   200     0              -99  
#>  7     2   200     1                3  
#>  8     2   200     3                4  
#>  9     3   300     0              -99  
#> 10     3   300     1                8  
#> 11     3   500     2              -99  
#> 12     3   300     2               10  
#> 13     3   300     3               12

output_table <-
    good_table %>% 
    # 6. Everything has to be character type to hold your dots
    mutate(across(everything(), as.character)) %>% 
    # 7. Keep only the first occurrence of each DOSE value; repeats become "."
    mutate(DOSE = if_else(duplicated(DOSE), ".", DOSE)) %>% 
    # 8. On observation rows (DOSE is "."), keep TIME and Drug.concentration;
    #    on dose rows, replace them with "."
    mutate(across(c(TIME, Drug.concentration), 
                  function(orig_value) {
                      if_else(DOSE == ".", orig_value, ".")
                  }))


output_table
#> # A tibble: 13 × 4
#>    ID    DOSE  TIME  Drug.concentration
#>    <chr> <chr> <chr> <chr>             
#>  1 1     100   .     .                 
#>  2 1     .     1     5                 
#>  3 1     50    .     .                 
#>  4 1     .     2     6.5               
#>  5 1     .     3     8                 
#>  6 2     200   .     .                 
#>  7 2     .     1     3                 
#>  8 2     .     3     4                 
#>  9 3     300   .     .                 
#> 10 3     .     1     8                 
#> 11 3     500   .     .                 
#> 12 3     .     2     10                
#> 13 3     .     3     12

Created on 2023-04-01 with reprex v2.0.2
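
One thing that may be worth watching in step 7: `duplicated(DOSE)` runs after `ungroup()`, so it looks at the whole column and works here only because no two patients share a dose value. A possible alternative, sketched below purely as an assumption rather than as part of the answer, is to flag dose rows by the -99 sentinel instead. `is_dose_row` is a hypothetical helper column, and the small `good_table` is a made-up stand-in in which two patients both start on 100.

```r
library(dplyr)

# Made-up stand-in for good_table: both patients have the same dose.
good_table <- data.frame(ID = c(1, 1, 2, 2),
                         DOSE = c(100, 100, 100, 100),
                         TIME = c(0, 1, 0, 1),
                         Drug.concentration = c(-99, 5, -99, 3))

output_table <- good_table %>%
  # Hypothetical helper: a row is a dose row if it carries the -99 sentinel.
  mutate(is_dose_row = Drug.concentration == -99) %>%
  # Convert the data columns to character so they can hold ".".
  mutate(across(c(ID, DOSE, TIME, Drug.concentration), as.character)) %>%
  # Dose rows keep DOSE and blank the rest; observation rows do the opposite.
  mutate(DOSE = if_else(is_dose_row, DOSE, "."),
         across(c(TIME, Drug.concentration),
                ~ if_else(is_dose_row, ".", .x))) %>%
  select(-is_dose_row)

output_table
```

With this stand-in, patient 2's initial dose row keeps its "100" even though patient 1 already used that value.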

PS. The first set of code you gave to generate df is wrong, it doesn't generate the table you pasted underneath it. Please double-check that for next time. Your code also generated NAs instead of dots, unlike the table underneath it. I've kept the NAs for this example, but if your data in fact has dots, then you'll need to insert this between lines 13 and 14 to create the NAs and get the data into their proper types.

    mutate(across(c(ID, DOSE, TIME), as.integer)) %>% 
    mutate(across(Drug.concentration, as.numeric)) %>% 
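
For illustration, here is a small sketch of that conversion on its own, assuming the data arrives with every column stored as character and "." marking a missing value. The `raw` data frame is made up for the example, and the extra `na_if()` step is an addition that turns the dots into NA before the type conversion so that `as.integer()` and `as.numeric()` do not warn about coercion.

```r
library(dplyr)

# Made-up input: all columns are character, "." means missing.
raw <- data.frame(ID = c("1", "1", "1"),
                  DOSE = c("100", ".", "."),
                  TIME = c(".", "1", "2"),
                  Drug.concentration = c(".", "5", "6.5"))

clean <- raw %>%
  # Replace "." with NA first, then convert each column to its proper type.
  mutate(across(everything(), ~ na_if(.x, "."))) %>%
  mutate(across(c(ID, DOSE, TIME), as.integer)) %>%
  mutate(across(Drug.concentration, as.numeric))

str(clean)
```

If the data is read from a file, passing `na.strings = "."` to `read.csv()` should achieve much the same thing at import time.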

Answer 3

Score: 1

You can use which to get the row and use rbind to insert the line.

i <- which(df$ID == 1 & df$TIME == 2)
df <- rbind(df[1:(i-1),], setNames(data.frame(1, 50, NA, NA), names(df)), df[i:nrow(df),])
i <- which(df$ID == 3 & df$TIME == 2)
df <- rbind(df[1:(i-1),], setNames(data.frame(3, 500, NA, NA), names(df)), df[i:nrow(df),])
df
#   ID DOSE TIME Drug_concentration
#1   1  100   NA                 NA
#2   1   NA    1                5.0
#3   1   50   NA                 NA
#31  1   NA    2                6.5
#4   2  200   NA                3.0
#5   2   NA    3                4.0
#6   3  300   NA                8.0
#7   3   NA    1               10.0
#11  3  500   NA                 NA
#8   3   NA    2               12.0
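
With more than a couple of dose changes, the same which/rbind step could be wrapped in a function and applied in a loop. This is a rough sketch under assumptions: `insert_dose_row()` is a hypothetical helper name, `dose_changes` is an assumed lookup table of changes, and `df` is the 8-row data frame as posted in the question. It uses `seq_len(i - 1)` instead of `1:(i-1)` so that an insert at the very first row would not pick up a stray row.

```r
# Data exactly as posted in the question (8 rows).
df <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3),
                 DOSE = c(100, NA, NA, 200, NA, 300, NA, NA),
                 TIME = c(NA, 1, 2, NA, 3, NA, 1, 2),
                 Drug_concentration = c(NA, 5, 6.5, 3, 4, 8, 10, 12))

# Assumed table of dose changes (not part of the original answer).
dose_changes <- data.frame(ID = c(1, 3), DOSE = c(50, 500), TIME = c(2, 2))

# Hypothetical helper: insert a dose row just before the observation
# recorded for patient `id` at time `time`.
insert_dose_row <- function(df, id, dose, time) {
  i <- which(df$ID == id & df$TIME == time)[1]
  if (is.na(i)) stop("no observation found for this ID and TIME")
  new_row <- setNames(data.frame(id, dose, NA, NA), names(df))
  out <- rbind(df[seq_len(i - 1), ], new_row, df[i:nrow(df), ])
  rownames(out) <- NULL  # tidy up the row names after the insert
  out
}

out <- df
for (k in seq_len(nrow(dose_changes))) {
  out <- insert_dose_row(out, dose_changes$ID[k],
                         dose_changes$DOSE[k], dose_changes$TIME[k])
}
out
```

Each pass recomputes the position on the already-updated data frame, so earlier inserts do not throw off later ones.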

huangapple
  • Posted on 2023-04-01 01:01:12
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75901009.html