Adding rows (changes in dosing) to my data frame in R
Question
I'm working with clinical data that I want to arrange in a format I can use for modelling later.
My current dataset looks like this:
# create a data frame
df <- data.frame(ID = c(1,1,1,2,2,3,3,3),
DOSE = c(100, NA, NA, 200, NA, 300, NA, NA),
TIME = c(NA, 1, 2, NA, 3, NA, 1, 2),
Drug_concentration = c(NA, 5, 6.5, 3, 4, 8, 10, 12))
ID | DOSE | TIME | Drug concentration |
---|---|---|---|
1 | 100 | . | . |
1 | . | 1 | 5 |
1 | . | 2 | 6.5 |
1 | . | 3 | 8 |
2 | 200 | . | . |
2 | . | 1 | 3 |
2 | . | 3 | 4 |
3 | 300 | . | . |
3 | . | 1 | 8 |
3 | . | 2 | 10 |
3 | . | 3 | 12 |
As you can see, the first row for each patient ID contains only the dose; the other variables (TIME, drug concentration) are filled in with a full stop. After that initial row, the dose column gets a full stop and the other variables are filled in.
My question: certain patients have had dose changes during their treatment. I want to add those dose changes to my dataset, but I don't know how to do this efficiently in R. Let's say patient 1 had a dose change from 100 mg to 50 mg at TIME = 2, and patient 3 had a dose change from 300 mg to 500 mg at TIME = 2. I would want my dataset to look like this:
ID | DOSE | TIME | Drug concentration |
---|---|---|---|
1 | 100 | . | . |
1 | . | 1 | 5 |
1 | 50 | . | . |
1 | . | 2 | 6.5 |
1 | . | 3 | 8 |
2 | 200 | . | . |
2 | . | 1 | 3 |
2 | . | 3 | 4 |
3 | 300 | . | . |
3 | . | 1 | 8 |
3 | 500 | . | . |
3 | . | 2 | 10 |
3 | . | 3 | 12 |
I've tried using dplyr, but sadly I'm not that good at R.
Answer 1
Score: 1
Here's one approach using tidyverse:
Assuming we have:
dose_changes <- data.frame(ID = c(1, 3),
DOSE = c(50, 500),
TIME = c(2, 2))
I'll add a TIME = 0 for the rows where TIME is missing, since I want to make sure these sort first for each ID. I also shift the dose_changes TIME values slightly (by -0.1) so that a change at TIME X sorts before any measurement at TIME X (since the drug change implicitly happened at some unspecified time prior). Then I combine in the dose_changes data, arrange by ID and TIME, fill down (the default fill direction) the missing DOSEs, and finally remove the rows without Drug_concentration observations.
library(tidyverse)
df %>%
mutate(TIME = if_else(is.na(TIME), 0, TIME)) %>%
bind_rows(dose_changes %>% mutate(TIME = TIME - 0.1)) %>%
arrange(ID, TIME) %>%
group_by(ID) %>%
fill(DOSE) %>%
filter(!is.na(Drug_concentration)) %>%
ungroup()
# A tibble: 7 × 4
ID DOSE TIME Drug_concentration
<dbl> <dbl> <dbl> <dbl>
1 1 100 1 5
2 1 50 2 6.5
3 2 200 0 3
4 2 200 3 4
5 3 300 0 8
6 3 300 1 10
7 3 500 2 12
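Note that the pipeline above carries each dose forward onto the measurement rows and then drops the dose-only rows. If you would rather keep each dose change as its own row, as in the layout shown in the question, a variation along these lines might work. This is only a sketch: `sort_time` is a helper column introduced here purely for ordering, and `df` is the 8-row data frame from the question's code, not the 11-row table.

```r
library(dplyr)

# Data frame exactly as created by the code in the question
df <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3),
                 DOSE = c(100, NA, NA, 200, NA, 300, NA, NA),
                 TIME = c(NA, 1, 2, NA, 3, NA, 1, 2),
                 Drug_concentration = c(NA, 5, 6.5, 3, 4, 8, 10, 12))

dose_changes <- data.frame(ID = c(1, 3),
                           DOSE = c(50, 500),
                           TIME = c(2, 2))

with_changes <- df %>%
  # sort_time (hypothetical helper): initial dose rows sort first within each ID
  mutate(sort_time = if_else(is.na(TIME), 0, TIME)) %>%
  # a dose change sorts just before the measurement at the same TIME;
  # its own TIME is then blanked, matching the dose-only rows
  bind_rows(dose_changes %>%
              mutate(sort_time = TIME - 0.1, TIME = NA_real_)) %>%
  arrange(ID, sort_time) %>%
  select(-sort_time)

with_changes
```

The 0.1 offset assumes measurement times are never closer together than that; a smaller offset would be needed otherwise.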
Answer 2
Score: 1
Your desired output format is not right for modelling in R, but I assume you have a good reason for requesting it. I've certainly worked with command-line analysis packages in the past that ask for data in weird ways.
Anyway, here.
library(tidyr)
library(dplyr)
df <- data.frame(
ID = c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L, 3L),
DOSE = c(100L, NA, NA, NA, 200L, NA, NA, 300L, NA, NA, NA),
TIME = c(NA, 1L, 2L, 3L, NA, 1L, 3L, NA, 1L, 2L, 3L),
Drug.concentration = c(NA, 5, 6.5, 8, NA, 3, 4, NA, 8, 10, 12)
)
good_table <-
df %>%
# 1. Fill NAs
group_by(ID) %>%
fill(everything(), .direction = "down") %>%
ungroup() %>%
# 2. Add a timepoint zero for sorting
mutate(TIME = replace_na(TIME, 0)) %>%
# 3. Replace NAs in Drug.concentration with -99 so that they sort to the top.
mutate(Drug.concentration = replace_na(Drug.concentration, -99)) %>%
# 4. Add your rows
add_row(ID = c(1, 3),
DOSE = c(50, 500),
TIME = c(2, 2),
Drug.concentration = c(-99, -99)) %>%
# 5. Arrange the output so that it is in the order you want.
arrange(ID, TIME, Drug.concentration)
good_table
#> # A tibble: 13 × 4
#> ID DOSE TIME Drug.concentration
#> <dbl> <dbl> <dbl> <dbl>
#> 1 1 100 0 -99
#> 2 1 100 1 5
#> 3 1 50 2 -99
#> 4 1 100 2 6.5
#> 5 1 100 3 8
#> 6 2 200 0 -99
#> 7 2 200 1 3
#> 8 2 200 3 4
#> 9 3 300 0 -99
#> 10 3 300 1 8
#> 11 3 500 2 -99
#> 12 3 300 2 10
#> 13 3 300 3 12
output_table <-
good_table %>%
# 6. Everything has to be Character type for your dots
mutate(across(everything(), as.character)) %>%
# 7. Only the first original values of DOSE are kept.
mutate(DOSE = if_else(duplicated(DOSE), ".", DOSE)) %>%
# 8. If DOSE is . , then keep its values
mutate(across(c(TIME, Drug.concentration),
function(orig_value) {
if_else(DOSE == ".", orig_value, ".")
}))
output_table
#> # A tibble: 13 × 4
#> ID DOSE TIME Drug.concentration
#> <chr> <chr> <chr> <chr>
#> 1 1 100 . .
#> 2 1 . 1 5
#> 3 1 50 . .
#> 4 1 . 2 6.5
#> 5 1 . 3 8
#> 6 2 200 . .
#> 7 2 . 1 3
#> 8 2 . 3 4
#> 9 3 300 . .
#> 10 3 . 1 8
#> 11 3 500 . .
#> 12 3 . 2 10
#> 13 3 . 3 12
<sup>Created on 2023-04-01 with reprex v2.0.2</sup>
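One caveat, as far as I can tell: `duplicated(DOSE)` in step 7 looks at the whole column, so it works here only because every dose value is unique across patients. If two patients start on the same dose, or a patient returns to an earlier dose, the later dose row would be replaced by a dot. A possible workaround is to flag dose rows by the -99 placeholder instead. The sketch below uses a small made-up `good_table` and a hypothetical helper column `is_dose_row`.

```r
library(dplyr)

# Toy stand-in for good_table in which both patients start on 100
good_table <- data.frame(
  ID = c(1, 1, 2, 2),
  DOSE = c(100, 100, 100, 100),
  TIME = c(0, 1, 0, 1),
  Drug.concentration = c(-99, 5, -99, 3)
)

output_table <- good_table %>%
  # flag dose rows before converting anything to character
  mutate(is_dose_row = Drug.concentration == -99) %>%
  mutate(across(c(ID, DOSE, TIME, Drug.concentration), as.character)) %>%
  mutate(
    DOSE = if_else(is_dose_row, DOSE, "."),
    TIME = if_else(is_dose_row, ".", TIME),
    Drug.concentration = if_else(is_dose_row, ".", Drug.concentration)
  ) %>%
  select(-is_dose_row)

output_table
```

This relies on -99 never being a real concentration value, which is the same assumption the sorting trick in step 3 already makes.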
PS. The first set of code you gave to generate df is wrong: it doesn't generate the table you pasted underneath it. Please double-check that next time. Your code also generated NAs instead of dots, unlike the table underneath it. I've kept the NAs for this example, but if your data in fact has dots, then you'll need to insert this between lines 13 and 14 to create the NAs and get the data into their proper types.
mutate(across(c(ID, DOSE, TIME), as.integer)) %>%
mutate(across(Drug.concentration, as.numeric)) %>%
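If the dots come from a file, an alternative to converting types afterwards may be to let R map them to NA on import and write them back as dots on export. A minimal sketch, assuming CSV input; the sample text and the temporary file are made up for illustration.

```r
# Made-up sample in the dotted layout from the question
raw_text <- "ID,DOSE,TIME,Drug.concentration
1,100,.,.
1,.,1,5
1,.,2,6.5"

# na.strings = "." reads every lone dot as NA, so the columns stay numeric
df_dots <- read.csv(text = raw_text, na.strings = ".")
str(df_dots)

# ... insert the dose-change rows here while the data are still numeric ...

# na = "." writes each NA back out as a dot
out_file <- tempfile(fileext = ".csv")
write.csv(df_dots, out_file, na = ".", row.names = FALSE, quote = FALSE)
readLines(out_file)
```

Keeping the columns numeric until export avoids the character conversion in step 6 altogether.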
Answer 3
Score: 1
You can use which to get the row and use rbind to insert the line.
i <- which(df$ID == 1 & df$TIME == 2)
df <- rbind(df[1:(i-1),], setNames(data.frame(1, 50, NA, NA), names(df)), df[i:nrow(df),])
i <- which(df$ID == 3 & df$TIME == 2)
df <- rbind(df[1:(i-1),], setNames(data.frame(3, 500, NA, NA), names(df)), df[i:nrow(df),])
df
# ID DOSE TIME Drug_concentration
#1 1 100 NA NA
#2 1 NA 1 5.0
#3 1 50 NA NA
#31 1 NA 2 6.5
#4 2 200 NA 3.0
#5 2 NA 3 4.0
#6 3 300 NA 8.0
#7 3 NA 1 10.0
#11 3 500 NA NA
#8 3 NA 2 12.0
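With more than a couple of dose changes, the same which/rbind idea could be wrapped in a small function and applied in a loop. The sketch below is one way to do that: `insert_dose_change` is a hypothetical helper, and `dose_changes` is the lookup table from Answer 1. It uses `seq_len(i - 1)` rather than `1:(i-1)` so that an insert before the very first row does not index backwards.

```r
# Data frame exactly as created by the code in the question
df <- data.frame(ID = c(1, 1, 1, 2, 2, 3, 3, 3),
                 DOSE = c(100, NA, NA, 200, NA, 300, NA, NA),
                 TIME = c(NA, 1, 2, NA, 3, NA, 1, 2),
                 Drug_concentration = c(NA, 5, 6.5, 3, 4, 8, 10, 12))

dose_changes <- data.frame(ID = c(1, 3),
                           DOSE = c(50, 500),
                           TIME = c(2, 2))

# Hypothetical helper: insert a dose-only row just before the first row
# matching the given ID and TIME (assumes the four columns used above)
insert_dose_change <- function(df, id, time, dose) {
  i <- which(df$ID == id & df$TIME == time)[1]
  if (is.na(i)) stop("no row found for this ID and TIME")
  new_row <- setNames(data.frame(id, dose, NA, NA), names(df))
  rbind(df[seq_len(i - 1), ], new_row, df[i:nrow(df), ])
}

for (k in seq_len(nrow(dose_changes))) {
  df <- insert_dose_change(df,
                           dose_changes$ID[k],
                           dose_changes$TIME[k],
                           dose_changes$DOSE[k])
}
rownames(df) <- NULL  # tidy up row names such as "31" left behind by rbind
df
```

Growing a data frame with rbind inside a loop copies it on every pass, so for large datasets the bind-and-sort approach from the other answers is likely to scale better.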