Flatten a dataframe, pivot to get one row per patientid and sets of repeating columns per visit date
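Question
In RStudio, you want to reshape a merged data frame so that each measurement gets a set of repeating columns, but you do not know in advance the maximum number of visits a patient may have. You also want to group these columns by using the measurement code as the column-name suffix. You tried tidyr::pivot_wider() but did not get the expected result, and you want to know how to use pivot_wider_spec, or another method, to get the desired output.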
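The following code example reshapes the data frame into the desired output:
library(dplyr)
library(tidyr)
# Merge the data frames
merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))
# Number each patient's visits in date order; the number becomes the column suffix
merged_df <- merged_df %>%
  group_by(PATID) %>%
  arrange(VisitDate, .by_group = TRUE) %>%
  mutate(row_num = row_number()) %>%
  ungroup()
# Reshape with pivot_wider: one row per PATID, one set of columns per visit
pivoted_df <- merged_df %>%
  pivot_wider(id_cols = PATID,
              names_from = row_num,
              values_from = c(Moment, VisitDate, Chol, Kreat, HbA1c),
              names_sep = "_") %>%
  select(PATID, starts_with("Moment"), starts_with("VisitDate"), starts_with("Chol"), starts_with("Kreat"), starts_with("HbA1c"))
# Replace NA with "" for display (this converts every column except PATID to character)
pivoted_df <- pivoted_df %>%
  mutate(across(-PATID, ~ replace_na(as.character(.x), "")))
# View the final result
print(pivoted_df)
This code first merges the data frames, then numbers each patient's visit rows. It then reshapes the data with pivot_wider, and finally orders the columns and blanks out missing values to approximate the desired output format.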
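The question also asks about pivot_wider_spec. As a rough sketch (not taken from the original answers), a spec built with build_wider_spec() can be edited before pivoting, which gives control over both the column names and their order. The zero-padded suffix, the vector value_cols and the ordering rule below are illustrative choices; the data is recreated from the question so the block runs on its own, and spec-driven column order assumes a reasonably recent tidyr.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df_visits <- read.csv(text = "PATID,Moment,VisitDate
1001,1,2020-09-04
1001,2,2021-11-04
2002,1,2020-03-14
3003,1,2020-09-24
3003,2,2020-09-27
3003,3,2020-10-11", stringsAsFactors = FALSE)

df_measure <- read.csv(text = "PATID,VisitDate,Chol,Kreat,HbA1c
1001,2021-11-04,3.70,82,6.7
1001,2020-09-04,4.60,63,7.9
2002,2020-03-14,2.60,53,6.0
3003,2020-10-11,3.90,94,7.5
3003,2020-09-27,4.90,137,6.7
3003,2020-09-24,4.90,127,7.5", stringsAsFactors = FALSE)

merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))

# Columns to spread; the order here also drives the output order (illustrative)
value_cols <- c("Moment", "VisitDate", "Chol", "Kreat", "HbA1c")

# Build the default spec: one row per output column
spec <- build_wider_spec(merged_df,
                         names_from = Moment,
                         values_from = all_of(value_cols))

# Edit the spec: zero-padded names, Moment and VisitDate blocks first,
# then the measurements grouped visit by visit
spec <- spec %>%
  mutate(.name = sprintf("%s_%02d", .value, Moment)) %>%
  arrange(pmin(match(.value, value_cols), 3L),
          Moment,
          match(.value, value_cols))

wide_spec <- pivot_wider_spec(merged_df, spec)
print(names(wide_spec))
```

Because the spec is an ordinary data frame, it can be inspected with print(spec) before pivoting to confirm which input column feeds each output column.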
In RStudio, is it possible to pivot a merged dataframe in such a way that I get a set of repeating columns per measurement?
I have two dataframes, one with the patients and visit moment codes, and one with the patients, visit dates and measurements. The problem is that I don't know beforehand the maximum number of visits per patient, so I don't know how many columns will be needed. One patient could have 3 visits while another has just 1.
Also, the dataframe with the visits does have a measurement code column (1, 2, 3 etc.), so ideally the sets of columns should be grouped using that code as the column name postfix, in the example something like Chol_1, Chol_2, Chol_3.
So I have the example code below:
library(dplyr)
library(tidyr)
df_visits <- read.table(text = "
PATID,Moment,VisitDate
1001,1,2020-09-04
1001,2,2021-11-04
2002,1,2020-03-14
3003,1,2020-09-24
3003,2,2020-09-27
3003,3,2020-10-11
", header = TRUE, sep = ",") #
df_measure <- read.table(text = "
PATID,VisitDate,Chol,Kreat,HbA1c
1001,2021-11-04,3.70,82,6.7
1001,2020-09-04,4.60,63,7.9
2002,2020-03-14,2.60,53,6.0
3003,2020-10-11,3.90,94,7.5
3003,2020-09-27,4.90,137,6.7
3003,2020-09-24,4.90,127,7.5
", header = TRUE, sep = ",") #
# Merge the dataframes based on patient_id and visit_date
merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))
And then I tried using tidyr::pivot_wider(), see code below.
pivoted_df <- merged_df %>%
  pivot_wider(names_from = c("Chol", "Kreat", "HbA1c"),
              values_from = c("Chol", "Kreat", "HbA1c"),
              names_sep = "_")
But that doesn't work: it doesn't "flatten" to PATID and VisitDate, and it seems to generate column names based on the actual values. I think maybe pivot_wider_spec can be used, but I can't figure out the correct parameters to get the desired result, or whether it is even possible. The desired output dataframe is something like this:
| PATID | Moment_01 | Moment_02 | Moment_03 | VisitDate_01 | VisitDate_02 | VisitDate_03 | Chol_01 | Kreat_01 | HbA1c_01 | Chol_02 | Kreat_02 | HbA1c_02 | Chol_03 | Kreat_03 | HbA1c_03 |
| 1001 | 1 | 2 | | 2020-09-04 | 2021-11-04 | | 4.60 | 63 | 7.9 | 3.70 | 82 | 6.7 | | | |
| 2002 | 1 | | | 2020-03-14 | | | 2.60 | 53 | 6.0 | | | | | | |
| 3003 | 1 | 2 | 3 | 2020-09-24 | 2020-09-27 | 2020-10-11 | 4.90 | 127 | 7.5 | 4.90 | 137 | 6.7 | 3.90 | 94 | 7.5 |
Can anyone point me in the right direction?
Answer 1
Score: 2
Are you looking for:
library(tidyverse)
merged_df %>%
  pivot_wider(names_from = 'Moment',
              values_from = c('Moment', 'VisitDate', 'Chol', 'Kreat', 'HbA1c'))
# A tibble: 3 x 16
  PATID Moment_1 Moment_2 Moment_3 VisitDate_1 VisitDate_2 VisitDate_3 Chol_1
  <int>    <int>    <int>    <int> <chr>       <chr>       <chr>        <dbl>
1  1001        1        2       NA 2020-09-04  2021-11-04  <NA>           4.6
2  2002        1       NA       NA 2020-03-14  <NA>        <NA>           2.6
3  3003        1        2        3 2020-09-24  2020-09-27  2020-10-11     4.9
# ... with 8 more variables: Chol_2 <dbl>, Chol_3 <dbl>, Kreat_1 <int>,
#   Kreat_2 <int>, Kreat_3 <int>, HbA1c_1 <dbl>, HbA1c_2 <dbl>, HbA1c_3 <dbl>
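Building on this answer, here is a minimal sketch of how the result could be brought closer to the layout in the question, with zero-padded suffixes and the measurement columns grouped per visit. The helper column suffix and the vector meas_cols are names introduced here for illustration only, and the data is recreated from the question so the block is self-contained.

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
df_visits <- read.csv(text = "PATID,Moment,VisitDate
1001,1,2020-09-04
1001,2,2021-11-04
2002,1,2020-03-14
3003,1,2020-09-24
3003,2,2020-09-27
3003,3,2020-10-11", stringsAsFactors = FALSE)

df_measure <- read.csv(text = "PATID,VisitDate,Chol,Kreat,HbA1c
1001,2021-11-04,3.70,82,6.7
1001,2020-09-04,4.60,63,7.9
2002,2020-03-14,2.60,53,6.0
3003,2020-10-11,3.90,94,7.5
3003,2020-09-27,4.90,137,6.7
3003,2020-09-24,4.90,127,7.5", stringsAsFactors = FALSE)

merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))

# Zero-padded visit code, e.g. 1 -> "01" (hypothetical helper column)
long <- merged_df %>%
  mutate(suffix = sprintf("%02d", Moment))

# Measurement columns ordered visit by visit:
# Chol_01, Kreat_01, HbA1c_01, Chol_02, ...
suffixes <- sort(unique(long$suffix))
meas_cols <- as.vector(outer(c("Chol", "Kreat", "HbA1c"), suffixes,
                             paste, sep = "_"))

wide <- long %>%
  pivot_wider(names_from = suffix,
              values_from = c(Moment, VisitDate, Chol, Kreat, HbA1c)) %>%
  select(PATID, starts_with("Moment_"), starts_with("VisitDate_"),
         all_of(meas_cols))

print(wide)
```

Nothing here depends on knowing the maximum number of visits in advance: the set of suffixes is derived from the data, so a patient with a fourth visit would simply add a fourth group of columns.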