Flatten a dataframe, pivot to get one row per patientid and sets of repeating columns per visit date

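Question

In RStudio, you want to reshape a merged data frame so that each patient gets one row and each visit contributes a repeating set of measurement columns, without knowing in advance the maximum number of visits a patient may have. You also want the measurement moment code to serve as the column-name suffix that groups these columns. You tried tidyr::pivot_wider() but did not get the desired result, and you want to know how to use pivot_wider_spec, or another method, to get it.

The following code reshapes the data frame into that layout:

library(dplyr)
library(tidyr)

# Merge the data frames on patient ID and visit date
merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))

# Number each patient's visits in date order; the zero-padded number
# becomes the column-name suffix
merged_df <- merged_df %>%
  arrange(PATID, VisitDate) %>%
  group_by(PATID) %>%
  mutate(row_num = sprintf("%02d", row_number())) %>%
  ungroup()

# Reshape with pivot_wider: one row per patient, one set of columns per visit
pivoted_df <- merged_df %>%
  pivot_wider(id_cols = PATID,
              names_from = row_num,
              values_from = c(Moment, VisitDate, Chol, Kreat, HbA1c),
              names_sep = "_")

# For display only: convert every column to character and show missing values as ""
pivoted_df <- pivoted_df %>%
  mutate(across(everything(), ~ replace_na(as.character(.x), "")))

# View the final result
print(pivoted_df)

The code first merges the data frames, then numbers each patient's visits. pivot_wider() then reshapes the data so that the visit number becomes the column-name suffix, which means the number of column sets follows from the data and does not have to be known beforehand. Finally, missing values are replaced with empty strings for display.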
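Since the question mentions pivot_wider_spec, the same reshape can also be sketched with an explicit spec. This is a sketch under stated assumptions, not the only way to do it: the sample data are re-entered from the question so the block runs on its own, and spec and wide_spec_df are hypothetical names chosen for illustration. build_wider_spec() describes the output columns, and editing the spec's .name column is what allows the suffix to be zero-padded.

```r
library(dplyr)
library(tidyr)

# Sample data re-entered from the question so the block is self-contained
df_visits <- read.csv(text = "PATID,Moment,VisitDate
1001,1,2020-09-04
1001,2,2021-11-04
2002,1,2020-03-14
3003,1,2020-09-24
3003,2,2020-09-27
3003,3,2020-10-11", stringsAsFactors = FALSE)

df_measure <- read.csv(text = "PATID,VisitDate,Chol,Kreat,HbA1c
1001,2021-11-04,3.70,82,6.7
1001,2020-09-04,4.60,63,7.9
2002,2020-03-14,2.60,53,6.0
3003,2020-10-11,3.90,94,7.5
3003,2020-09-27,4.90,137,6.7
3003,2020-09-24,4.90,127,7.5", stringsAsFactors = FALSE)

merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))

# Describe the wide layout: one output column per (value column, Moment) pair,
# then rewrite the generated names so the suffix is zero-padded (hypothetical choice)
spec <- merged_df %>%
  build_wider_spec(
    names_from  = Moment,
    values_from = c(Moment, VisitDate, Chol, Kreat, HbA1c),
    names_sort  = TRUE
  ) %>%
  mutate(.name = sprintf("%s_%02d", .value, Moment))

# Apply the spec; PATID is the only identifying column
wide_spec_df <- pivot_wider_spec(merged_df, spec, id_cols = PATID)

print(wide_spec_df)
```

Because the spec is built from the data, patients with fewer visits simply get NA in the columns for the visits they do not have.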

Original question:

In RStudio, is it possible to pivot a merged dataframe in such a way that I get a set of repeating columns per measurement?

I have two dataframes: one with the patients and visit moment codes, and one with the patients, visit dates and measurements. The problem is that I don't know beforehand the maximum number of visits per patient, so I don't know how many columns will be needed. One patient could have 3 visits and another just 1.

Also, the dataframe with the visits has a measurement code column (1, 2, 3, etc.), so ideally the sets of columns should be grouped using that code as the column-name suffix, for example Chol_1, Chol_2, Chol_3 or something like that.

I have the example code below:

library(dplyr)
library(tidyr)

df_visits <- read.table(text = "
PATID,Moment,VisitDate
1001,1,2020-09-04
1001,2,2021-11-04
2002,1,2020-03-14
3003,1,2020-09-24
3003,2,2020-09-27
3003,3,2020-10-11
", header = TRUE, sep = ",")

df_measure <- read.table(text = "
PATID,VisitDate,Chol,Kreat,HbA1c
1001,2021-11-04,3.70,82,6.7
1001,2020-09-04,4.60,63,7.9
2002,2020-03-14,2.60,53,6.0
3003,2020-10-11,3.90,94,7.5
3003,2020-09-27,4.90,137,6.7
3003,2020-09-24,4.90,127,7.5
", header = TRUE, sep = ",")

# Merge the dataframes based on patient_id and visit_date
merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))

Then I tried using tidyr::pivot_wider(); see the code below.

pivoted_df <- merged_df %>%
  pivot_wider(names_from = c("Chol", "Kreat", "HbA1c"),
              values_from = c("Chol", "Kreat", "HbA1c"),
              names_sep = "_")

But that doesn't work: it doesn't "flatten" to one row per PATID, and it seems to generate column names based on the actual measurement values. I think maybe pivot_wider_spec can be used, but I can't figure out the correct parameters to get the desired result, or whether it is even possible. The desired output dataframe is something like this:

| PATID | Moment_01 | Moment_02 | Moment_03 | VisitDate_01  | VisitDate_02 | VisitDate_03 | Chol_01 | Kreat_01 | HbA1c_01 | Chol_02 | Kreat_02 | HbA1c_02 | Chol_03 | Kreat_03 | HbA1c_03 |
|  1001 |         1 |         2 |           | 2020-09-04    | 2021-11-04   |              |    4.60 |       63 |      7.9 |    3.70 |       82 |      6.7 |         |          |          |
|  2002 |         1 |           |           | 2020-03-14    |              |              |    2.60 |       53 |      6.0 |         |          |          |         |          |          |
|  3003 |         1 |         2 |         3 | 2020-09-24    | 2020-09-27   | 2020-10-11   |    4.90 |      127 |      7.5 |    4.90 |      137 |      6.7 |    3.90 |       94 |      7.5 |

Can anyone point me in the right direction?

Answer 1

Score: 2

Are you looking for:

library(tidyverse)
merged_df %>%
  pivot_wider(names_from = 'Moment',
              values_from = c('Moment', 'VisitDate', 'Chol', 'Kreat', 'HbA1c'))

# A tibble: 3 x 16
  PATID Moment_1 Moment_2 Moment_3 VisitDate_1 VisitDate_2 VisitDate_3 Chol_1
  <int>    <int>    <int>    <int> <chr>       <chr>       <chr>        <dbl>
1  1001        1        2       NA 2020-09-04  2021-11-04  <NA>           4.6
2  2002        1       NA       NA 2020-03-14  <NA>        <NA>           2.6
3  3003        1        2        3 2020-09-24  2020-09-27  2020-10-11     4.9
# ... with 8 more variables: Chol_2 <dbl>, Chol_3 <dbl>, Kreat_1 <int>,
#   Kreat_2 <int>, Kreat_3 <int>, HbA1c_1 <dbl>, HbA1c_2 <dbl>, HbA1c_3 <dbl>
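
The desired table in the question uses zero-padded suffixes (_01, _02, _03) and lists the measurement columns visit by visit. A minimal sketch of one way to get closer to that layout, assuming tidyr 1.2.0 or later (needed for names_vary). The helper column visit and the name wide_df are illustrative and not part of the answer above, and the sample data are re-entered from the question so the block runs on its own:

```r
library(dplyr)
library(tidyr)

# Sample data re-entered from the question
df_visits <- data.frame(
  PATID     = c(1001L, 1001L, 2002L, 3003L, 3003L, 3003L),
  Moment    = c(1L, 2L, 1L, 1L, 2L, 3L),
  VisitDate = c("2020-09-04", "2021-11-04", "2020-03-14",
                "2020-09-24", "2020-09-27", "2020-10-11")
)
df_measure <- data.frame(
  PATID     = c(1001L, 1001L, 2002L, 3003L, 3003L, 3003L),
  VisitDate = c("2021-11-04", "2020-09-04", "2020-03-14",
                "2020-10-11", "2020-09-27", "2020-09-24"),
  Chol      = c(3.7, 4.6, 2.6, 3.9, 4.9, 4.9),
  Kreat     = c(82, 63, 53, 94, 137, 127),
  HbA1c     = c(6.7, 7.9, 6.0, 7.5, 6.7, 7.5)
)

merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))

wide_df <- merged_df %>%
  # Zero-padded copy of Moment, used only to build the column suffix
  mutate(visit = sprintf("%02d", Moment)) %>%
  pivot_wider(
    id_cols     = PATID,
    names_from  = visit,
    values_from = c(Moment, VisitDate, Chol, Kreat, HbA1c),
    names_sort  = TRUE,
    names_vary  = "slowest"   # keep each visit's columns next to each other
  ) %>%
  # Move the Moment_* and VisitDate_* columns to the front, as in the desired table
  relocate(starts_with("Moment"), starts_with("VisitDate"), .after = PATID)

print(wide_df)
```

Missing visits stay as NA here. Converting them to empty strings would turn the numeric columns into character, so that step is probably best left to the point where the table is displayed or exported.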

huangapple
  • Published on May 18, 2023, 00:02:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76274061.html