Flatten a dataframe, pivot to get one row per patientid and sets of repeating columns per visit date

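Question

In RStudio, you want to reshape a merged data frame so that you get a set of repeating columns per measurement, without knowing in advance the maximum number of visits a patient may have. You also want to use the measurement code as the column-name suffix that groups these columns. You tried tidyr::pivot_wider() but did not get the expected result, and you want to know how to use pivot_wider_spec or some other method to get the desired output.

The following code is one way to reshape the data frame into roughly the desired output:

    library(dplyr)
    library(tidyr)

    # Merge the data frames
    merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))

    # Number each patient's visits in date order; the number becomes the column suffix
    merged_df <- merged_df %>%
      group_by(PATID) %>%
      arrange(VisitDate, .by_group = TRUE) %>%
      mutate(row_num = row_number()) %>%
      ungroup()

    # Reshape with pivot_wider: one row per patient, one set of columns per visit.
    # Moment is a value column here, so it must not also appear in names_from.
    pivoted_df <- merged_df %>%
      pivot_wider(id_cols = PATID,
                  names_from = row_num,
                  values_from = c(Moment, VisitDate, Chol, Kreat, HbA1c),
                  names_sep = "_")

    # Optional, for display only: show missing values as ""
    # (this converts every column to character)
    display_df <- pivoted_df %>%
      mutate(across(everything(), ~ replace_na(as.character(.x), "")))

    # Inspect the final result
    print(pivoted_df)

This code first merges the data frames and then numbers each patient's visits. pivot_wider() then turns those numbers into column-name suffixes, which gives one row per patient with columns such as Moment_1, VisitDate_1 and Chol_1. No further renaming is needed. The NA replacement is kept as a separate, optional step because writing "" into numeric columns changes their type.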
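To see how pivot_wider() builds the column names, here is a minimal sketch on a made-up table. The `toy` data frame and its values are hypothetical and only loosely modelled on the question's data. Each output name is a `values_from` column name joined to a `names_from` value, so the number of columns follows from the data and the maximum number of visits does not have to be known beforehand.

```r
library(tidyr)

# Hypothetical toy data: patient 1 has two visits, patient 2 has one
toy <- data.frame(
  PATID  = c(1L, 1L, 2L),
  Moment = c(1L, 2L, 1L),
  Chol   = c(4.6, 3.7, 2.6),
  Kreat  = c(63L, 82L, 53L)
)

# names_from supplies the suffix, values_from supplies the cell contents
wide <- pivot_wider(
  toy,
  id_cols     = PATID,
  names_from  = Moment,
  values_from = c(Chol, Kreat),
  names_sep   = "_"
)

names(wide)
# "PATID" "Chol_1" "Chol_2" "Kreat_1" "Kreat_2"
```

Patient 2 has no second visit, so `Chol_2` and `Kreat_2` are simply NA in that row.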

Original (English):

In RStudio, is it possible to pivot a merged dataframe in such a way that I get a set of repeating columns per measurement?

I have two dataframes, one with the patients and visit moment codes, and one with the patient, visit dates and measurements. The problem is that I don't know beforehand the maximum number of visits per patient, so I don't know how many columns will be needed. One patient could have 3 visits and another just 1.

Also, the dataframe with the visits does have a measurement code column (1, 2, 3 etc.), so ideally the sets of columns should be grouped using that as the column name suffix, so in the example something like Chol_1, Chol_2, Chol_3.

So I have the example code below:

    library(dplyr)
    library(tidyr)

    df_visits <- read.table(text = "
    PATID,Moment,VisitDate
    1001,1,2020-09-04
    1001,2,2021-11-04
    2002,1,2020-03-14
    3003,1,2020-09-24
    3003,2,2020-09-27
    3003,3,2020-10-11
    ", header = TRUE, sep = ",")

    df_measure <- read.table(text = "
    PATID,VisitDate,Chol,Kreat,HbA1c
    1001,2021-11-04,3.70,82,6.7
    1001,2020-09-04,4.60,63,7.9
    2002,2020-03-14,2.60,53,6.0
    3003,2020-10-11,3.90,94,7.5
    3003,2020-09-27,4.90,137,6.7
    3003,2020-09-24,4.90,127,7.5
    ", header = TRUE, sep = ",")

    # Merge the dataframes based on patient_id and visit_date
    merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))

And then I tried using tidyr::pivot_wider(), see code below.

    pivoted_df <- merged_df %>%
      pivot_wider(names_from = c("Chol", "Kreat", "HbA1c"),
                  values_from = c("Chol", "Kreat", "HbA1c"),
                  names_sep = "_")

But that doesn't work: it doesn't "flatten" to PATID and VisitDate, and it seems to generate column names based on the actual values. I think maybe pivot_wider_spec can be used, but I can't figure out the correct parameters to get the desired result, or whether it is even possible. The desired output dataframe is something like this:

    | PATID | Moment_01 | Moment_02 | Moment_03 | VisitDate_01 | VisitDate_02 | VisitDate_03 | Chol_01 | Kreat_01 | HbA1c_01 | Chol_02 | Kreat_02 | HbA1c_02 | Chol_03 | Kreat_03 | HbA1c_03 |
    | 1001 | 1 | 2 | | 2020-09-04 | 2021-11-04 | | 4.60 | 63 | 7.9 | 3.70 | 82 | 6.7 | | | |
    | 2002 | 1 | | | 2020-03-14 | | | 2.60 | 53 | 6.0 | | | | | | |
    | 3003 | 1 | 2 | 3 | 2020-09-24 | 2020-09-27 | 2020-10-11 | 4.90 | 127 | 7.5 | 4.90 | 137 | 6.7 | 3.90 | 94 | 7.5 |

Can anyone point me in the right direction?

Answer 1

Score: 2


Are you looking for:

    library(tidyverse)

    merged_df %>%
      pivot_wider(names_from = 'Moment',
                  values_from = c('Moment', 'VisitDate', 'Chol', 'Kreat', 'HbA1c'))

    # A tibble: 3 x 16
      PATID Moment_1 Moment_2 Moment_3 VisitDate_1 VisitDate_2 VisitDate_3 Chol_1
      <int>    <int>    <int>    <int> <chr>       <chr>       <chr>        <dbl>
    1  1001        1        2       NA 2020-09-04  2021-11-04  <NA>           4.6
    2  2002        1       NA       NA 2020-03-14  <NA>        <NA>           2.6
    3  3003        1        2        3 2020-09-24  2020-09-27  2020-10-11     4.9
    # ... with 8 more variables: Chol_2 <dbl>, Chol_3 <dbl>, Kreat_1 <int>,
    #   Kreat_2 <int>, Kreat_3 <int>, HbA1c_1 <dbl>, HbA1c_2 <dbl>, HbA1c_3 <dbl>
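If the columns should be grouped per visit and carry zero-padded suffixes such as `_01`, as in the question's desired output, something along these lines might get closer. This is only a sketch under a few assumptions: the small `merged_df` below is a stand-in built from part of the question's sample data, the `suffix` column is a hypothetical helper, and the `names_vary` argument needs a reasonably recent tidyr (1.2.0 or later, as far as I know).

```r
library(dplyr)
library(tidyr)

# Stand-in for merged_df, using two patients from the question's sample data
merged_df <- data.frame(
  PATID     = c(1001L, 1001L, 2002L),
  VisitDate = c("2020-09-04", "2021-11-04", "2020-03-14"),
  Moment    = c(1L, 2L, 1L),
  Chol      = c(4.6, 3.7, 2.6),
  Kreat     = c(63L, 82L, 53L),
  HbA1c     = c(7.9, 6.7, 6.0)
)

wide <- merged_df %>%
  # Hypothetical helper column: zero-padded visit code ("01", "02", ...)
  mutate(suffix = sprintf("%02d", Moment)) %>%
  pivot_wider(
    id_cols     = PATID,
    names_from  = suffix,
    values_from = c(Moment, VisitDate, Chol, Kreat, HbA1c),
    names_sep   = "_",
    # "slowest" keeps all columns of one visit next to each other
    names_vary  = "slowest"
  )

names(wide)
```

With `names_vary = "slowest"` the result starts with `PATID`, `Moment_01`, `VisitDate_01`, `Chol_01`, `Kreat_01`, `HbA1c_01`, followed by the `_02` set. That is not exactly the layout in the question, which lists all Moment and VisitDate columns first, but the columns can be reordered afterwards with `select()` or `relocate()` if that matters.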

huangapple
  • Published on May 18, 2023, 00:02:30
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76274061.html