May 18, 2023 00:02:30


Flatten a dataframe, pivot to get one row per patientid and sets of repeating columns per visit date

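Question

In RStudio, you want to reshape a merged data frame so that you get a set of repeating columns per measurement, without knowing in advance the maximum number of visits any patient may have. You also want to group these columns using the measurement code as the column-name suffix. You tried tidyr::pivot_wider() but did not get the expected result, and you want to know how to use pivot_wider_spec correctly, or which other approach gives the desired output.

The following code example reshapes the data frame into the desired output: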

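library(dplyr)
library(tidyr)
# Merge the data frames on patient ID and visit date
merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))
# Number each patient's visits in date order; this number becomes the column suffix
merged_df <- merged_df %>%
  group_by(PATID) %>%
  arrange(VisitDate, .by_group = TRUE) %>%
  mutate(row_num = row_number()) %>%
  ungroup()
# Reshape with pivot_wider: one row per PATID, one set of columns per visit
# (Moment is kept as a value column so Moment_1, Moment_2, ... appear in the output)
pivoted_df <- merged_df %>%
  pivot_wider(id_cols = PATID,
              names_from = row_num,
              values_from = c(Moment, VisitDate, Chol, Kreat, HbA1c),
              names_sep = "_")
# Zero-pad single-digit suffixes (Chol_1 -> Chol_01) to match the desired column names
colnames(pivoted_df) <- sub("_(\\d)$", "_0\\1", colnames(pivoted_df))
# For display only: convert every column to character so NA can be replaced with ""
display_df <- pivoted_df %>%
  mutate(across(everything(), ~ replace_na(as.character(.x), "")))
# View the final result
print(display_df)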

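This code first merges the data frames, then adds a row number to each visit of each patient. Next, pivot_wider reshapes the data frame, and finally the column names are adjusted to produce the desired output format.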
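The row-number suffix used above lines up with the Moment code (the suffix the question asks for) only if each patient's Moment values run 1, 2, 3, ... in visit-date order. A minimal sketch of that check, using the key columns of the question's merged sample data typed in by hand so the block runs on its own (the name moment_check is just for illustration):

```r
library(dplyr)

# Key columns of the question's merged sample data, entered by hand for self-containment
merged_df <- data.frame(
  PATID     = c(1001, 1001, 2002, 3003, 3003, 3003),
  Moment    = c(1, 2, 1, 1, 2, 3),
  VisitDate = c("2020-09-04", "2021-11-04", "2020-03-14",
                "2020-09-24", "2020-09-27", "2020-10-11")
)

# Per patient: sort visits by date, then test whether Moment equals 1, 2, 3, ...
moment_check <- merged_df %>%
  group_by(PATID) %>%
  arrange(VisitDate, .by_group = TRUE) %>%
  summarise(moment_matches_row_num = all(Moment == seq_len(n())))

print(moment_check)
```

If any patient shows FALSE, using names_from = Moment instead of a row number keeps the suffix tied to the visit code rather than to the visit's position.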

Original question:

In RStudio, is it possible to pivot a merged dataframe in such a way that I get a set of repeating columns per measurement?

I have two dataframes: one with the patients and visit moment codes, and one with the patients, visit dates and measurements. The problem is that I don't know beforehand the maximum number of visits per patient, so I don't know beforehand how many columns will be needed. One patient could have 3 visits and another just 1.

Also, the visits dataframe has a measurement code column (Moment: 1, 2, 3, etc.), so ideally the sets of columns should be grouped using that code as the column-name suffix, e.g. Chol_1, Chol_2, Chol_3 or something similar.

So I have the example code below:

library(dplyr)
library(tidyr)
df_visits <- read.table(text = "
PATID,Moment,VisitDate
1001,1,2020-09-04
1001,2,2021-11-04
2002,1,2020-03-14
3003,1,2020-09-24
3003,2,2020-09-27
3003,3,2020-10-11
", header = TRUE, sep = ",")
df_measure <- read.table(text = "
PATID,VisitDate,Chol,Kreat,HbA1c
1001,2021-11-04,3.70,82,6.7
1001,2020-09-04,4.60,63,7.9
2002,2020-03-14,2.60,53,6.0
3003,2020-10-11,3.90,94,7.5
3003,2020-09-27,4.90,137,6.7
3003,2020-09-24,4.90,127,7.5
", header = TRUE, sep = ",")
# Merge the dataframes based on PATID and VisitDate
merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))

And then I tried using tidyr::pivot_wider(), see code below.

pivoted_df <- merged_df %>%
  pivot_wider(names_from = c("Chol", "Kreat", "HbA1c"),
              values_from = c("Chol", "Kreat", "HbA1c"),
              names_sep = "_")

But that doesn't work: it doesn't "flatten" the data (there is still one row per PATID and VisitDate), and it seems to generate column names from the actual values. I think pivot_wider_spec might be usable here, but I can't figure out the correct parameters to get the desired result, or whether it is even possible. The desired output dataframe is something like this:

| PATID | Moment_01 | Moment_02 | Moment_03 | VisitDate_01  | VisitDate_02 | VisitDate_03 | Chol_01 | Kreat_01 | HbA1c_01 | Chol_02 | Kreat_02 | HbA1c_02 | Chol_03 | Kreat_03 | HbA1c_03 |
|  1001 |         1 |         2 |           | 2020-09-04    | 2021-11-04   |              |    4.60 |       63 |      7.9 |    3.70 |       82 |      6.7 |         |          |          |
|  2002 |         1 |           |           | 2020-03-14    |              |              |    2.60 |       53 |      6.0 |         |          |          |         |          |          |
|  3003 |         1 |         2 |         3 | 2020-09-24    | 2020-09-27   | 2020-10-11   |    4.90 |      127 |      7.5 |    4.90 |      137 |      6.7 |    3.90 |       94 |      7.5 |

Can anyone point me in the right direction?

Answer 1

Score: 2

Are you looking for:

library(tidyverse)
merged_df %>%
  pivot_wider(names_from = 'Moment',
              values_from = c('Moment', 'VisitDate', 'Chol', 'Kreat', 'HbA1c'))
# A tibble: 3 x 16
  PATID Moment_1 Moment_2 Moment_3 VisitDate_1 VisitDate_2 VisitDate_3 Chol_1
  <int>    <int>    <int>    <int> <chr>       <chr>       <chr>        <dbl>
1  1001        1        2       NA 2020-09-04  2021-11-04  <NA>           4.6
2  2002        1       NA       NA 2020-03-14  <NA>        <NA>           2.6
3  3003        1        2        3 2020-09-24  2020-09-27  2020-10-11     4.9
# ... with 8 more variables: Chol_2 <dbl>, Chol_3 <dbl>, Kreat_1 <int>,
#   Kreat_2 <int>, Kreat_3 <int>, HbA1c_1 <dbl>, HbA1c_2 <dbl>, HbA1c_3 <dbl>
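The pivot_wider call in this answer groups the new columns by variable (Chol_1, Chol_2, Chol_3, Kreat_1, ...) with unpadded suffixes. Since the question asked about pivot_wider_spec, here is a minimal sketch of the spec-based route that aims to reproduce the column names and order of the desired table (Chol_01, Kreat_01, HbA1c_01, Chol_02, ...). It assumes a recent tidyr release (1.2.0 or later), in which pivot_wider_spec() lays out the new columns in the row order of the spec; the two-digit sprintf() suffix and the name wide_df are illustrative choices, not part of the original answer.

```r
library(dplyr)
library(tidyr)

# Sample data from the question
df_visits <- read.table(text = "
PATID,Moment,VisitDate
1001,1,2020-09-04
1001,2,2021-11-04
2002,1,2020-03-14
3003,1,2020-09-24
3003,2,2020-09-27
3003,3,2020-10-11
", header = TRUE, sep = ",")
df_measure <- read.table(text = "
PATID,VisitDate,Chol,Kreat,HbA1c
1001,2021-11-04,3.70,82,6.7
1001,2020-09-04,4.60,63,7.9
2002,2020-03-14,2.60,53,6.0
3003,2020-10-11,3.90,94,7.5
3003,2020-09-27,4.90,137,6.7
3003,2020-09-24,4.90,127,7.5
", header = TRUE, sep = ",")
merged_df <- merge(df_visits, df_measure, by = c("PATID", "VisitDate"))

# Spec: one row per output column, i.e. per (.value, Moment) pair found in the data,
# so the number of visit columns follows the data instead of being fixed in advance
spec <- merged_df %>%
  build_wider_spec(names_from = Moment,
                   values_from = c(Moment, VisitDate, Chol, Kreat, HbA1c)) %>%
  # Rename output columns with a zero-padded suffix: Chol_1 -> Chol_01
  mutate(.name = sprintf("%s_%02d", .value, as.integer(Moment))) %>%
  # Row order = column order: all Moment_*, all VisitDate_*, then Chol/Kreat/HbA1c per visit
  arrange(.value != "Moment", .value != "VisitDate", Moment,
          match(.value, c("Chol", "Kreat", "HbA1c")))

wide_df <- pivot_wider_spec(merged_df, spec, id_cols = PATID)
print(wide_df)
```

Editing the spec's .name column and row order is what pivot_wider_spec adds over plain pivot_wider: the reshaping is the same, but the output naming and layout are fully under your control.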
