June 15, 2023, 00:40:02

How to pivot_wider/melt a matrix of values with multiple header rows in Excel sheet?

Question

I've got a number of Excel sheets, each with a matrix of about 120 lab results and 150 patients. However, the lab results are stored as rows starting at row 4, and the patients are stored as columns with extra header data in rows 1 to 3. So the Excel sheet looks like this:

And I want to read it and convert it to a longer/normalised dataframe, basically like this mockup:

I think it could be processed with pivot_longer or melt, but the empty cells in the top left are causing trouble, i.e. the column headers of the lab results start at row 4, and the patient columns start at column C.

Btw, the patient column names always start at column D, but obviously the actual names EXA_2665, EXA_2707 etc. are not known beforehand; they will be newly issued patient IDs.

Here is the code I've got so far.

library(tidyverse)
library(readxl)
library(dplyr)
library(tidyr)

##xls_data <- read_excel(path = "lab2_exafinal.xlsx", sheet = "Labata")
## TEST hardcoded data for testing purposes
xls_data2 <- read.table(text = "
...1,...2,Patient-Id,EXA_2665,EXA_2707,EXA_2971
,,Tube,87019054,87065976,87093024
,,Tube-Pos,C12,D01,D02
LabId,LabDescr,WeightFact,,,
9,HbA1c,1.00,1.784,0.419,1.045
37,HDL-Chol,1.00,0.684,2.172,1.954
38,LDL-Chol,1.00,2.438,1.995,0.806
41,Glucose,1.00,1.366,1.47,0.978
", header = TRUE, sep = ",", na.strings = "")

library(reshape)
df_melt <- melt(xls_data, id.vars=c("...1", "...2"))
# error: names do not match previous names

pivoted_data <- pivot_longer(
  xls_data,
  cols = names(xls_data[3,]),
  names_to = "patient",
  values_to = "value"
  )
#no error but not quite the desired result

EDIT: @Jon-Spring suggested using dput to share the exact structure of the data as returned by read_excel when reading the Excel file. (It also read 1.045 as 1.044999 etc., but for now that's a separate issue.) Also notice that the patient label names may include the - character.

#xls_data <- read_excel(path = "lab2_exafinal.xlsx", sheet = "Labata")
#dput(xls_data)
xls_data <- structure(
  list(
    ...1 = c(NA, NA, "LabId", "9", "37", "38", "41"),
    ...2 = c(NA, NA, "LabDescr", "HbA1c", "HDL-Chol", "LDL-Chol", "Glucose"),
    ...3 = c("Tube", "Tube-Pos", "WeightFact", "1,00", "1,00", "1,00", "1,00"),
    EXA_2665 = c("87019054", "C12", NA, "1.784", "0.68400000000000005", "2.4380000000000002", "1.3660000000000001"),
    EXA_2707 = c("87065976", "D01", NA, "0.41899999999999998", "2.1720000000000002", "1.9950000000000001", "1.47"),
    EXA_2971 = c("87093024", "D02", NA, "1.0449999999999999", "1.954", "0.80600000000000005", "0.97799999999999998")
    ),
    class = c("tbl_df", "tbl", "data.frame"),
    row.names = c(NA, -7L)
  )

When I read the Excel sheet using readxl::read_excel, the first columns will be ...1, ...2. So when using melt I think those should be the id.vars parameters(?), but it gives an error message. The result of pivot_longer looks closer to what I'm trying to do, but the problem is that there are extra header rows and I don't know how to use (or just remove?) those.

I can think of other ways to do it, but I don't know how to do that in RStudio:

Somehow try to read it as two separate data frames, the patients (C1..F3) and the lab values (A4..F8), and then merge them.
Or first copy the patient header names EXA_2665 etc. to the empty row 4, and then re-interpret just the A4..F8 region.

Any other idea how to best approach this?
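
The first idea above (two blocks, then merge) can be sketched roughly as follows. This is only an illustration under assumptions: the toy data frame stands in for what read_excel returns, the names V1..V3 replace the auto-generated ...1, ...2, ...3, and the Tube and Tube-Pos values are assumed to sit in rows 1 and 2 as in the sample.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the sheet as read by read_excel(); V1..V3 are
# placeholder names (an assumption) for the auto-generated ...1, ...2, ...3
raw <- data.frame(
  V1 = c(NA, NA, "LabId", "9", "37"),
  V2 = c(NA, NA, "LabDescr", "HbA1c", "HDL-Chol"),
  V3 = c("Tube", "Tube-Pos", "WeightFact", "1.00", "1.00"),
  EXA_2665 = c("87019054", "C12", NA, "1.784", "0.684"),
  EXA_2707 = c("87065976", "D01", NA, "0.419", "2.172"),
  stringsAsFactors = FALSE
)

label_row    <- which(raw[[1]] == "LabId")  # row holding LabId/LabDescr/WeightFact
patient_cols <- names(raw)[-(1:3)]          # patient IDs are not known beforehand

# Block 1: one row per patient, taken from the header rows (assumed rows 1-2)
patients <- data.frame(
  PatientId = patient_cols,
  Tube      = unlist(raw[1, patient_cols], use.names = FALSE),
  TubePos   = unlist(raw[2, patient_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Block 2: the lab values below the label row, with proper names for columns 1-3
labs <- raw[(label_row + 1):nrow(raw), ]
names(labs)[1:3] <- unlist(raw[label_row, 1:3], use.names = FALSE)

# Reshape block 2 to long form and attach the patient metadata
long <- labs |>
  pivot_longer(all_of(patient_cols), names_to = "PatientId", values_to = "Value") |>
  mutate(Value = as.numeric(Value)) |>
  left_join(patients, by = "PatientId")

print(long)
```

Because the patient columns are selected by position (everything after the third column), the sketch does not need to know the patient IDs in advance.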

Answer 1

Score: 2

<details>
<summary>English:</summary>

The `unheadr` package offers some tools for reshaping tables with these sorts of features that might make them more compact for human readers but trickier to parse with code. 

Here, along the lines of your 2nd idea, we can combine the four rows of headers (the first of which comes in as a header, the next three as data rows). I specified `-` as the separator because `_` already appears as a non-separator in `PatientId`.

Then we can `pivot_longer` those combined headers into multiple `names` columns alongside each value.


    library(unheadr)
    xls_data2 |>
      # take the first three rows and combine them with the header
      mash_colnames(n_name_rows = 3, sep = "-") |>
      # reshape everything besides the first three columns, separating the 
      # combined headers into new columns named
      # 1) the third original header
      # 2-4) the first three data rows in the third column
      pivot_longer(-c(1:3), names_sep = "-", 
                   names_to = c(colnames(xls_data2)[3], xls_data2[1:3, 3])) |>
      dplyr::rename(LabId = 1, LabDescr = 2, WeightFact2 = 3) 
      # dplyr::rename to avoid conflict with `reshape::rename`

Result  


    # A tibble: 12 × 8
       LabId LabDescr WeightFact2 PatientId Tube     TubePos WeightFact value
       <chr> <chr>    <chr>       <chr>     <chr>    <chr>   <chr>      <chr>
     1 9     HbA1c    1.00        EXA_2665  87019054 C12     NA         1.784
     2 9     HbA1c    1.00        EXA_2707  87065976 D01     NA         0.419
     3 9     HbA1c    1.00        EXA_2971  87093024 D02     NA         1.045
     4 37    HDL-Chol 1.00        EXA_2665  87019054 C12     NA         0.684
     5 37    HDL-Chol 1.00        EXA_2707  87065976 D01     NA         2.172
     6 37    HDL-Chol 1.00        EXA_2971  87093024 D02     NA         1.954
     7 38    LDL-Chol 1.00        EXA_2665  87019054 C12     NA         2.438
     8 38    LDL-Chol 1.00        EXA_2707  87065976 D01     NA         1.995
     9 38    LDL-Chol 1.00        EXA_2971  87093024 D02     NA         0.806
    10 41    Glucose  1.00        EXA_2665  87019054 C12     NA         1.366
    11 41    Glucose  1.00        EXA_2707  87065976 D01     NA         1.47 
    12 41    Glucose  1.00        EXA_2971  87093024 D02     NA         0.978
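
The part of this approach that does the actual unpacking is `pivot_longer` with `names_sep`. As a minimal sketch using only tidyr, assume the headers have already been combined into names such as `EXA_2665@87019054@C12` (the toy table and its `@` separator are made up for illustration):

```r
library(tidyr)

# Toy table whose column names already carry PatientId, Tube and TubePos,
# joined by "@" (check.names = FALSE keeps the names as written)
mashed <- data.frame(
  LabId    = c("9", "37"),
  LabDescr = c("HbA1c", "HDL-Chol"),
  `EXA_2665@87019054@C12` = c(1.784, 0.684),
  `EXA_2707@87065976@D01` = c(0.419, 2.172),
  check.names = FALSE
)

# Each combined name is split at "@" into three new columns,
# and the cell contents go into `value`
long <- pivot_longer(
  mashed,
  cols      = -c(LabId, LabDescr),
  names_to  = c("PatientId", "Tube", "TubePos"),
  names_sep = "@",
  values_to = "value"
)

print(long)
```

The separator has to be a character that never occurs inside any header piece, which is why `_` is not usable here.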


------

Edit with `dput(xls_data)` example:

    xls_data |>
      mash_colnames(n_name_rows = 3, sep = "@") |>
      pivot_longer(-c(1:3), names_sep = "@", 
                   names_to = c(colnames(xls_data2)[3], xls_data2[1:3, 3])) |>
      dplyr::rename(LabId = 1, LabDescr = 2, WeightFact2 = 3) |>
      dplyr::mutate(across(value, as.numeric))

Result

    # A tibble: 12 × 8
       LabId LabDescr WeightFact2 PatientId Tube     TubePos WeightFact value
       <chr> <chr>    <chr>       <chr>     <chr>    <chr>   <chr>      <dbl>
     1 9     HbA1c    1,00        EXA_2665  87019054 C12     NA         1.78 
     2 9     HbA1c    1,00        EXA_2707  87065976 D01     NA         0.419
     3 9     HbA1c    1,00        EXA_2971  87093024 D02     NA         1.04 
     4 37    HDL-Chol 1,00        EXA_2665  87019054 C12     NA         0.684
     5 37    HDL-Chol 1,00        EXA_2707  87065976 D01     NA         2.17 
     6 37    HDL-Chol 1,00        EXA_2971  87093024 D02     NA         1.95 
     7 38    LDL-Chol 1,00        EXA_2665  87019054 C12     NA         2.44 
     8 38    LDL-Chol 1,00        EXA_2707  87065976 D01     NA         2.00 
     9 38    LDL-Chol 1,00        EXA_2971  87093024 D02     NA         0.806
    10 41    Glucose  1,00        EXA_2665  87019054 C12     NA         1.37 
    11 41    Glucose  1,00        EXA_2707  87065976 D01     NA         1.47 
    12 41    Glucose  1,00        EXA_2971  87093024 D02     NA         0.978
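
On the side issue from the question (1.045 arriving as 1.0449999999999999): that text is most likely just the full-precision rendering of the stored double, and the 1.04 in the table above would then be a display effect, since tibbles print about three significant digits by default. A small base R check, offered as a sketch rather than a diagnosis of the actual workbook:

```r
# The string below is what read_excel returned for the cell showing 1.045
x <- as.numeric("1.0449999999999999")

# Compare with a tolerance instead of `==`, which is fragile for doubles
same_value <- isTRUE(all.equal(x, 1.045))

# Formatting with 15 significant digits recovers the value as typed
shown <- format(x, digits = 15)

print(same_value)
print(shown)
```

If tidy values are needed downstream, `round(value, 3)` is one option, assuming three decimals is the intended precision.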


</details>



Answer 2

Score: 1

If you feel like it, there is a Power Query method, shown below. It's dynamic and would accommodate any number of PatientID data columns.

Bring the data in without any headers:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Keep Rows" = Table.FirstN(Table.RemoveColumns(Source,{"Column1", "Column2"}),3),
    #"Promoted Headers" = Table.PromoteHeaders(Table.Transpose(#"Keep Rows"), [PromoteAllScalars=true]),
    #"Added Custom" = Table.AddColumn(#"Promoted Headers", "Custom", each {1 .. Table.RowCount(Source)-4}),
    Part1 = Table.ExpandListColumn(#"Added Custom", "Custom"),
    #"Promoted Headers2" = Table.PromoteHeaders(Table.Skip(Source,3), [PromoteAllScalars=true]),
    leading=3, groupsof=1,
    #"Added Custom1" = Table.AddColumn(#"Promoted Headers2", "Custom", each List.Split( List.RemoveFirstN(Record.ToList( _),leading), groupsof) ),
    #"Added Custom2" = Table.AddColumn(#"Added Custom1", "Custom0", each Text.Combine(List.Transform(List.FirstN(Record.ToList(_),leading), each Text.From(_)),"|")),
    #"Removed Other Columns" = Table.SelectColumns(#"Added Custom2",{"Custom0", "Custom"}),
    #"Expanded Custom" = Table.ExpandListColumn( #"Removed Other Columns", "Custom"),
    #"Extracted Values" = Table.TransformColumns(#"Expanded Custom", {"Custom", each Text.Combine(List.Transform(_, Text.From), "|"), type text}),
    #"Merged Columns" = Table.CombineColumns(#"Extracted Values",{"Custom0", "Custom"},Combiner.CombineTextByDelimiter("|", QuoteStyle.None),"Custom"),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Merged Columns", "Custom", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), List.FirstN(Table.ColumnNames(Source),leading+groupsof)),
    Part2 = Table.RenameColumns(#"Split Column by Delimiter",{{"Column1", "LabID"}, {"Column2", "LabDescr"}, {"Column3", "WeightFact"}, {"Column4", "Value"}}),
    Combine = Table.FromColumns(Table.ToColumns(Part2) & Table.ToColumns(Part1),Table.ColumnNames(Part2)&Table.ColumnNames(Part1)),
    #"Removed Columns2" = Table.RemoveColumns(Combine,{"Custom"})
in
    #"Removed Columns2"

Answer 3

Score: 0

If you want a "pure" tidyverse approach you can use the following:

library(dplyr)
library(tidyr)
library(purrr)  # provides set_names(), used below

xls_data <- structure(
   list(
      ...1 = c(NA, NA, "LabId", "9", "37", "38", "41"),
      ...2 = c(NA, NA, "LabDescr", "HbA1c", "HDL-Chol", "LDL-Chol", "Glucose"),
      ...3 = c("Tube", "TubePos", "WeightFact", "1,00", "1,00", "1,00", "1,00"),
      EXA_2665 = c("87019054", "C12", NA, "1.784", "0.68400000000000005", "2.4380000000000002", "1.3660000000000001"),
      EXA_2707 = c("87065976", "D01", NA, "0.41899999999999998", "2.1720000000000002", "1.9950000000000001", "1.47"),
      EXA_2971 = c("87093024", "D02", NA, "1.0449999999999999", "1.954", "0.80600000000000005", "0.97799999999999998")
   ),
   class = c("tbl_df", "tbl", "data.frame"),
   row.names = c(NA, -7L)
)

# 1. Get start of the data
data_start <- xls_data %>%
   select(first_col = 1) %>%
   summarize(start = which(!is.na(first_col))[1L] + 1L) %>%
   pull(start)

# 2. Extract the 2 data blocks

values <- xls_data %>%
   slice(data_start:n()) %>%
   set_names(
      c(
         xls_data %>%
            slice(data_start - 1L) %>%
            select(1:3) %>%
            unlist(),
         names(.)[-(1:3)]
      )
   ) %>%
   mutate(across(-(2:3), as.numeric))

meta <- xls_data %>%
   slice(1:(data_start - 2L)) %>%
   select(-(1:2)) %>%
   set_names(c("PatientId", names(.)[-1L])) %>%
   bind_rows(names(.) %>%
                as.list() %>%
                set_names(.) %>%
                as.data.frame(),
             .
   ) %>%
   t() %>%
   `colnames<-`(.[1L, ]) %>%
   as_tibble() %>%
   slice(-1L)

# 3. Reshape and join
values %>%
   pivot_longer(-(1:3)) %>%
   inner_join(meta,
              c(name = "PatientId")) %>%
   rename(PatientId = name) %>%
   select(LabId:PatientId, Tube:TubePos, Value = value) %>%
   arrange(PatientId, LabId)

Admittedly, this is much longer code than the excellent proposal of John, but it is a good (and verbose) exercise for mastering the tidyverse.
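
One detail the code above leaves alone is WeightFact, which stays a character column because the dput data stores it with a decimal comma ("1,00"), and as.numeric would turn that into NA. A minimal base R sketch for converting it, assuming the values contain no thousands separators (the "0,50" entry is made up for illustration):

```r
# WeightFact as text with a decimal comma, as in the dput() data
weight_chr <- c("1,00", "1,00", "0,50")  # "0,50" is a made-up value

# Swap the decimal comma for a dot, then convert to numeric
weight_num <- as.numeric(sub(",", ".", weight_chr, fixed = TRUE))

print(weight_num)
```

In the pipeline this could be applied inside `mutate()`, for example to the WeightFact column of `values`.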
