如何分组和合并数据

huangapple go评论54阅读模式
英文:

How to group and combine data

问题

我无法合并数据框中的行。

我有这样的输入:

  element tool time    disease
    AAK1   SE   12       SARS
    AAK1   RI   12       SARS
    AAK1 A5SS   12       SARS
    AAK1   SE   12       MERS
    AAK1   RI   12       MERS
    AAK1  MXE   12       MERS
   ABCC1   SE   12 SARS-CoV-2
   ABCC1  MXE   12 SARS-CoV-2
  ABCC10   SE   12       MERS
  ABCC10   A3SS 12       MERS

我想要这样的输出:

element  time    disease                  SE   RI    MXE   A3SS  A5SS
AAK1       12    SARS and MERS             1     1     0     0     0
AAK1       12    SARS                      0     0     0     0     1
AAK1       12    MERS                      0     0     1     0     0
ABCC1      12    SARS-CoV-2                1     0     1     0     0
ABCC10     12    MERS                      1     0     0     1     0

我期望函数执行以下操作,首先按element分组,您将得到以下结果:

element tool time    disease
AAK1   SE   12       SARS
AAK1   RI   12       SARS
AAK1 A5SS   12       SARS
AAK1   SE   12       MERS
AAK1  MXE   12       MERS
AAK1   RI   12       MERS
  • tool列拆分为5列,
    SE RI MXE A3SS A5SS

  • 检查是否可以合并disease列,使用SE RI MXE A3SS A5SS。

在这个示例中,只有SE和RI可以合并,因为它们同时出现在SARS和MERS(disease列)中,无法合并的观测值,请单独写出。

element  time    disease                     SE   RI    MXE   A3SS  A5SS
AAK1      12    SARS and MERS                1     1     0     0     0
AAK1      12    SARS                         0     0     0     0     1
AAK1      12    MERS                         0     0     1     0     0

以这种方式在每个element上运行循环,并给出一个如上所述的数据框作为输出。

英文:

I am unable to combine rows in a data frame.

I have input like this:

  element tool time    disease
    AAK1   SE   12       SARS
    AAK1   RI   12       SARS
    AAK1 A5SS   12       SARS
    AAK1   SE   12       MERS
    AAK1   RI   12       MERS
    AAK1  MXE   12       MERS
   ABCC1   SE   12 SARS-CoV-2
   ABCC1  MXE   12 SARS-CoV-2
  ABCC10   SE   12       MERS
  ABCC10   A3SS 12       MERS

I want output like this:

element  time    disease                  SE   RI    MXE   A3SS  A5SS
AAK1       12    SARS and MERS             1     1     0     0     0
AAK1       12    SARS                      0     0     0     0     1
AAK1       12    MERS                      0     0     1     0     0
ABCC1      12    SARS-CoV-2                1     0     1     0     0
ABCC10     12    MERS                      1     0     0     1     0

What I expect the function to do is, first group_by element, you will get this:

element tool time    disease
AAK1   SE   12       SARS
AAK1   RI   12       SARS
AAK1 A5SS   12       SARS
AAK1   SE   12       MERS
AAK1  MXE   12       MERS
AAK1   RI   12       MERS
  • Split the tool column into 5 columns,
    SE RI MXE A3SS A5SS

  • check whether you can combine the disease column or not, using SE RI MXE A3SS A5SS.

In this example, only SE and RI can be combined, because they are present in both SARS and MERS (disease column), the observations which cannot be combined, write them separately.

element  time    disease                     SE   RI    MXE   A3SS  A5SS
AAK1      12    SARS and MERS                1     1     0     0     0
AAK1      12    SARS                         0     0     0     0     1
AAK1      12    MERS                         0     0     1     0     0

In this way run the loop on each element, and give one dataframe as output, which I mentioned above.

答案1

得分: 4

库(tidyverse)

df |> 
  按(element, tool, time) |> 
  汇总(disease = paste0(sort(disease), collapse = " 和 ")) |> 
  展开(names_from = tool, values_from = tool, values_fn = length, values_fill = 0) |> 
  不分组()
英文:

You can use summarize with a collapse in paste to stick disease that shares element, tool and time together. Then pivot_wider to spread out the columns, with their length in the tool column as the fill parameter (also missing values are filled with zero).

library(tidyverse)

df |> 
  group_by(element, tool, time) |> 
  summarize(disease = paste0(sort(disease), collapse = " and ")) |> 
  pivot_wider(names_from = tool, values_from = tool, values_fn = length, values_fill = 0) |> 
  ungroup()

# A tibble: 5 × 8
  element  time disease        A5SS   MXE    RI    SE  A3SS
  <chr>   <int> <chr>         <int> <int> <int> <int> <int>
1 AAK1       12 SARS              1     0     0     0     0
2 AAK1       12 MERS              0     1     0     0     0
3 AAK1       12 SARS and MERS     0     0     1     1     0
4 ABCC1      12 SARS-CoV-2        0     1     0     1     0
5 ABCC10     12 MERS              0     0     0     1     1

</details>



huangapple
  • 本文由 发表于 2023年5月25日 13:23:16
  • 转载请务必保留本文链接:https://go.coder-hub.com/76329154.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定