如何分组和合并数据

huangapple go评论88阅读模式
英文:

How to group and combine data

问题

我无法合并数据框中的行。

我有这样的输入:

  1. element tool time disease
  2. AAK1 SE 12 SARS
  3. AAK1 RI 12 SARS
  4. AAK1 A5SS 12 SARS
  5. AAK1 SE 12 MERS
  6. AAK1 RI 12 MERS
  7. AAK1 MXE 12 MERS
  8. ABCC1 SE 12 SARS-CoV-2
  9. ABCC1 MXE 12 SARS-CoV-2
  10. ABCC10 SE 12 MERS
  11. ABCC10 A3SS 12 MERS

我想要这样的输出:

  1. element time disease SE RI MXE A3SS A5SS
  2. AAK1 12 SARS and MERS 1 1 0 0 0
  3. AAK1 12 SARS 0 0 0 0 1
  4. AAK1 12 MERS 0 0 1 0 0
  5. ABCC1 12 SARS-CoV-2 1 0 1 0 0
  6. ABCC10 12 MERS 1 0 0 1 0

我期望函数执行以下操作,首先按element分组,您将得到以下结果:

  1. element tool time disease
  2. AAK1 SE 12 SARS
  3. AAK1 RI 12 SARS
  4. AAK1 A5SS 12 SARS
  5. AAK1 SE 12 MERS
  6. AAK1 MXE 12 MERS
  7. AAK1 RI 12 MERS
  • tool列拆分为5列,
    SE RI MXE A3SS A5SS

  • 检查是否可以合并disease列,使用SE RI MXE A3SS A5SS。

在这个示例中,只有SE和RI可以合并,因为它们同时出现在SARS和MERS(disease列)中,无法合并的观测值,请单独写出。

  1. element time disease SE RI MXE A3SS A5SS
  2. AAK1 12 SARS and MERS 1 1 0 0 0
  3. AAK1 12 SARS 0 0 0 0 1
  4. AAK1 12 MERS 0 0 1 0 0

以这种方式在每个element上运行循环,并给出一个如上所述的数据框作为输出。

英文:

I am unable to combine rows in a data frame.

I have input like this:

  1. element tool time disease
  2. AAK1 SE 12 SARS
  3. AAK1 RI 12 SARS
  4. AAK1 A5SS 12 SARS
  5. AAK1 SE 12 MERS
  6. AAK1 RI 12 MERS
  7. AAK1 MXE 12 MERS
  8. ABCC1 SE 12 SARS-CoV-2
  9. ABCC1 MXE 12 SARS-CoV-2
  10. ABCC10 SE 12 MERS
  11. ABCC10 A3SS 12 MERS

I want output like this:

  1. element time disease SE RI MXE A3SS A5SS
  2. AAK1 12 SARS and MERS 1 1 0 0 0
  3. AAK1 12 SARS 0 0 0 0 1
  4. AAK1 12 MERS 0 0 1 0 0
  5. ABCC1 12 SARS-CoV-2 1 0 1 0 0
  6. ABCC10 12 MERS 1 0 0 1 0

What I expect the function to do is, first group_by element, you will get this:

  1. element tool time disease
  2. AAK1 SE 12 SARS
  3. AAK1 RI 12 SARS
  4. AAK1 A5SS 12 SARS
  5. AAK1 SE 12 MERS
  6. AAK1 MXE 12 MERS
  7. AAK1 RI 12 MERS
  • Split the tool column into 5 columns,
    SE RI MXE A3SS A5SS

  • check whether you can combine the disease column or not, using SE RI MXE A3SS A5SS.

In this example, only SE and RI can be combined, because they are present in both SARS and MERS (disease column), the observations which cannot be combined, write them separately.

  1. element time disease SE RI MXE A3SS A5SS
  2. AAK1 12 SARS and MERS 1 1 0 0 0
  3. AAK1 12 SARS 0 0 0 0 1
  4. AAK1 12 MERS 0 0 1 0 0

In this way run the loop on each element, and give one dataframe as output, which I mentioned above.

答案1

得分: 4

  1. 库(tidyverse)
  2. df |>
  3. 按(element, tool, time) |>
  4. 汇总(disease = paste0(sort(disease), collapse = " ")) |>
  5. 展开(names_from = tool, values_from = tool, values_fn = length, values_fill = 0) |>
  6. 不分组()
英文:

You can use summarize with a collapse in paste to stick disease that shares element, tool and time together. Then pivot_wider to spread out the columns, with their length in the tool column as the fill parameter (also missing values are filled with zero).

  1. library(tidyverse)
  2. df |>
  3. group_by(element, tool, time) |>
  4. summarize(disease = paste0(sort(disease), collapse = " and ")) |>
  5. pivot_wider(names_from = tool, values_from = tool, values_fn = length, values_fill = 0) |>
  6. ungroup()
  7. # A tibble: 5 × 8
  8. element time disease A5SS MXE RI SE A3SS
  9. <chr> <int> <chr> <int> <int> <int> <int> <int>
  10. 1 AAK1 12 SARS 1 0 0 0 0
  11. 2 AAK1 12 MERS 0 1 0 0 0
  12. 3 AAK1 12 SARS and MERS 0 0 1 1 0
  13. 4 ABCC1 12 SARS-CoV-2 0 1 0 1 0
  14. 5 ABCC10 12 MERS 0 0 0 1 1
  15. </details>

huangapple
  • 本文由 发表于 2023年5月25日 13:23:16
  • 转载请务必保留本文链接:https://go.coder-hub.com/76329154.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定