英文:
How to group and combine data
问题
我无法合并数据框中的行。
我有这样的输入:
element tool time disease
AAK1 SE 12 SARS
AAK1 RI 12 SARS
AAK1 A5SS 12 SARS
AAK1 SE 12 MERS
AAK1 RI 12 MERS
AAK1 MXE 12 MERS
ABCC1 SE 12 SARS-CoV-2
ABCC1 MXE 12 SARS-CoV-2
ABCC10 SE 12 MERS
ABCC10 A3SS 12 MERS
我想要这样的输出:
element time disease SE RI MXE A3SS A5SS
AAK1 12 SARS and MERS 1 1 0 0 0
AAK1 12 SARS 0 0 0 0 1
AAK1 12 MERS 0 0 1 0 0
ABCC1 12 SARS-CoV-2 1 0 1 0 0
ABCC10 12 MERS 1 0 0 1 0
我期望函数执行以下操作,首先按element
分组,您将得到以下结果:
element tool time disease
AAK1 SE 12 SARS
AAK1 RI 12 SARS
AAK1 A5SS 12 SARS
AAK1 SE 12 MERS
AAK1 MXE 12 MERS
AAK1 RI 12 MERS
-
将
tool
列拆分为5列,
SE RI MXE A3SS A5SS -
检查是否可以合并
disease
列,使用SE RI MXE A3SS A5SS。
在这个示例中,只有SE和RI可以合并,因为它们同时出现在SARS和MERS(disease列)中,无法合并的观测值,请单独写出。
element time disease SE RI MXE A3SS A5SS
AAK1 12 SARS and MERS 1 1 0 0 0
AAK1 12 SARS 0 0 0 0 1
AAK1 12 MERS 0 0 1 0 0
以这种方式在每个element
上运行循环,并给出一个如上所述的数据框作为输出。
英文:
I am unable to combine rows in a data frame.
I have input like this:
element tool time disease
AAK1 SE 12 SARS
AAK1 RI 12 SARS
AAK1 A5SS 12 SARS
AAK1 SE 12 MERS
AAK1 RI 12 MERS
AAK1 MXE 12 MERS
ABCC1 SE 12 SARS-CoV-2
ABCC1 MXE 12 SARS-CoV-2
ABCC10 SE 12 MERS
ABCC10 A3SS 12 MERS
I want output like this:
element time disease SE RI MXE A3SS A5SS
AAK1 12 SARS and MERS 1 1 0 0 0
AAK1 12 SARS 0 0 0 0 1
AAK1 12 MERS 0 0 1 0 0
ABCC1 12 SARS-CoV-2 1 0 1 0 0
ABCC10 12 MERS 1 0 0 1 0
What I expect the function to do is, first group_by
element, you will get this:
element tool time disease
AAK1 SE 12 SARS
AAK1 RI 12 SARS
AAK1 A5SS 12 SARS
AAK1 SE 12 MERS
AAK1 MXE 12 MERS
AAK1 RI 12 MERS
-
Split the tool column into 5 columns,
SE RI MXE A3SS A5SS -
check whether you can combine the disease column or not, using SE RI MXE A3SS A5SS.
In this example, only SE and RI can be combined, because they are present in both SARS and MERS (disease column), the observations which cannot be combined, write them separately.
element time disease SE RI MXE A3SS A5SS
AAK1 12 SARS and MERS 1 1 0 0 0
AAK1 12 SARS 0 0 0 0 1
AAK1 12 MERS 0 0 1 0 0
In this way run the loop on each element, and give one dataframe as output, which I mentioned above.
答案1
得分: 4
库(tidyverse)
df |>
按(element, tool, time) |>
汇总(disease = paste0(sort(disease), collapse = " 和 ")) |>
展开(names_from = tool, values_from = tool, values_fn = length, values_fill = 0) |>
不分组()
英文:
You can use summarize
with a collapse
in paste
to stick disease that shares element
, tool
and time
together. Then pivot_wider
to spread out the columns, with their length
in the tool
column as the fill parameter (also missing values are filled with zero).
library(tidyverse)
df |>
group_by(element, tool, time) |>
summarize(disease = paste0(sort(disease), collapse = " and ")) |>
pivot_wider(names_from = tool, values_from = tool, values_fn = length, values_fill = 0) |>
ungroup()
# A tibble: 5 × 8
element time disease A5SS MXE RI SE A3SS
<chr> <int> <chr> <int> <int> <int> <int> <int>
1 AAK1 12 SARS 1 0 0 0 0
2 AAK1 12 MERS 0 1 0 0 0
3 AAK1 12 SARS and MERS 0 0 1 1 0
4 ABCC1 12 SARS-CoV-2 0 1 0 1 0
5 ABCC10 12 MERS 0 0 0 1 1
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论