Is there an R function for collapsing characters into one cell if they have a matching character in another cell?
Question
I have a dataframe with two columns of characters that looks like this:
| name | gene |
|---|---|
| GO:00001 | Gene_1 |
| GO:00001 | Gene_2 |
| GO:00002 | Gene_3 |
| GO:00002 | Gene_4 |
| GO:00002 | Gene_5 |
But I need to collapse the rows so that the "name" column isn't repetitive and the "gene" column contains every gene that matches the same "name", separated by a comma and a space, like so:
| name | gene |
|---|---|
| GO:00001 | Gene_1, Gene_2 |
| GO:00002 | Gene_3, Gene_4, Gene_5 |
I have looked into the documentation for melt, collapse, and summarize, but I can't figure out how to do this with characters. Any help is much appreciated!!
Answer 1
Score: 0
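Using dplyr:
    library(dplyr)

    # collapse = ", " joins each group's genes with a comma and a space
    df %>%
      group_by(name) %>%
      summarise(gene = paste(gene, collapse = ", "))
    # A tibble: 2 × 2
      name     gene
      <chr>    <chr>
    1 GO:00001 Gene_1, Gene_2
    2 GO:00002 Gene_3, Gene_4, Gene_5
Using base R:
    # extra arguments such as collapse are passed on to FUN
    aggregate(gene ~ name, data = df, FUN = paste, collapse = ", ")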
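For a self-contained check, the sketch below rebuilds the example data from the question and collapses it with base R only. The data frame name `df`, the result name `collapsed`, and the choice of `toString()` as the collapsing function are assumptions for illustration, not part of the original answer; `toString(x)` behaves like `paste(x, collapse = ", ")`.

```r
# Rebuild the example data from the question (the name `df` is an assumption).
df <- data.frame(
  name = c("GO:00001", "GO:00001", "GO:00002", "GO:00002", "GO:00002"),
  gene = c("Gene_1", "Gene_2", "Gene_3", "Gene_4", "Gene_5"),
  stringsAsFactors = FALSE
)

# toString() joins a character vector with ", ", so each group of genes
# becomes a single comma-and-space separated string.
collapsed <- aggregate(gene ~ name, data = df, FUN = toString)

print(collapsed)
```

Because each group collapses to a single string, the result should stay an ordinary two-column data frame with one row per distinct `name`.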