How to reshape a dataframe with duplicated rows into rownames and colnames
Question
I have been struggling with reshaping the following dataframe:
geneSymbol <- c(rep("gene1",4),rep("gene2",4),rep("gene3",4))
Sample_name <- rep(c("sample1","sample2","sample3","sample4"),3)
log2FC <- c(1.5,-1.0,0.5,0.2,-0.3,-0.7,-0.12,0.33,0.64,-0.17,2.3,-1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)
> df
   geneSymbol Sample_name log2FC
1       gene1     sample1   1.50
2       gene1     sample2  -1.00
3       gene1     sample3   0.50
4       gene1     sample4   0.20
5       gene2     sample1  -0.30
6       gene2     sample2  -0.70
7       gene2     sample3  -0.12
8       gene2     sample4   0.33
9       gene3     sample1   0.64
10      gene3     sample2  -0.17
11      gene3     sample3   2.30
12      gene3     sample4  -1.70
Here, the 'geneSymbol' and 'Sample_name' columns each contain repeated values. I have been trying to reshape this dataframe into one that has 'geneSymbol' as its rownames and 'Sample_name' as its colnames, which should look as follows:
      sample1 sample2 sample3 sample4
gene1    1.50   -1.00    0.50    0.20
gene2   -0.30   -0.70   -0.12    0.33
gene3    0.64   -0.17    2.30   -1.70
I created this table manually, but I have no idea which function I need to use to build this dataframe or table from df in code, as my real dataframe has hundreds of rows. I would really appreciate it if anyone could help me with this.
Best wishes,
TJ
Answer 1
Score: 2
Using tidyr:
tidyr::pivot_wider(df, values_from = 'log2FC', names_from = 'Sample_name')
geneSymbol sample1 sample2 sample3 sample4
gene1          1.5      -1     0.5     0.2
gene2         -0.3    -0.7   -0.12    0.33
gene3         0.64   -0.17     2.3    -1.7
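Note that pivot_wider returns a tibble in which geneSymbol is still an ordinary column, whereas the question asks for it as rownames. A minimal sketch of one way to finish the conversion, assuming the df from the question (rebuilt here so the snippet runs on its own); the name wide is only illustrative:

```r
library(tidyr)

# Rebuild the example data from the question.
geneSymbol  <- rep(c("gene1", "gene2", "gene3"), each = 4)
Sample_name <- rep(c("sample1", "sample2", "sample3", "sample4"), 3)
log2FC <- c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)

# Long -> wide: one row per gene, one column per sample.
wide <- pivot_wider(df, names_from = "Sample_name", values_from = "log2FC")

# Tibbles do not keep rownames, so convert to a plain data frame first,
# then move geneSymbol from a column into the rownames.
wide <- as.data.frame(wide)
rownames(wide) <- wide$geneSymbol
wide$geneSymbol <- NULL

print(wide)
```

After this, wide["gene2", "sample3"] can be used to look up a single value by gene and sample.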
Answer 2
Score: 1
xtabs(log2FC ~ geneSymbol + Sample_name, df)
          Sample_name
geneSymbol sample1 sample2 sample3 sample4
     gene1    1.50   -1.00    0.50    0.20
     gene2   -0.30   -0.70   -0.12    0.33
     gene3    0.64   -0.17    2.30   -1.70
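The xtabs result is a contingency-table object rather than a data frame. If a data frame with rownames is needed, one possible follow-up in base R is as.data.frame.matrix, sketched below on the question's data (rebuilt so the snippet is self-contained; tab and wide are illustrative names). Keep in mind that xtabs sums the values when a gene/sample pair occurs more than once and reports 0 for pairs that are absent, so it suits data like this example where each pair appears exactly once.

```r
# Rebuild the example data from the question.
geneSymbol  <- rep(c("gene1", "gene2", "gene3"), each = 4)
Sample_name <- rep(c("sample1", "sample2", "sample3", "sample4"), 3)
log2FC <- c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)

# Cross-tabulate: rows are genes, columns are samples, cells hold log2FC.
tab <- xtabs(log2FC ~ geneSymbol + Sample_name, data = df)

# Convert the two-way table to a data frame, keeping its wide layout
# (plain as.data.frame() would turn the table back into long format).
wide <- as.data.frame.matrix(tab)

print(wide)
```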
Answer 3
Score: 1
Using acast from reshape2:
library(reshape2)
acast(df, geneSymbol ~ Sample_name, value.var = 'log2FC')
      sample1 sample2 sample3 sample4
gene1    1.50   -1.00    0.50    0.20
gene2   -0.30   -0.70   -0.12    0.33
gene3    0.64   -0.17    2.30   -1.70
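acast returns a matrix with genes as rownames and samples as colnames. If the real data contains the same gene/sample pair more than once, acast needs an aggregation function; without one it falls back to counting rows. A hedged sketch of both cases, where df_dup and its extra row are made up purely for illustration and mean is just one possible choice of aggregate:

```r
library(reshape2)

# Rebuild the example data from the question.
geneSymbol  <- rep(c("gene1", "gene2", "gene3"), each = 4)
Sample_name <- rep(c("sample1", "sample2", "sample3", "sample4"), 3)
log2FC <- c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)

# Unique gene/sample pairs: acast gives a matrix; wrap it if a data frame is wanted.
mat  <- acast(df, geneSymbol ~ Sample_name, value.var = "log2FC")
wide <- as.data.frame(mat)

# Hypothetical case: gene1/sample1 measured twice (1.5 and 2.5).
df_dup <- rbind(df, data.frame(geneSymbol = "gene1",
                               Sample_name = "sample1",
                               log2FC = 2.5))

# Average the repeated measurements instead of counting them.
mat_mean <- acast(df_dup, geneSymbol ~ Sample_name,
                  value.var = "log2FC", fun.aggregate = mean)

print(wide)
print(mat_mean)
```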
Answer 4
Score: 0
Here is the data.table equivalent using dcast:
library(data.table)
setDT(df)
dcast(df, geneSymbol ~ Sample_name, value.var = "log2FC")
   geneSymbol sample1 sample2 sample3 sample4
1:      gene1    1.50   -1.00    0.50    0.20
2:      gene2   -0.30   -0.70   -0.12    0.33
3:      gene3    0.64   -0.17    2.30   -1.70
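One thing to be aware of: setDT(df) converts df to a data.table in place, so the original data frame object changes class. If that is not wanted, as.data.table makes a copy instead. The sketch below shows that variant and also moves geneSymbol into the rownames, since data.table itself does not use them; dt, wide_dt and wide are illustrative names.

```r
library(data.table)

# Rebuild the example data from the question.
geneSymbol  <- rep(c("gene1", "gene2", "gene3"), each = 4)
Sample_name <- rep(c("sample1", "sample2", "sample3", "sample4"), 3)
log2FC <- c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)

# Work on a copy so that df itself stays a plain data frame.
dt <- as.data.table(df)

# Long -> wide: one row per gene, one column per sample.
wide_dt <- dcast(dt, geneSymbol ~ Sample_name, value.var = "log2FC")

# Convert back to a data frame and use geneSymbol as rownames.
wide <- as.data.frame(wide_dt)
rownames(wide) <- wide$geneSymbol
wide$geneSymbol <- NULL

print(wide)
```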