How to reshape a dataframe with duplicated rows into rownames and colnames

Question
I have been struggling with reshaping the following dataframe:

geneSymbol <- c(rep("gene1",4),rep("gene2",4),rep("gene3",4))
Sample_name <- rep(c("sample1","sample2","sample3","sample4"),3)
log2FC <- c(1.5,-1.0,0.5,0.2,-0.3,-0.7,-0.12,0.33,0.64,-0.17,2.3,-1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)
> df
   geneSymbol Sample_name log2FC
1       gene1     sample1   1.50
2       gene1     sample2  -1.00
3       gene1     sample3   0.50
4       gene1     sample4   0.20
5       gene2     sample1  -0.30
6       gene2     sample2  -0.70
7       gene2     sample3  -0.12
8       gene2     sample4   0.33
9       gene3     sample1   0.64
10      gene3     sample2  -0.17
11      gene3     sample3   2.30
12      gene3     sample4  -1.70

where the values in the 'geneSymbol' and 'Sample_name' columns repeat across rows. I have been trying to reshape this data frame into one that has 'geneSymbol' as its row names and 'Sample_name' as its column names, which should look as follows:

      sample1  sample2  sample3  sample4
gene1    1.50    -1.00     0.50     0.20
gene2   -0.30    -0.70    -0.12     0.33
gene3    0.64    -0.17     2.30    -1.70

I created this table manually, but I don't know which function to use to build it from df in code, and my real data frame has hundreds of rows. I would really appreciate any help with this.

Best wishes,
TJ

Answer 1

Score: 2

Using tidyr:

tidyr::pivot_wider(df, values_from = 'log2FC', names_from = 'Sample_name')

geneSymbol sample1 sample2 sample3 sample4
gene1         1.5    -1       0.5     0.2 
gene2        -0.3    -0.7    -0.12    0.33
gene3         0.64   -0.17    2.3    -1.7 
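Note that pivot_wider() (like the data.table dcast() answer) keeps geneSymbol as an ordinary column rather than turning it into row names, which the question asked for. A minimal base-R sketch of that last step follows; col_to_rownames() is a hypothetical helper name, not a tidyr function, and the hand-built `wide` data frame only stands in for the pivot_wider() result so the sketch runs without tidyr installed. If tibble is available, tibble::column_to_rownames() does the same job.

```r
# Hypothetical helper: move an identifier column into the row names.
col_to_rownames <- function(d, col) {
  d <- as.data.frame(d)      # drop tibble/data.table classes, which discourage row names
  rownames(d) <- d[[col]]    # errors if the identifier values are not unique
  d[[col]] <- NULL           # remove the now-redundant column
  d
}

# Stand-in for tidyr::pivot_wider(df, values_from = 'log2FC', names_from = 'Sample_name')
wide <- data.frame(
  geneSymbol = c("gene1", "gene2", "gene3"),
  sample1 = c(1.50, -0.30, 0.64),
  sample2 = c(-1.00, -0.70, -0.17),
  sample3 = c(0.50, -0.12, 2.30),
  sample4 = c(0.20, 0.33, -1.70)
)

res <- col_to_rownames(wide, "geneSymbol")
res
```

The explicit as.data.frame() call is the design choice that matters here: tibbles do not support row names well, so converting first avoids a warning and gives a plain data frame.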

Answer 2

Score: 1

xtabs(log2FC ~ geneSymbol + Sample_name, df)

          Sample_name
geneSymbol sample1 sample2 sample3 sample4
     gene1    1.50   -1.00    0.50    0.20
     gene2   -0.30   -0.70   -0.12    0.33
     gene3    0.64   -0.17    2.30   -1.70
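xtabs() returns a contingency table (class "xtabs"/"table"), not a data frame, and calling plain as.data.frame() on it converts it back to long format. A minimal sketch of keeping the wide layout with as.data.frame.matrix(), plus a caveat worth checking on real data: xtabs() sums the values in each cell, so a repeated gene/sample pair is silently added up and a missing pair appears as 0 rather than NA. The name `gap` is only illustrative.

```r
geneSymbol <- c(rep("gene1",4),rep("gene2",4),rep("gene3",4))
Sample_name <- rep(c("sample1","sample2","sample3","sample4"),3)
log2FC <- c(1.5,-1.0,0.5,0.2,-0.3,-0.7,-0.12,0.33,0.64,-0.17,2.3,-1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)

tab <- xtabs(log2FC ~ geneSymbol + Sample_name, df)

# The matrix method keeps genes as row names and samples as columns
wide <- as.data.frame.matrix(tab)

# Caveat: drop the gene1/sample1 row and xtabs() fills that cell with 0, not NA
gap <- as.data.frame.matrix(xtabs(log2FC ~ geneSymbol + Sample_name, df[-1, ]))
gap["gene1", "sample1"]   # 0 rather than NA
```

Because a log2 fold change of 0 is a meaningful value, the zero-filling can hide missing measurements; the tapply() or dcast() routes leave such cells as NA instead.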

Answer 3

Score: 1

Using acast

library(reshape2)
acast(df, geneSymbol ~ Sample_name, value.var = 'log2FC')
      sample1 sample2 sample3 sample4
gene1    1.50   -1.00    0.50    0.20
gene2   -0.30   -0.70   -0.12    0.33
gene3    0.64   -0.17    2.30   -1.70

Answer 4

Score: 0

Here is the data.table equivalent using dcast:

library(data.table)

setDT(df)
dcast(df, geneSymbol ~ Sample_name, value.var = "log2FC")
   geneSymbol sample1 sample2 sample3 sample4
1:      gene1    1.50   -1.00    0.50    0.20
2:      gene2   -0.30   -0.70   -0.12    0.33
3:      gene3    0.64   -0.17    2.30   -1.70
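For comparison, a dependency-free base-R sketch (not one of the posted answers) using tapply(). It first checks that each gene/sample pair occurs only once, since a one-value-per-cell wide table depends on that; with hundreds of rows, a duplicated pair is easy to miss. Using identity() as the summary function is an assumption that fits only when that check passes.

```r
geneSymbol <- c(rep("gene1",4),rep("gene2",4),rep("gene3",4))
Sample_name <- rep(c("sample1","sample2","sample3","sample4"),3)
log2FC <- c(1.5,-1.0,0.5,0.2,-0.3,-0.7,-0.12,0.33,0.64,-0.17,2.3,-1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)

# Each gene/sample pair must occur once for a one-value-per-cell table
stopifnot(anyDuplicated(df[c("geneSymbol", "Sample_name")]) == 0)

# tapply() groups log2FC by gene (rows) and sample (columns);
# identity() returns the single value in each cell, and absent pairs become NA
mat <- tapply(df$log2FC, list(df$geneSymbol, df$Sample_name), identity)

# Convert the named matrix to a data frame; the row names are kept
wide <- as.data.frame(mat)
wide
```

Rows and columns come out in sorted factor-level order, which matches the expected table here; for identifiers such as "gene10" the alphabetical sort may differ from the original row order.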

huangapple
  • Published: 2023-02-09 01:03:52
  • Source: https://go.coder-hub.com/75389234.html