How to reshape a dataframe with duplicated rows into rownames and colnames

Question

I have been struggling with reshaping the following dataframe:

    geneSymbol <- c(rep("gene1", 4), rep("gene2", 4), rep("gene3", 4))
    Sample_name <- rep(c("sample1", "sample2", "sample3", "sample4"), 3)
    log2FC <- c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
    df <- data.frame(geneSymbol, Sample_name, log2FC)

    > df
       geneSymbol Sample_name log2FC
    1       gene1     sample1   1.50
    2       gene1     sample2  -1.00
    3       gene1     sample3   0.50
    4       gene1     sample4   0.20
    5       gene2     sample1  -0.30
    6       gene2     sample2  -0.70
    7       gene2     sample3  -0.12
    8       gene2     sample4   0.33
    9       gene3     sample1   0.64
    10      gene3     sample2  -0.17
    11      gene3     sample3   2.30
    12      gene3     sample4  -1.70

Here the values in the 'geneSymbol' and 'Sample_name' columns repeat across rows: each gene appears once for every sample. I have been trying to reshape this into a data frame that has the 'geneSymbol' values as its row names and the 'Sample_name' values as its column names, which should look as follows:

          sample1 sample2 sample3 sample4
    gene1    1.50   -1.00    0.50    0.20
    gene2   -0.30   -0.70   -0.12    0.33
    gene3    0.64   -0.17    2.30   -1.70

I created this table by hand, but I have no idea which function to use to build it from df in code, and my real data frame has hundreds of rows. I would really appreciate it if anyone could help me out with this.

Best wishes,
TJ

Answer 1

Score: 2

Using tidyr:

    tidyr::pivot_wider(df, values_from = 'log2FC', names_from = 'Sample_name')

      geneSymbol sample1 sample2 sample3 sample4
      gene1         1.5    -1       0.5     0.2
      gene2        -0.3    -0.7    -0.12    0.33
      gene3         0.64   -0.17    2.3    -1.7
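
Note that pivot_wider returns a tibble, which keeps geneSymbol as an ordinary column rather than as row names. If row names are really needed, as the question asks, one possible follow-up is sketched below. The names wide and wide_df are only illustrative, and the example data is rebuilt so that the snippet runs on its own.

```r
library(tidyr)

# Example data from the question
geneSymbol <- c(rep("gene1", 4), rep("gene2", 4), rep("gene3", 4))
Sample_name <- rep(c("sample1", "sample2", "sample3", "sample4"), 3)
log2FC <- c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)

# Long to wide: one row per gene, one column per sample
wide <- pivot_wider(df, names_from = "Sample_name", values_from = "log2FC")

# Tibbles do not carry row names, so convert to a plain data frame first,
# then move the geneSymbol column into the row names (illustrative names)
wide_df <- as.data.frame(wide)
rownames(wide_df) <- wide_df$geneSymbol
wide_df$geneSymbol <- NULL
```

After this, a single value can be looked up by name, for example wide_df["gene3", "sample3"].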

Answer 2

Score: 1

    xtabs(log2FC ~ geneSymbol + Sample_name, df)

              Sample_name
    geneSymbol sample1 sample2 sample3 sample4
         gene1    1.50   -1.00    0.50    0.20
         gene2   -0.30   -0.70   -0.12    0.33
         gene3    0.64   -0.17    2.30   -1.70
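
A caveat worth keeping in mind: xtabs returns a contingency table rather than a data frame, it sums log2FC over any repeated geneSymbol/Sample_name pair, and it fills pairs that never occur with 0. For the data in the question every pair occurs exactly once, so the values come through unchanged. A minimal sketch of turning the result into a data frame with row names, assuming that behaviour is acceptable (tab and wide_df are illustrative names):

```r
# Example data from the question
geneSymbol <- c(rep("gene1", 4), rep("gene2", 4), rep("gene3", 4))
Sample_name <- rep(c("sample1", "sample2", "sample3", "sample4"), 3)
log2FC <- c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)

# Cross-tabulate: sums log2FC within each gene/sample cell
tab <- xtabs(log2FC ~ geneSymbol + Sample_name, df)

# Treat the 2-d table as a matrix and convert it to a data frame;
# genes become the row names, samples the column names
wide_df <- as.data.frame.matrix(tab)
```

Calling plain as.data.frame(tab) instead would give the long format back, which is why the matrix method is used here.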

Answer 3

Score: 1

Using acast from reshape2:

    library(reshape2)
    acast(df, geneSymbol ~ Sample_name, value.var = 'log2FC')

          sample1 sample2 sample3 sample4
    gene1    1.50   -1.00    0.50    0.20
    gene2   -0.30   -0.70   -0.12    0.33
    gene3    0.64   -0.17    2.30   -1.70

Answer 4

Score: 0

Here is the data.table equivalent using dcast:

    library(data.table)
    setDT(df)
    dcast(df, geneSymbol ~ Sample_name, value.var = "log2FC")

       geneSymbol sample1 sample2 sample3 sample4
    1:      gene1    1.50   -1.00    0.50    0.20
    2:      gene2   -0.30   -0.70   -0.12    0.33
    3:      gene3    0.64   -0.17    2.30   -1.70
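
Two things to be aware of with this approach: setDT(df) converts df to a data.table by reference, so the original object is changed in place, and the result still holds geneSymbol as a column. The sketch below is one way around both, assuming a copy is acceptable. It uses as.data.table() instead of setDT(), then the rownames argument of data.table's as.matrix method. The names dt, wide and mat are illustrative.

```r
library(data.table)

# Example data from the question
geneSymbol <- c(rep("gene1", 4), rep("gene2", 4), rep("gene3", 4))
Sample_name <- rep(c("sample1", "sample2", "sample3", "sample4"), 3)
log2FC <- c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)

# as.data.table() makes a copy, so df itself stays a plain data.frame
dt <- as.data.table(df)

# Long to wide: one row per gene, one column per sample
wide <- dcast(dt, geneSymbol ~ Sample_name, value.var = "log2FC")

# Numeric matrix with the gene symbols as row names
mat <- as.matrix(wide, rownames = "geneSymbol")
```

A matrix is often the more convenient shape for log2 fold-change data anyway, since heatmap and clustering functions typically expect one.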

huangapple
  • Posted on February 9, 2023, 01:03:52
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75389234.html