February 9, 2023, 01:03:52

How to reshape a dataframe with duplicated rows into rownames and colnames

Question

I have been struggling with reshaping the following dataframe:

geneSymbol <- c(rep("gene1",4),rep("gene2",4),rep("gene3",4))
Sample_name <- rep(c("sample1","sample2","sample3","sample4"),3)
log2FC <- c(1.5,-1.0,0.5,0.2,-0.3,-0.7,-0.12,0.33,0.64,-0.17,2.3,-1.7)
df <- data.frame(geneSymbol, Sample_name, log2FC)
> df
   geneSymbol Sample_name log2FC
1       gene1     sample1   1.50
2       gene1     sample2  -1.00
3       gene1     sample3   0.50
4       gene1     sample4   0.20
5       gene2     sample1  -0.30
6       gene2     sample2  -0.70
7       gene2     sample3  -0.12
8       gene2     sample4   0.33
9       gene3     sample1   0.64
10      gene3     sample2  -0.17
11      gene3     sample3   2.30
12      gene3     sample4  -1.70

where the 'geneSymbol' and 'Sample_name' columns each contain repeated values. I have been trying to reshape this dataframe into one that has the 'geneSymbol' values as its rownames and the 'Sample_name' values as its colnames, which should look as follows:

      sample1  sample2  sample3  sample4
gene1    1.50    -1.00     0.50     0.20
gene2   -0.30    -0.70    -0.12     0.33
gene3    0.64    -0.17     2.30    -1.70

I created this table manually, but I do not know which function to use to build this dataframe or table from df in code, as my real dataframe has hundreds of rows. I would really appreciate it if anyone could help me with this.

Best wishes,
TJ

Answer 1

Score: 2

Using tidyr:

  tidyr::pivot_wider(df, values_from = 'log2FC', names_from = 'Sample_name')
  geneSymbol sample1 sample2 sample3 sample4
  gene1         1.5    -1       0.5     0.2 
  gene2        -0.3    -0.7    -0.12    0.33
  gene3         0.64   -0.17    2.3    -1.7
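
Note that pivot_wider returns a tibble in which geneSymbol is an ordinary column, not the row names the question asks for. One possible sketch for moving it into the row names, assuming the tibble package is installed alongside tidyr (the object name `wide` is only illustrative):

```r
library(tidyr)
library(tibble)

# Rebuild the example data from the question
df <- data.frame(
  geneSymbol  = rep(c("gene1", "gene2", "gene3"), each = 4),
  Sample_name = rep(c("sample1", "sample2", "sample3", "sample4"), times = 3),
  log2FC      = c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
)

# Spread the samples into columns (one row per gene)
wide <- pivot_wider(df, names_from = "Sample_name", values_from = "log2FC")

# Move geneSymbol out of the columns and into the row names;
# the result is a plain data.frame, since tibbles do not keep row names
wide <- column_to_rownames(wide, var = "geneSymbol")
```

After this, `wide["gene2", "sample3"]` can be used to look up a single value by gene and sample.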

Answer 2

Score: 1

xtabs(log2FC ~ geneSymbol + Sample_name, df)
          Sample_name
geneSymbol sample1 sample2 sample3 sample4
     gene1    1.50   -1.00    0.50    0.20
     gene2   -0.30   -0.70   -0.12    0.33
     gene3    0.64   -0.17    2.30   -1.70
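
Two caveats may be worth keeping in mind here: xtabs returns a contingency table rather than a data frame, and it sums the values of any rows that share the same gene/sample pair (missing pairs come out as 0, not NA). If a data frame with row names is wanted, a minimal base-R sketch (the names `tab` and `wide` are only illustrative) could be:

```r
# Rebuild the example data from the question
df <- data.frame(
  geneSymbol  = rep(c("gene1", "gene2", "gene3"), each = 4),
  Sample_name = rep(c("sample1", "sample2", "sample3", "sample4"), times = 3),
  log2FC      = c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
)

# Cross-tabulate: each cell holds the sum of log2FC for that gene/sample pair
tab <- xtabs(log2FC ~ geneSymbol + Sample_name, data = df)

# Convert the 2-D table into a data.frame, keeping genes as row names
# and samples as column names
wide <- as.data.frame.matrix(tab)
```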

Answer 3

Score: 1

Using acast

library(reshape2)
acast(df, geneSymbol ~ Sample_name, value.var = 'log2FC')
      sample1 sample2 sample3 sample4
gene1    1.50   -1.00    0.50    0.20
gene2   -0.30   -0.70   -0.12    0.33
gene3    0.64   -0.17    2.30   -1.70
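
acast returns a matrix, and it works as shown because each gene/sample pair occurs only once in df. If a real dataset had repeated measurements for the same pair, acast would need an aggregation function (without one it falls back to counting rows). A hedged sketch, using a made-up data frame `df_dup` and averaging as one plausible choice:

```r
library(reshape2)

# Hypothetical data: gene1/sample1 was measured twice
df_dup <- data.frame(
  geneSymbol  = c("gene1", "gene1", "gene1", "gene2", "gene2"),
  Sample_name = c("sample1", "sample1", "sample2", "sample1", "sample2"),
  log2FC      = c(1.0, 2.0, 0.5, -0.3, 0.7)
)

# Average repeated measurements of the same gene/sample pair
mat <- acast(df_dup, geneSymbol ~ Sample_name,
             value.var = "log2FC", fun.aggregate = mean)

# Optional: turn the matrix into a data.frame (row and column names are kept)
wide <- as.data.frame(mat)
```

Whether the mean, the median, or something else is appropriate depends on what the repeated rows represent.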

Answer 4

Score: 0

Here is the data.table equivalent, using dcast:

library(data.table)
setDT(df)
dcast(df, geneSymbol ~ Sample_name, value.var = "log2FC")

   geneSymbol sample1 sample2 sample3 sample4
1:      gene1    1.50   -1.00    0.50    0.20
2:      gene2   -0.30   -0.70   -0.12    0.33
3:      gene3    0.64   -0.17    2.30   -1.70
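
Note that setDT converts df to a data.table by reference, so the original object itself is changed, and the result still carries geneSymbol as a column. A possible variant that leaves df untouched and ends with genes as row names (the names `dt`, `wide` and `wide_df` are only illustrative):

```r
library(data.table)

# Rebuild the example data from the question
df <- data.frame(
  geneSymbol  = rep(c("gene1", "gene2", "gene3"), each = 4),
  Sample_name = rep(c("sample1", "sample2", "sample3", "sample4"), times = 3),
  log2FC      = c(1.5, -1.0, 0.5, 0.2, -0.3, -0.7, -0.12, 0.33, 0.64, -0.17, 2.3, -1.7)
)

# as.data.table() makes a copy, so df stays an ordinary data.frame
dt <- as.data.table(df)

# Long to wide: one row per gene, one column per sample
wide <- dcast(dt, geneSymbol ~ Sample_name, value.var = "log2FC")

# data.tables do not support row names, so convert back to a data.frame first
wide_df <- as.data.frame(wide)
rownames(wide_df) <- wide_df$geneSymbol
wide_df$geneSymbol <- NULL
```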
