Adding a new column to a data frame for 3 condition cases
Question
I have a dataframe like this:
geneID baseMean log2FoldChange lfcSE stat pvalue padj
ENSG00000000003.14 2700.791337 -0.345466785 0.202389477 -1.706940451 0.087833121 0.001
ENSG00000000419.12 1571.143316 -0.348258736 0.150807514 -2.309293001 0.020927328 0.120478416
ENSG00000000457.13 526.2282051 -0.051250213 0.180482116 -0.283962835 0.776438862 0.003
ENSG00000000460.16 1108.138705 -0.078538637 0.167859597 -0.467882913 0.639868323 0.827329552
ENSG00000001036.13 2662.132047 0.121419414 0.175209898 0.692994033 0.488313296 0.728842774
ENSG00000001084.10 1325.447272 0.89 0.154875429 -0.423289781 0.672083849 0.0004
ENSG00000001167.14 1829.828657 -0.221749678 0.153100403 -1.448393819 0.147506943 0.386446872
ENSG00000001460.17 641.7582879 -0.252419377 0.183602552 -1.374814095 0.169189087 0.417816879
I want to add a column named threshold such that:
df$log2FoldChange > 0 & df$padj < 0.05 is labeled up
df$log2FoldChange < 0 & df$padj < 0.05 is labeled down
anything else is labeled NS
So for the above table, output should look like this:
geneID baseMean log2FoldChange lfcSE stat pvalue padj threshold
ENSG00000000003.14 2700.791337 -0.345466785 0.202389477 -1.706940451 0.087833121 0.001 down
ENSG00000000419.12 1571.143316 -0.348258736 0.150807514 -2.309293001 0.020927328 0.120478416 NS
ENSG00000000457.13 526.2282051 -0.051250213 0.180482116 -0.283962835 0.776438862 0.003 down
ENSG00000000460.16 1108.138705 -0.078538637 0.167859597 -0.467882913 0.639868323 0.827329552 NS
ENSG00000001036.13 2662.132047 0.121419414 0.175209898 0.692994033 0.488313296 0.728842774 NS
ENSG00000001084.10 1325.447272 0.89 0.154875429 -0.423289781 0.672083849 0.0004 up
ENSG00000001167.14 1829.828657 -0.221749678 0.153100403 -1.448393819 0.147506943 0.386446872 NS
ENSG00000001460.17 641.7582879 -0.252419377 0.183602552 -1.374814095 0.169189087 0.417816879 NS
I tried this, but it does not do what I want:
dat <- mutate(dat, threshold = if_else(dat$padj <= 0.05 & dat$log2FoldChange > 0, "up", "NS"))
dat <- mutate(dat, threshold = if_else(dat$padj <= 0.05 & dat$log2FoldChange < 0, "down", "NS"))
Answer 1
Score: 2
One option is to use case_when() from the dplyr package to assign "up", "down", or otherwise "NS" in a single step, e.g.
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- read.table(text = "geneID baseMean log2FoldChange lfcSE stat pvalue padj
ENSG00000000003.14 2700.791337 -0.345466785 0.202389477 -1.706940451 0.087833121 0.001
ENSG00000000419.12 1571.143316 -0.348258736 0.150807514 -2.309293001 0.020927328 0.120478416
ENSG00000000457.13 526.2282051 -0.051250213 0.180482116 -0.283962835 0.776438862 0.003
ENSG00000000460.16 1108.138705 -0.078538637 0.167859597 -0.467882913 0.639868323 0.827329552
ENSG00000001036.13 2662.132047 0.121419414 0.175209898 0.692994033 0.488313296 0.728842774
ENSG00000001084.10 1325.447272 0.89 0.154875429 -0.423289781 0.672083849 0.0004
ENSG00000001167.14 1829.828657 -0.221749678 0.153100403 -1.448393819 0.147506943 0.386446872
ENSG00000001460.17 641.7582879 -0.252419377 0.183602552 -1.374814095 0.169189087 0.417816879",
header = TRUE)
dat <- mutate(df, threshold = case_when(padj <= 0.05 & log2FoldChange > 0 ~ "up",
                                        padj <= 0.05 & log2FoldChange < 0 ~ "down",
                                        TRUE ~ "NS"))
dat
#> geneID baseMean log2FoldChange lfcSE stat pvalue
#> 1 ENSG00000000003.14 2700.7913 -0.34546678 0.2023895 -1.7069405 0.08783312
#> 2 ENSG00000000419.12 1571.1433 -0.34825874 0.1508075 -2.3092930 0.02092733
#> 3 ENSG00000000457.13 526.2282 -0.05125021 0.1804821 -0.2839628 0.77643886
#> 4 ENSG00000000460.16 1108.1387 -0.07853864 0.1678596 -0.4678829 0.63986832
#> 5 ENSG00000001036.13 2662.1320 0.12141941 0.1752099 0.6929940 0.48831330
#> 6 ENSG00000001084.10 1325.4473 0.89000000 0.1548754 -0.4232898 0.67208385
#> 7 ENSG00000001167.14 1829.8287 -0.22174968 0.1531004 -1.4483938 0.14750694
#> 8 ENSG00000001460.17 641.7583 -0.25241938 0.1836026 -1.3748141 0.16918909
#> padj threshold
#> 1 0.0010000 down
#> 2 0.1204784 NS
#> 3 0.0030000 down
#> 4 0.8273296 NS
#> 5 0.7288428 NS
#> 6 0.0004000 up
#> 7 0.3864469 NS
#> 8 0.4178169 NS
<sup>Created on 2023-03-07 with reprex v2.0.2</sup>
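For comparison, here is a minimal sketch of the same labelling in base R, with no dplyr dependency. The two-step attempt in the question fails because the second mutate() call overwrites the threshold column written by the first, so every "up" row ends up as "NS"; nesting the conditions avoids that. The three-row data frame below is only an illustrative subset of the question's table, the helper vector `sig` is a name introduced here, and the cutoff uses `<= 0.05` as in the question's code (the prose in the question says `< 0.05`, so adjust as needed).

```r
# Illustrative subset of the question's table (three rows, three columns only).
df <- data.frame(
  geneID = c("ENSG00000000003.14", "ENSG00000000419.12", "ENSG00000001084.10"),
  log2FoldChange = c(-0.345466785, -0.348258736, 0.89),
  padj = c(0.001, 0.120478416, 0.0004)
)

# `sig` is a hypothetical helper: TRUE where padj is present and at most 0.05.
# Rows with a missing padj are treated as not significant.
sig <- !is.na(df$padj) & df$padj <= 0.05

# Nested ifelse(): significant rows are split by the sign of the fold change,
# everything else is labelled "NS". Both labels are assigned in one pass,
# so neither overwrites the other.
df$threshold <- ifelse(sig & df$log2FoldChange > 0, "up",
                ifelse(sig & df$log2FoldChange < 0, "down", "NS"))

df
```

With more than three or four categories the nesting gets hard to read, which is where case_when() tends to be the clearer choice.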