Adding a new column to a dataframe for 3 condition cases

Question

I have a dataframe like this:

geneID	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj
ENSG00000000003.14	2700.791337	-0.345466785	0.202389477	-1.706940451	0.087833121	0.001
ENSG00000000419.12	1571.143316	-0.348258736	0.150807514	-2.309293001	0.020927328	0.120478416
ENSG00000000457.13	526.2282051	-0.051250213	0.180482116	-0.283962835	0.776438862	0.003
ENSG00000000460.16	1108.138705	-0.078538637	0.167859597	-0.467882913	0.639868323	0.827329552
ENSG00000001036.13	2662.132047	0.121419414	0.175209898	0.692994033	0.488313296	0.728842774
ENSG00000001084.10	1325.447272	0.89	0.154875429	-0.423289781	0.672083849	0.0004
ENSG00000001167.14	1829.828657	-0.221749678	0.153100403	-1.448393819	0.147506943	0.386446872
ENSG00000001460.17	641.7582879	-0.252419377	0.183602552	-1.374814095	0.169189087	0.417816879

I want to add a column named threshold such that if

df$log2FoldChange > 0 & df$padj < 0.05 this should be labeled up
df$log2FoldChange < 0 & df$padj < 0.05 this should be labeled down
and anything else as NS

So for the above table, output should look like this:

geneID	baseMean	log2FoldChange	lfcSE	stat	pvalue	padj	threshold
ENSG00000000003.14	2700.791337	-0.345466785	0.202389477	-1.706940451	0.087833121	0.001	down
ENSG00000000419.12	1571.143316	-0.348258736	0.150807514	-2.309293001	0.020927328	0.120478416	NS
ENSG00000000457.13	526.2282051	-0.051250213	0.180482116	-0.283962835	0.776438862	0.003	down
ENSG00000000460.16	1108.138705	-0.078538637	0.167859597	-0.467882913	0.639868323	0.827329552	NS
ENSG00000001036.13	2662.132047	0.121419414	0.175209898	0.692994033	0.488313296	0.728842774	NS
ENSG00000001084.10	1325.447272	0.89	0.154875429	-0.423289781	0.672083849	0.0004	up
ENSG00000001167.14	1829.828657	-0.221749678	0.153100403	-1.448393819	0.147506943	0.386446872	NS
ENSG00000001460.17	641.7582879	-0.252419377	0.183602552	-1.374814095	0.169189087	0.417816879	NS

I tried this but of course it is not doing what I want:

dat <- mutate(dat, threshold = if_else(dat$padj <= 0.05 & dat$log2FoldChange > 0, "up", "NS"))
dat <- mutate(dat, threshold = if_else(dat$padj <= 0.05 & dat$log2FoldChange < 0, "down", "NS"))

Answer 1

Score: 2

One option is to use case_when() from the dplyr package to do both "up" and "down" (or else "NS") in one step, e.g.

library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union

df <- read.table(text = "geneID  baseMean    log2FoldChange  lfcSE   stat    pvalue  padj
ENSG00000000003.14  2700.791337 -0.345466785    0.202389477 -1.706940451    0.087833121 0.001
ENSG00000000419.12  1571.143316 -0.348258736    0.150807514 -2.309293001    0.020927328 0.120478416
ENSG00000000457.13  526.2282051 -0.051250213    0.180482116 -0.283962835    0.776438862 0.003
ENSG00000000460.16  1108.138705 -0.078538637    0.167859597 -0.467882913    0.639868323 0.827329552
ENSG00000001036.13  2662.132047 0.121419414 0.175209898 0.692994033 0.488313296 0.728842774
ENSG00000001084.10  1325.447272 0.89    0.154875429 -0.423289781    0.672083849 0.0004
ENSG00000001167.14  1829.828657 -0.221749678    0.153100403 -1.448393819    0.147506943 0.386446872
ENSG00000001460.17  641.7582879 -0.252419377    0.183602552 -1.374814095    0.169189087 0.417816879",
header = TRUE)

dat <- mutate(df, threshold = case_when(padj <= 0.05 & log2FoldChange > 0 ~ "up",
                                       padj <= 0.05 & log2FoldChange < 0 ~ "down",
                                       TRUE ~ "NS"))
dat
#>               geneID  baseMean log2FoldChange     lfcSE       stat     pvalue
#> 1 ENSG00000000003.14 2700.7913    -0.34546678 0.2023895 -1.7069405 0.08783312
#> 2 ENSG00000000419.12 1571.1433    -0.34825874 0.1508075 -2.3092930 0.02092733
#> 3 ENSG00000000457.13  526.2282    -0.05125021 0.1804821 -0.2839628 0.77643886
#> 4 ENSG00000000460.16 1108.1387    -0.07853864 0.1678596 -0.4678829 0.63986832
#> 5 ENSG00000001036.13 2662.1320     0.12141941 0.1752099  0.6929940 0.48831330
#> 6 ENSG00000001084.10 1325.4473     0.89000000 0.1548754 -0.4232898 0.67208385
#> 7 ENSG00000001167.14 1829.8287    -0.22174968 0.1531004 -1.4483938 0.14750694
#> 8 ENSG00000001460.17  641.7583    -0.25241938 0.1836026 -1.3748141 0.16918909
#>        padj threshold
#> 1 0.0010000      down
#> 2 0.1204784        NS
#> 3 0.0030000      down
#> 4 0.8273296        NS
#> 5 0.7288428        NS
#> 6 0.0004000        up
#> 7 0.3864469        NS
#> 8 0.4178169        NS

<sup>Created on 2023-03-07 with reprex v2.0.2</sup>
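
For context on why the attempt in the question falls short: the second mutate() call rebuilds the threshold column from scratch, so every row labeled "up" by the first call is reset to "NS". The sketch below is only an illustration under stated assumptions: it uses the same column names (padj, log2FoldChange) on a small made-up data frame called toy (hypothetical, not the asker's data), reproduces the overwrite, and then shows one possible single-step alternative using nested if_else() with its missing argument.

```r
library(dplyr)

# Hypothetical toy data with the same column names as in the question;
# the last row has a missing padj to show how NA values behave
toy <- data.frame(
  geneID = c("g1", "g2", "g3", "g4"),
  log2FoldChange = c(0.89, -0.35, 0.12, -0.25),
  padj = c(0.0004, 0.001, 0.73, NA)
)

# The question's approach: two separate mutate() calls on the same column
attempt <- mutate(toy, threshold = if_else(padj <= 0.05 & log2FoldChange > 0, "up", "NS"))
attempt <- mutate(attempt, threshold = if_else(padj <= 0.05 & log2FoldChange < 0, "down", "NS"))
# The second call replaces the first result, so g1 is no longer "up"

# Single step with nested if_else(); missing = "NS" labels rows whose
# condition evaluates to NA (for example, when padj is NA)
fixed <- mutate(toy, threshold = if_else(
  padj <= 0.05 & log2FoldChange > 0, "up",
  if_else(padj <= 0.05 & log2FoldChange < 0, "down", "NS", missing = "NS"),
  missing = "NS"
))
```

Here attempt labels g1 as "NS" even though it meets the "up" condition, while fixed labels the four rows "up", "down", "NS", "NS". In the case_when() version above, rows where padj is NA also fall through to TRUE ~ "NS", so no extra handling is needed there.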

Posted on 2023-03-07 09:03:33