Multi-group differential gene expression for time-series treatment data

Question

This is an example dataset:

df = data.frame(genes = c("A", "B", "C", "D", "E"),
                KO_0min_Rep1 = c(0, 1, 2, 6, 6),
                KO_0min_Rep2 = c(0, 3, 2, 3, 6),
                KO_60min_Rep1 = c(0, 0.3, 2, 9.1, 6),
                KO_60min_Rep2 = c(0, 1.3, 2, 6.4, 6),
                KO_120min_Rep1 = c(0, 1, 1, 6, 5),
                KO_120min_Rep2 = c(0, 1, 2.1, 6.8, 5.2),
                WT_0min_Rep1 = c(0, 1, 2, 6, 6),
                WT_0min_Rep2 = c(0, 1, 1.6, 3, 6),
                WT_60min_Rep1 = c(0, 1, 2, 9, 6),
                WT_60min_Rep2 = c(0, 0.3, 2, 6, 2),
                WT_120min_Rep1 = c(0, 1.9, 2, 2, 6),
                WT_120min_Rep2 = c(0, 1.2, 2, 6, 2))

The data frame has several columns: the "genes" column holds more than 9000 genes in the real data, and every other column is one combination of condition, treatment time, and replicate. The experimental design is as follows. I have two cell types, wild type (WT) and knockout (KO). I treated both cell types with a DNA-damaging agent for 0 minutes, 60 minutes, and 120 minutes, and I have two replicates for each combination of cell type and treatment time.
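The design described above can be read straight off the column names. The following is a minimal sketch in base R, assuming every column follows the `<group>_<time>min_<replicate>` pattern used in the example; the names `samples`, `design`, and `counts` are illustrative only.

```r
# Rebuild the 12 column names of the example (the "genes" column is left out)
samples <- paste0(
  rep(c("KO", "WT"), each = 6), "_",
  rep(rep(c(0, 60, 120), each = 2), times = 2), "min_Rep",
  rep(1:2, times = 6)
)

# Split each name on "_" into its three parts: group, time tag, replicate
parts <- do.call(rbind, strsplit(samples, "_", fixed = TRUE))

# One row per sample, with WT as the reference level of the group factor
design <- data.frame(
  sample    = samples,
  group     = factor(parts[, 1], levels = c("WT", "KO")),
  time      = as.numeric(sub("min", "", parts[, 2], fixed = TRUE)),
  replicate = parts[, 3]
)

# Cross-tabulate group against time to count replicates per combination
counts <- table(design$group, design$time)
print(counts)
```

Each of the six group-by-time cells should contain two samples; a different count would point to a column name that does not follow the assumed pattern.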

I want to know which genes are significantly altered after the treatment and, more importantly, which genes respond differently in WT and KO cells across the treatment time points.

This is what I have tried:

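library(dplyr)  # provides %>% and case_when()

# Reshape to long format, then derive group, time, and replicate from the column names
lme4_dat = df %>%
  tidyr::gather(conditions, value, -genes) %>%
  dplyr::mutate( group = case_when(grepl("KO", conditions) ~ "KO",
                                   grepl("WT", conditions) ~ "WT")) %>%
  # Match the full time tag (e.g. "_0min") so that every sample gets a time value
  dplyr::mutate( time = case_when(grepl("_0min", conditions) ~ "0",
                                  grepl("_60min", conditions) ~ "60",
                                  grepl("_120min", conditions) ~ "120" )) %>%
  dplyr::mutate( replicate = case_when(grepl("_Rep1", conditions) ~ "Rep1",
                                       grepl("_Rep2", conditions) ~ "Rep2"))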

Then I tried to fit a linear mixed-effects model:

lme4_model = lme4::lmer(value ~ conditions * time + (1|genes) + (1|replicate), data = lme4_dat)

It's obviously not working. I am not sure whether I am doing this correctly, or whether there is a better alternative.

Any guidance will be much appreciated. Thank you.

Answer 1

Score: 2

I would suggest repeated measures ANOVA with one within-groups and one between-groups factor.

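library(HH)  # provides interaction2wt()

# Treat any time value left as NA (untreated samples) as time 0
lme4_dat$time[is.na(lme4_dat$time)] = "0"
# Order the levels chronologically so the plots run 0, 60, 120
lme4_dat$time = factor(lme4_dat$time, levels = c("0", "60", "120"))

# time is the within-genes factor, group the between-groups factor
fit <- aov(value ~ time*group + Error(genes/time), lme4_dat)
summary(fit)

# Plot the two-way interaction in two different ways
interaction2wt(value ~ time*group, lme4_dat)
interaction.plot(lme4_dat$time, lme4_dat$group, lme4_dat$value)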

Output:

Error: genes
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4  310.4   77.59               

Error: genes:time
          Df Sum Sq Mean Sq F value Pr(>F)
time       2  2.617   1.309   0.424  0.668
Residuals  8 24.678   3.085               

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
group       1   2.48  2.4807   1.892  0.176
time:group  2   0.21  0.1047   0.080  0.923
Residuals  42  55.07  1.3112   
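
As a sanity check on how to read this table, each F value is the effect's mean square divided by the residual mean square of the same error stratum, and the p-value is the upper tail of the corresponding F distribution. The sketch below redoes that arithmetic in base R from the rounded numbers printed above; the variable names are illustrative only.

```r
# Mean squares copied from the ANOVA table
ms_time        <- 1.309    # "genes:time" stratum
ms_res_gt      <- 3.085    # residual of the "genes:time" stratum (8 df)
ms_group       <- 2.4807   # "Within" stratum
ms_interaction <- 0.1047   # "Within" stratum
ms_res_within  <- 1.3112   # residual of the "Within" stratum (42 df)

# F = effect mean square / residual mean square of the same stratum
f_time        <- ms_time / ms_res_gt
f_group       <- ms_group / ms_res_within
f_interaction <- ms_interaction / ms_res_within

# p-value = upper tail of the F distribution with (effect df, residual df)
p_time        <- pf(f_time, df1 = 2, df2 = 8, lower.tail = FALSE)
p_group       <- pf(f_group, df1 = 1, df2 = 42, lower.tail = FALSE)
p_interaction <- pf(f_interaction, df1 = 2, df2 = 42, lower.tail = FALSE)

print(round(c(time = p_time, group = p_group, interaction = p_interaction), 3))
```

The results should agree with the Pr(>F) column up to rounding of the inputs.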

The group effect (KO vs. WT) has the smallest p-value (0.176), but it is not significant at the usual 0.05 level, and the time effect is weaker still (p = 0.668). The interaction isn't significant either, with a p-value of 0.923, which you can also tell from the roughly parallel lines in the interaction plot. At 60 minutes the response increased before dropping again at 120 minutes.
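
Note that this ANOVA pools all genes into one test, so it does not say which individual genes respond differently in KO and WT cells. One possible way to get per-gene results, offered here only as a rough sketch and not as the method of the answer above, is to fit an ordinary two-way linear model to each gene separately and test its group-by-time interaction. The sketch assumes the two replicates are independent samples and that the values are on a scale where a linear model is reasonable; `interaction_p`, `p_raw`, and `p_adj` are hypothetical names. With only two replicates per combination, such tests have little power, and dedicated packages such as limma are commonly used for this kind of design.

```r
# Example data from the question
df <- data.frame(genes = c("A", "B", "C", "D", "E"),
                 KO_0min_Rep1 = c(0, 1, 2, 6, 6),
                 KO_0min_Rep2 = c(0, 3, 2, 3, 6),
                 KO_60min_Rep1 = c(0, 0.3, 2, 9.1, 6),
                 KO_60min_Rep2 = c(0, 1.3, 2, 6.4, 6),
                 KO_120min_Rep1 = c(0, 1, 1, 6, 5),
                 KO_120min_Rep2 = c(0, 1, 2.1, 6.8, 5.2),
                 WT_0min_Rep1 = c(0, 1, 2, 6, 6),
                 WT_0min_Rep2 = c(0, 1, 1.6, 3, 6),
                 WT_60min_Rep1 = c(0, 1, 2, 9, 6),
                 WT_60min_Rep2 = c(0, 0.3, 2, 6, 2),
                 WT_120min_Rep1 = c(0, 1.9, 2, 2, 6),
                 WT_120min_Rep2 = c(0, 1.2, 2, 6, 2))

# Sample annotations parsed from the column names (assumed pattern: group_timemin_rep)
samples <- setdiff(names(df), "genes")
group <- factor(sub("_.*$", "", samples), levels = c("WT", "KO"))
time  <- factor(sub("^[A-Z]+_([0-9]+)min_.*$", "\\1", samples),
                levels = c("0", "60", "120"))

# Genes in rows, samples in columns
expr <- as.matrix(df[samples])
rownames(expr) <- df$genes

# p-value of the group:time interaction for one gene's expression vector y
interaction_p <- function(y) {
  if (var(y) == 0) return(NA_real_)  # constant genes cannot be tested
  fit <- lm(y ~ group * time)
  anova(fit)["group:time", "Pr(>F)"]
}

p_raw <- apply(expr, 1, interaction_p)
# Benjamini-Hochberg correction across genes (NA values are left out)
p_adj <- p.adjust(p_raw, method = "BH")
print(data.frame(gene = names(p_raw), p_raw = p_raw, p_adj = p_adj))
```

Genes with a small adjusted p-value would be candidates for a time response that differs between WT and KO; gene A is constant in the example and is therefore reported as NA.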

huangapple
  • Published on May 30, 2023, 07:02:32
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76360740.html