Multi-group differential gene expression for time-series treatment data
Question
This is an example dataset:
df = data.frame(genes = c("A", "B", "C", "D", "E"),
KO_0min_Rep1 = c(0, 1, 2, 6, 6),
KO_0min_Rep2 = c(0, 3, 2, 3, 6),
KO_60min_Rep1 = c(0, 0.3, 2, 9.1, 6),
KO_60min_Rep2 = c(0, 1.3, 2, 6.4, 6),
KO_120min_Rep1 = c(0, 1, 1, 6, 5),
KO_120min_Rep2 = c(0, 1, 2.1, 6.8, 5.2),
WT_0min_Rep1 = c(0, 1, 2, 6, 6),
WT_0min_Rep2 = c(0, 1, 1.6, 3, 6),
WT_60min_Rep1 = c(0, 1, 2, 9, 6),
WT_60min_Rep2 = c(0, 0.3, 2, 6, 2),
WT_120min_Rep1 = c(0, 1.9, 2, 2, 6),
WT_120min_Rep2 = c(0, 1.2, 2, 6, 2) )
The data frame has a "genes" column with more than 9000 genes; every other column is one combination of cell type, treatment time, and replicate. The experimental design is as follows: I have two cell types, wild type (WT) and knockout (KO). I treated both cell types with a DNA-damaging agent for 0, 60, and 120 minutes, and I have two replicates for each combination of cell type and treatment time.
I want to know which genes are significantly altered after the treatment and, more importantly, which genes differ between the WT and KO conditions over the treatment time points.
This is what I have tried:
library(dplyr)  # provides %>% and case_when()
lme4_dat = df %>%
  tidyr::gather(conditions, value, -genes) %>%
  dplyr::mutate(group = case_when(grepl("KO", conditions) ~ "KO",
                                  grepl("WT", conditions) ~ "WT")) %>%
  # match the full "_<time>min" token so that each sample gets exactly one time label
  dplyr::mutate(time = case_when(grepl("_0min", conditions) ~ "0",
                                 grepl("_60min", conditions) ~ "60",
                                 grepl("_120min", conditions) ~ "120")) %>%
  dplyr::mutate(replicate = case_when(grepl("_Rep1", conditions) ~ "Rep1",
                                      grepl("_Rep2", conditions) ~ "Rep2"))
Then I tried to fit a linear mixed-effects model:
lme4_model = lme4::lmer(value ~ conditions * time + (1|genes) + (1|replicate), data = lme4_dat)
It's obviously not working. I am not sure whether I am doing this correctly, or whether there is a better alternative.
Any guidance will be much appreciated. Thank you.
Answer 1
Score: 2
I would suggest a repeated-measures ANOVA with one within-groups factor and one between-groups factor.
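# Any sample whose time was left as NA by the parsing step is a 0-minute sample
lme4_dat$time[is.na(lme4_dat$time)] = 0
lme4_dat$time = as.factor(lme4_dat$time)
# Genes are the error stratum, with time nested within genes
fit <- aov(value ~ time*group + Error(genes/time), lme4_dat)
summary(fit)
library(HH)
# Interaction plots: HH version first, then the base R equivalent
interaction2wt(value ~ time*group, lme4_dat)
interaction.plot(lme4_dat$time, lme4_dat$group, lme4_dat$value)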
Output:
Error: genes
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4  310.4   77.59

Error: genes:time
          Df Sum Sq Mean Sq F value Pr(>F)
time       2  2.617   1.309   0.424  0.668
Residuals  8 24.678   3.085

Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
group       1   2.48  2.4807   1.892  0.176
time:group  2   0.21  0.1047   0.080  0.923
Residuals  42  55.07  1.3112
Group (KO/WT) has the smallest p-value (0.176), but neither it nor time (p = 0.668) is significant at the 0.05 level. The interaction is not significant either (p = 0.923), which you can also see from the roughly parallel lines in the interaction plot. At 60 minutes the mean response increased before dropping at 120 minutes.
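The pooled ANOVA above tests average effects across all genes, whereas the question asks which individual genes respond differently in WT and KO over time. The following is a minimal base R sketch of one way to look at that, not part of the original answer: it rebuilds the long table from the example data, prints the time-by-group cell means behind the interaction plot, and then tests the group-by-time interaction separately for each gene. The names `long`, `interaction_p`, and `result` are hypothetical, and the choice of a per-gene F-test with Benjamini-Hochberg adjustment is an assumption made for illustration.

```r
# Example data from the question (5 genes, 12 samples).
df <- data.frame(genes = c("A", "B", "C", "D", "E"),
                 KO_0min_Rep1   = c(0, 1, 2, 6, 6),
                 KO_0min_Rep2   = c(0, 3, 2, 3, 6),
                 KO_60min_Rep1  = c(0, 0.3, 2, 9.1, 6),
                 KO_60min_Rep2  = c(0, 1.3, 2, 6.4, 6),
                 KO_120min_Rep1 = c(0, 1, 1, 6, 5),
                 KO_120min_Rep2 = c(0, 1, 2.1, 6.8, 5.2),
                 WT_0min_Rep1   = c(0, 1, 2, 6, 6),
                 WT_0min_Rep2   = c(0, 1, 1.6, 3, 6),
                 WT_60min_Rep1  = c(0, 1, 2, 9, 6),
                 WT_60min_Rep2  = c(0, 0.3, 2, 6, 2),
                 WT_120min_Rep1 = c(0, 1.9, 2, 2, 6),
                 WT_120min_Rep2 = c(0, 1.2, 2, 6, 2),
                 stringsAsFactors = FALSE)

# Long format: one row per gene and sample.
sample_cols <- setdiff(names(df), "genes")
long <- data.frame(genes  = rep(df$genes, times = length(sample_cols)),
                   sample = rep(sample_cols, each = nrow(df)),
                   value  = unlist(df[sample_cols], use.names = FALSE),
                   stringsAsFactors = FALSE)

# Column names follow "<group>_<time>min_<replicate>", so split on "_".
parts <- do.call(rbind, strsplit(long$sample, "_"))
long$group     <- factor(parts[, 1], levels = c("WT", "KO"))
long$time      <- factor(sub("min$", "", parts[, 2]), levels = c("0", "60", "120"))
long$replicate <- parts[, 3]

# Mean response in each time-by-group cell (what the interaction plot displays).
cell_means <- aggregate(value ~ time + group, data = long, FUN = mean)
print(cell_means)

# Per-gene test of the group-by-time interaction: compare the additive model
# with the model that includes the interaction, using an F-test.
interaction_p <- function(d) {
  if (var(d$value) == 0) return(NA_real_)  # constant gene: nothing to test
  reduced <- lm(value ~ group + time, data = d)
  full    <- lm(value ~ group * time, data = d)
  anova(reduced, full)[2, "Pr(>F)"]
}

p_raw <- sapply(split(long, long$genes), interaction_p)

# Adjust for testing many genes (Benjamini-Hochberg false discovery rate).
result <- data.frame(genes         = names(p_raw),
                     p_interaction = as.numeric(p_raw),
                     p_adjusted    = p.adjust(p_raw, method = "BH"))
print(result)
```

With only two replicates per cell, each per-gene model has few residual degrees of freedom, so tests like this will have little power. Treat the adjusted p-values as a rough screen rather than a final answer.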