May 30, 2023, 07:02:32


Multi-group differential gene expression for time-series treatment data

Question

This is an example dataset:

df = data.frame(genes = c("A", "B", "C", "D", "E"),
                KO_0min_Rep1 = c(0, 1, 2, 6, 6),
                KO_0min_Rep2 = c(0, 3, 2, 3, 6),
                KO_60min_Rep1 = c(0, 0.3, 2, 9.1, 6),
                KO_60min_Rep2 = c(0, 1.3, 2, 6.4, 6),
                KO_120min_Rep1 = c(0, 1, 1, 6, 5),
                KO_120min_Rep2 = c(0, 1, 2.1, 6.8, 5.2),
                WT_0min_Rep1 = c(0, 1, 2, 6, 6),
                WT_0min_Rep2 = c(0, 1, 1.6, 3, 6),
                WT_60min_Rep1 = c(0, 1, 2, 9, 6),
                WT_60min_Rep2 = c(0, 0.3, 2, 6, 2),
                WT_120min_Rep1 = c(0, 1.9, 2, 2, 6),
                WT_120min_Rep2 = c(0, 1.2, 2, 6, 2))

The data frame has several columns: the "genes" column holds more than 9000 genes, and every other column is one combination of condition and treatment. The experimental design is as follows. I have two cell types, wild type (WT) and knockout (KO). I treated both cell types with a DNA-damaging agent for 0 minutes, 60 minutes, and 120 minutes, and I have two replicates for each combination of cell type and treatment time.

I want to know which genes are significantly altered after the treatments and, more importantly, which genes differ between the WT and KO conditions over the treatment time points.

This is what I have tried:

library(dplyr)  # provides %>% and case_when()

lme4_dat = df %>%
  tidyr::gather(conditions, value, -genes) %>%
  dplyr::mutate( group = case_when(grepl("KO", conditions) ~ "KO",
                                   grepl("WT", conditions) ~ "WT")) %>%
  dplyr::mutate( time = case_when(grepl("UT", conditions) ~ "0",
                                  grepl("60", conditions) ~ "60",
                                  grepl("120", conditions) ~ "120" )) %>%
  dplyr::mutate( replicate = case_when(grepl("_Rep1", conditions) ~ "Rep1",
                                       grepl("_Rep2", conditions) ~ "Rep2"))

Then I tried to fit a linear mixed-effects model:

lme4_model = lme4::lmer(value ~ conditions * time + (1|genes) + (1|replicate), data = lme4_dat)

It's obviously not working. Am I doing this correctly, or is there a better alternative?

Any guidance will be much appreciated. Thank you.
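The group, time, and replicate factors can also be read directly from the column names instead of being matched with patterns. The following is only a sketch in base R. It assumes every sample column is named `<group>_<time>min_<replicate>`, as in the example data, and it uses a shortened two-gene, four-column version of `df` to stay brief. The names `sample_cols`, `samples`, and `long` are illustrative and not part of the original code.

```r
# Shortened stand-in for the example data (two genes, four sample columns)
df <- data.frame(genes = c("A", "B"),
                 KO_0min_Rep1   = c(0, 1),
                 KO_60min_Rep1  = c(0, 0.3),
                 WT_0min_Rep1   = c(0, 1),
                 WT_120min_Rep2 = c(0, 1.2))

# Split each sample column name into its three parts: group, time, replicate
sample_cols <- setdiff(names(df), "genes")
parts <- do.call(rbind, strsplit(sample_cols, "_", fixed = TRUE))

# One row of metadata per sample column
samples <- data.frame(
  conditions = sample_cols,
  group      = factor(parts[, 1], levels = c("WT", "KO")),
  time       = factor(sub("min", "", parts[, 2]), levels = c("0", "60", "120")),
  replicate  = factor(parts[, 3])
)

# Long format: one row per gene and sample column
long <- data.frame(
  genes      = rep(df$genes, times = length(sample_cols)),
  conditions = rep(sample_cols, each = nrow(df)),
  value      = unlist(df[sample_cols], use.names = FALSE)
)

# Attach the parsed factors to every measurement
long <- merge(long, samples, by = "conditions")
```

Because the time point comes from the name itself, no row ends up with a missing `time`, and the factor levels are in chronological order.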


Answer 1

Score: 2

I would suggest repeated measures ANOVA with one within-groups and one between-groups factor.

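library(HH)  # provides interaction2wt()

# Rows whose time was not matched by case_when() are NA; treat them as the 0 min time point
lme4_dat$time[is.na(lme4_dat$time)] = "0"
lme4_dat$time = factor(lme4_dat$time, levels = c("0", "60", "120"))  # chronological order
# Repeated-measures ANOVA with time nested within genes in the error term
fit <- aov(value ~ time*group + Error(genes/time), lme4_dat)
summary(fit)
# Interaction plots: roughly parallel lines suggest no time:group interaction
interaction2wt(value ~ time*group, lme4_dat)
interaction.plot(lme4_dat$time, lme4_dat$group, lme4_dat$value)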

Output:

Error: genes
          Df Sum Sq Mean Sq F value Pr(>F)
Residuals  4  310.4   77.59               
Error: genes:time
          Df Sum Sq Mean Sq F value Pr(>F)
time       2  2.617   1.309   0.424  0.668
Residuals  8 24.678   3.085               
Error: Within
           Df Sum Sq Mean Sq F value Pr(>F)
group       1   2.48  2.4807   1.892  0.176
time:group  2   0.21  0.1047   0.080  0.923
Residuals  42  55.07  1.3112

None of the effects is significant at the 0.05 level. Group (KO/WT) comes closest with p = 0.176, and time is further off with p = 0.668. The time:group interaction has p = 0.923, which is consistent with the roughly parallel lines in the interaction plot. At time 60, the response increased before dropping at 120.
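The ANOVA above pools all genes, so it describes the average behaviour and does not say which individual genes respond differently in KO and WT. One possible follow-up is a per-gene F-test of the group-by-time interaction, sketched below in base R. This is an illustration and was not part of the original answer. It assumes the values are on a scale where an ordinary linear model is reasonable (for example log-scale expression), it treats the two replicates as independent samples, and it skips genes with no variation. The names `interaction_p`, `expr`, and `result` are made up for this example. With only two replicates per condition, power is low, and dedicated differential expression packages may be a better fit for real data.

```r
# Example data from the question
df <- data.frame(genes = c("A", "B", "C", "D", "E"),
                 KO_0min_Rep1   = c(0, 1, 2, 6, 6),
                 KO_0min_Rep2   = c(0, 3, 2, 3, 6),
                 KO_60min_Rep1  = c(0, 0.3, 2, 9.1, 6),
                 KO_60min_Rep2  = c(0, 1.3, 2, 6.4, 6),
                 KO_120min_Rep1 = c(0, 1, 1, 6, 5),
                 KO_120min_Rep2 = c(0, 1, 2.1, 6.8, 5.2),
                 WT_0min_Rep1   = c(0, 1, 2, 6, 6),
                 WT_0min_Rep2   = c(0, 1, 1.6, 3, 6),
                 WT_60min_Rep1  = c(0, 1, 2, 9, 6),
                 WT_60min_Rep2  = c(0, 0.3, 2, 6, 2),
                 WT_120min_Rep1 = c(0, 1.9, 2, 2, 6),
                 WT_120min_Rep2 = c(0, 1.2, 2, 6, 2))

# Read the design from the column names (<group>_<time>min_<replicate>)
sample_cols <- setdiff(names(df), "genes")
parts <- do.call(rbind, strsplit(sample_cols, "_", fixed = TRUE))
group <- factor(parts[, 1], levels = c("WT", "KO"))
time  <- factor(sub("min", "", parts[, 2]), levels = c("0", "60", "120"))

# For one gene: compare the additive model against the model with a
# group:time interaction and return the p-value of that F-test
interaction_p <- function(y) {
  if (var(y) == 0) return(NA_real_)  # constant genes cannot be tested
  additive <- lm(y ~ group + time)
  full     <- lm(y ~ group * time)
  anova(additive, full)[2, "Pr(>F)"]
}

# Genes in rows, samples in columns
expr <- as.matrix(df[sample_cols])
rownames(expr) <- df$genes

# Test every gene, then correct for multiple testing (Benjamini-Hochberg)
p_raw <- apply(expr, 1, interaction_p)
result <- data.frame(genes         = df$genes,
                     p_interaction = p_raw,
                     p_adjusted    = p.adjust(p_raw, method = "BH"))
result[order(result$p_adjusted), ]
```

Genes with a small adjusted p-value are candidates for a time course that differs between KO and WT. Applying the same model comparison to `time` alone would address the first part of the question, namely which genes change after treatment.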

