Fisher exact test for 2 consecutive rows in a data frame in R
Question
I have a data frame in which each site has count data for tumor and normal samples. For each position, defined by chromosome, start, and end, I want to run a Fisher exact test on count_unmethylated and count_methylated for tumor versus normal.
So for the first position:
chromosome start end
1 10469 10469
I want to run the Fisher exact test on this table:
count_unmethylated count_methylated
norm 0 2
tum 1 3
and then do the same for the remaining loci (chromosome, start, end).
I tried adapting the solution from a previous question, but it didn't work:
https://stackoverflow.com/questions/66216780/row-wise-fisher-exact-test-grouped-by-samples-in-r
head(tumNorm_dt_merged_long) %>%
group_by(chromosome, start, end) %>%
summarise(data = list(row_wise_fisher_test(as.matrix(select(cur_data(),
starts_with('count_'))), p.adjust.method = "BH"), ncol=2)) %>%
unnest_wider(data) %>%
unnest(c(group:p.adj.signif)) -> Fisher_result
My data looks like this:
dput(head(tumNorm_dt_merged_long))
structure(list(chromosome = c("1", "1", "1", "1", "1", "1"),
start = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
end = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
group = c("norm", "tum", "norm", "tum", "norm", "tum"), count_methylated = c(2,
3, 3, 2, 1, 2), count_unmethylated = c(0, 1, 0, 0, 1, 2),
methylation_percentage = c(100, 75, 100, 100, 50, 50)), row.names = c(NA,
-6L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x130baa0>, sorted = c("chromosome",
"start", "end", "group"))
Answer 1
Score: 1
Here is a solution using base R. Split the data frame on the start column, assuming there are exactly 2 rows per unique start value, then use lapply to run Fisher's test on columns 5 and 6 (count_methylated and count_unmethylated) of each piece.
tumNorm_dt_merged_long <- structure(list(chromosome = c("1", "1", "1", "1", "1", "1"),
start = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
end = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
group = c("norm", "tum", "norm", "tum", "norm", "tum"),
count_methylated = c(2, 3, 3, 2, 1, 2),
count_unmethylated = c(0, 1, 0, 0, 1, 2),
methylation_percentage = c(100, 75, 100, 100, 50, 50)),
row.names = c(NA, -6L), class = c("data.table", "data.frame"), sorted = c("chromosome", "start", "end", "group"))
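# Split into one piece per unique start value. This assumes 2 rows per start
# (one "norm", one "tum"); identical start values on different chromosomes
# would end up in the same piece.
dflist <- split(tumNorm_dt_merged_long, tumNorm_dt_merged_long$start)
output <- lapply(dflist, function(x) {
  print(x)
  # Columns 5 and 6 are count_methylated and count_unmethylated
  results <- fisher.test(x[, c(5, 6)])
  print(results)
  results
})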
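As a possible extension of the approach above, the per-locus results could be collected into a single table rather than only printed. The following is a minimal base R sketch, not part of the original answer. It assumes exactly two rows (norm and tum) per locus, keys each group on chromosome, start, and end together, and applies a Benjamini-Hochberg adjustment because the question's attempt asked for "BH". The names df, loci, fisher_by_locus, p_value, odds_ratio, and p_adj are illustrative only.

```r
# Example data from the question, rebuilt as a plain data.frame (assumption:
# a data.frame is used here so the sketch needs no extra packages)
df <- data.frame(
  chromosome = rep("1", 6),
  start = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
  end   = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
  group = rep(c("norm", "tum"), 3),
  count_methylated   = c(2, 3, 3, 2, 1, 2),
  count_unmethylated = c(0, 1, 0, 0, 1, 2)
)

# One piece per locus, keyed on all three position columns;
# drop = TRUE discards combinations that do not occur in the data
loci <- split(df, list(df$chromosome, df$start, df$end), drop = TRUE)

fisher_by_locus <- do.call(rbind, lapply(loci, function(x) {
  # Assumption: each locus has exactly one "norm" row and one "tum" row
  stopifnot(nrow(x) == 2)
  # 2 x 2 table: rows are groups, columns are unmethylated/methylated counts
  m <- as.matrix(x[, c("count_unmethylated", "count_methylated")])
  rownames(m) <- x$group
  ft <- fisher.test(m)
  data.frame(
    chromosome = x$chromosome[1],
    start      = x$start[1],
    end        = x$end[1],
    p_value    = ft$p.value,
    odds_ratio = unname(ft$estimate)
  )
}))
rownames(fisher_by_locus) <- NULL

# Adjust across loci for multiple testing (Benjamini-Hochberg)
fisher_by_locus$p_adj <- p.adjust(fisher_by_locus$p_value, method = "BH")
print(fisher_by_locus)
```

Keying on all three columns avoids merging loci that share a start value on different chromosomes. The adjustment is applied once over all loci, since each locus contributes a single test.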