Fisher exact test for 2 consecutive rows in a data frame in R


Question

I have a data frame where, for each site, I have tumor and normal count data. I want to run a Fisher exact test on count_unmethylated and count_methylated for tumor versus normal at each position, where a position is defined by chromosome, start, and end.

So for the first position:

chromosome start   end
1          10469   10469

I want to run the Fisher exact test on this table:

        count_unmethylated  count_methylated
  norm  0                   2
  tum   1                   3

and do the same for the rest of the loci (chromosome, start, end).

I tried adapting the solution from a previous question, but it didn't work:
https://stackoverflow.com/questions/66216780/row-wise-fisher-exact-test-grouped-by-samples-in-r

head(tumNorm_dt_merged_long) %>%
  group_by(chromosome, start, end) %>%
  summarise(data = list(row_wise_fisher_test(as.matrix(select(cur_data(),
                        starts_with('count_'))), p.adjust.method = "BH"), ncol=2)) %>%
  unnest_wider(data) %>%
  unnest(c(group:p.adj.signif)) -> Fisher_result

My data looks like this:

 dput(head(tumNorm_dt_merged_long))
structure(list(chromosome = c("1", "1", "1", "1", "1", "1"), 
    start = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L), 
    end = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L), 
    group = c("norm", "tum", "norm", "tum", "norm", "tum"), count_methylated = c(2, 
    3, 3, 2, 1, 2), count_unmethylated = c(0, 1, 0, 0, 1, 2), 
    methylation_percentage = c(100, 75, 100, 100, 50, 50)), row.names = c(NA, 
-6L), class = c("data.table", "data.frame"), .internal.selfref = <pointer: 0x130baa0>, sorted = c("chromosome", 
"start", "end", "group"))

Answer 1

Score: 1

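Here is a solution using base R. Split the data frame on the start column; this assumes exactly 2 rows (norm and tum) per unique start value. Then use lapply to run Fisher's exact test on columns 5 and 6 (count_methylated and count_unmethylated) of each piece. Note that splitting on start alone is only safe when all rows come from a single chromosome, as in this sample.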

# Sample data from the question (the .internal.selfref pointer is dropped)
tumNorm_dt_merged_long <- structure(list(chromosome = c("1", "1", "1", "1", "1", "1"), 
               start = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L), 
               end = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L), 
               group = c("norm", "tum", "norm", "tum", "norm", "tum"), 
               count_methylated = c(2, 3, 3, 2, 1, 2), 
               count_unmethylated = c(0, 1, 0, 0, 1, 2), 
               methylation_percentage = c(100, 75, 100, 100, 50, 50)), 
          row.names = c(NA, -6L), class = c("data.table", "data.frame"), sorted = c("chromosome", "start", "end", "group"))

# One piece per unique start value, each holding the norm and tum rows
dflist <- split(tumNorm_dt_merged_long, tumNorm_dt_merged_long$start)

# Run the test on the 2x2 table of counts (columns 5 and 6) for each piece
output <- lapply(dflist, function(x){
   print(x)
   results <- fisher.test(x[, c(5, 6)])
   print(results)
   results   # return the full htest object for this position
})
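
If you need one row per locus rather than a list of printed test objects, the same idea can be sketched as follows. This is only a sketch under assumptions: it rebuilds the sample data as a plain data.frame, splits on all three position columns instead of start alone, and the names df, count_cols, pieces, fisher_by_locus, p_value and p_adj are made up for illustration. The BH adjustment via p.adjust mirrors the p.adjust.method = "BH" argument in the question's attempt.

```r
# Sample data from the question, rebuilt as a plain data.frame (assumption)
df <- data.frame(
  chromosome = rep("1", 6),
  start = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
  end   = c(10469L, 10469L, 10470L, 10470L, 10471L, 10471L),
  group = rep(c("norm", "tum"), 3),
  count_methylated   = c(2, 3, 3, 2, 1, 2),
  count_unmethylated = c(0, 1, 0, 0, 1, 2),
  stringsAsFactors = FALSE
)

# Select the count columns by name rather than by position
count_cols <- c("count_unmethylated", "count_methylated")

# One piece per (chromosome, start, end); drop = TRUE skips empty combinations
pieces <- split(df, list(df$chromosome, df$start, df$end), drop = TRUE)

fisher_by_locus <- do.call(rbind, lapply(pieces, function(x) {
  stopifnot(nrow(x) == 2)            # expect exactly one norm and one tum row
  tab <- as.matrix(x[, count_cols])  # 2x2 table of counts
  rownames(tab) <- x$group
  data.frame(
    chromosome = x$chromosome[1],
    start      = x$start[1],
    end        = x$end[1],
    p_value    = fisher.test(tab)$p.value
  )
}))
rownames(fisher_by_locus) <- NULL

# Adjust for multiple testing across loci (Benjamini-Hochberg)
fisher_by_locus$p_adj <- p.adjust(fisher_by_locus$p_value, method = "BH")
print(fisher_by_locus)
```

Splitting on all three columns keeps positions on different chromosomes apart, and the stopifnot call fails loudly if a locus is missing its norm or tum row instead of silently testing a malformed table.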

Posted by huangapple on June 8, 2023, 01:24:43
Original link: https://go.coder-hub.com/76425730.html