Using combn to make specific functions for grouped pair-wise, row-wise comparisons

Question

This is a small section of a dataset I'm working on.

dat2 <- read.table(text = "
   nodepair  V1  V2  V3  V4  V5  V6  V7  V8  V9 ES   
 1 A1_A1        0    21     0     0     0     0     0     0    78 45   
 2 A2_A1        0     0     0     0     0     0     0     0    99 45   
 3 A2_A2        0     1     0     0     0     0     0     0    98 45   
 4 A3_A1        0     0     0     0     0     6     1     3    89 45   
 5 A3_A2        0     0     0     0     0     0     0     0    99 45   
 6 A1_A1        0    20     0     0     0     0     0     0    65 46   
 7 A2_A1        0     0     0     0     0     0     0     0    85 46   
 8 A2_A2        0     1     0     0     0     0     0     0    84 46   
 9 A3_A1        0     0     0     0     2     6     3     3    71 46   
 10 A3_A2        0     0     0     0     0     0     0     0    85 46   
 11 A1_A1        0    25     0     0     0     0     0     0    45 47   
 12 A2_A1        0     0     0     0     0     0     0     0    70 47   
 13 A2_A2        0     1     0     0     0     0     0     0    69 47   
 14 A3_A1        0     0     0     0     0     8     0     1    61 47   
 15 A3_A2        0     0     0     0     0     0     0     0    70 47   
 16 A1_A1        0    37     0     0     0     0     0     0    77 48   
 17 A2_A1        0     0     0     0     0     0     0     0   114 48   
 18 A2_A2        0     0     0     0     0     0     0     0   114 48   
 19 A3_A1        0     0     0     0     2     9     0     3   100 48   
 20 A3_A2        0     0     0     0     0     0     0     0   114 48   
 ", header = TRUE)

I'm trying to write a program that will do all pairwise comparisons (grouped by the nodepair) across the 'ES' groups.

I'd like to write a series of functions that each compare a pair of rows. For example, for each of the columns V1 to V9, the result should be 1 when the value is > 0 in both ES rows (indicating presence of data) and 0 otherwise.

I'm imagining the output to look something like this:

 dat3 <- read.table(text = "
	nodepair1 nodepair2  V1  V2  V3  V4  V5  V6  V7  V8  V9    
	A1_A1(45) A1_A1(46)   0     0    1     0     0     0     0     0     1        
  ", header = TRUE)

etc.

Unfortunately, I haven't gotten very far:

 dat2 <- dat2 %>%
   group_by(nodepair) %>%
   col2 = t(combn(nodepair,2)))

I'm pretty sure I need 'combn' here, but I'm very new to this function and can't figure it out.

Answer 1

Score: 1

Now that the OP has clarified their question, I'd propose the following solution:

library(tidyverse)

ES_combs <- combn(unique(dat2$ES), 2, simplify = FALSE)

dat2 |>
  group_split(nodepair) |>
  map(.x = _,
      .f = \(df) df |>
        map(.x = 1:length(ES_combs),
            .f = ~df |>
               filter(ES %in% ES_combs[[.x]]) |>
               summarize(nodepair = first(nodepair),
                         ES_1 = ES[1],
                         ES_2 = ES[2],
                         across(V1:V9, ~as.numeric(all(. > 0)))))) |>
  bind_rows()

which gives:

# A tibble: 30 × 12
   nodepair  ES_1  ES_2    V1    V2    V3    V4    V5    V6    V7    V8    V9
   <chr>    <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
 1 A1_A1       45    46     0     1     0     0     0     0     0     0     1
 2 A1_A1       45    47     0     1     0     0     0     0     0     0     1
 3 A1_A1       45    48     0     1     0     0     0     0     0     0     1
 4 A1_A1       46    47     0     1     0     0     0     0     0     0     1
 5 A1_A1       46    48     0     1     0     0     0     0     0     0     1
 6 A1_A1       47    48     0     1     0     0     0     0     0     0     1
 7 A2_A1       45    46     0     0     0     0     0     0     0     0     1
 8 A2_A1       45    47     0     0     0     0     0     0     0     0     1
 9 A2_A1       45    48     0     0     0     0     0     0     0     0     1
10 A2_A1       46    47     0     0     0     0     0     0     0     0     1
11 A2_A1       46    48     0     0     0     0     0     0     0     0     1
12 A2_A1       47    48     0     0     0     0     0     0     0     0     1
13 A2_A2       45    46     0     1     0     0     0     0     0     0     1
14 A2_A2       45    47     0     1     0     0     0     0     0     0     1
15 A2_A2       45    48     0     0     0     0     0     0     0     0     1
16 A2_A2       46    47     0     1     0     0     0     0     0     0     1
17 A2_A2       46    48     0     0     0     0     0     0     0     0     1
18 A2_A2       47    48     0     0     0     0     0     0     0     0     1
19 A3_A1       45    46     0     0     0     0     0     1     1     1     1
20 A3_A1       45    47     0     0     0     0     0     1     0     1     1
21 A3_A1       45    48     0     0     0     0     0     1     0     1     1
22 A3_A1       46    47     0     0     0     0     0     1     0     1     1
23 A3_A1       46    48     0     0     0     0     1     1     0     1     1
24 A3_A1       47    48     0     0     0     0     0     1     0     1     1
25 A3_A2       45    46     0     0     0     0     0     0     0     0     1
26 A3_A2       45    47     0     0     0     0     0     0     0     0     1
27 A3_A2       45    48     0     0     0     0     0     0     0     0     1
28 A3_A2       46    47     0     0     0     0     0     0     0     0     1
29 A3_A2       46    48     0     0     0     0     0     0     0     0     1
30 A3_A2       47    48     0     0     0     0     0     0     0     0     1

This probably needs a bit of explanation:

  • We start by creating all pairwise combinations of the ES values in your data frame and assigning them to a list object, ES_combs.
  • We then take your data and split it by nodepair group into a list, where each list object is the data for one nodepair group.
  • We then initiate the outer map where we go through each group's data frame. It is important here to define an anonymous function, because we have an inner map, so we can't use the .x parameter twice.
  • The inner map takes each combination pair from ES_combs and filters the current group's data to these two rows. We then apply the summarize part.
  • As a last step, we use bind_rows to merge everything into a nice tibble instead of having an annoyingly long list.
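
To make the first and fourth bullets more concrete, here is a minimal base-R sketch of what combn returns with simplify = FALSE and how the "both rows > 0" rule behaves for one pair. The objects es_values, es_pairs, grp and presence are names made up for this illustration; the toy values are copied from the A3_A1 rows for ES 45 and 46 in dat2.

```r
# Hypothetical stand-in for unique(dat2$ES)
es_values <- c(45, 46, 47, 48)

# simplify = FALSE returns a list with one length-2 vector per pair
es_pairs <- combn(es_values, 2, simplify = FALSE)

length(es_pairs)  # choose(4, 2) pairs
es_pairs[[1]]     # the first pair

# Two rows of one nodepair group (V5 and V6 of A3_A1 for ES 45 and 46)
grp <- data.frame(ES = c(45, 46), V5 = c(0, 2), V6 = c(6, 6))

# A column is flagged 1 only when the value is > 0 in both rows
presence <- sapply(grp[c("V5", "V6")],
                   function(col) as.numeric(all(col > 0)))
presence
```

Here V5 comes out as 0 because one of the two rows is 0, while V6 comes out as 1, which matches row 19 of the output above.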

Answer 2

Score: 1

Here is how I would do it:

library(tidyverse)

# create a function which compares one combination of rows with each other
comparer <- function(comb) {
  bind_cols(
    nodepair1 = paste0(dat2[comb[1], 1], "(", dat2[comb[1], 11], ")"),
    nodepair2 = paste0(dat2[comb[2], 1], "(", dat2[comb[2], 11], ")"),
    dat2[comb[1], 2:10] > 0 & dat2[comb[2], 2:10] > 0
  )
}

combs <- combn(1:nrow(dat2), 2, simplify = FALSE)

# then apply that to each combination of rows for the dataset
map_df(combs, comparer)

# Output:
# A tibble: 190 × 11
   nodepair1 nodepair2 V1    V2    V3    V4    V5    V6    V7    V8    V9   
   <chr>     <chr>     <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
 1 A1_A1(45) A2_A1(45) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 2 A1_A1(45) A2_A2(45) FALSE TRUE  FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 3 A1_A1(45) A3_A1(45) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 4 A1_A1(45) A3_A2(45) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 5 A1_A1(45) A1_A1(46) FALSE TRUE  FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 6 A1_A1(45) A2_A1(46) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 7 A1_A1(45) A2_A2(46) FALSE TRUE  FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 8 A1_A1(45) A3_A1(46) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
 9 A1_A1(45) A3_A2(46) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
10 A1_A1(45) A1_A1(47) FALSE TRUE  FALSE FALSE FALSE FALSE FALSE FALSE TRUE 
# ℹ 180 more rows

Update: If you only want to compare ones with the same 'nodepair' value, you can do this:

processor <- function(x) {
  map_df(combn(x, 2, simplify = FALSE), comparer)
}

dat2 |>
  mutate(n = row_number()) |>
  summarise(comparisons = map(list(n), processor), .by = nodepair) |>
  unnest(comparisons)

# Output:
# A tibble: 30 × 12
   nodepair nodepair1 nodepair2 V1    V2    V3    V4    V5   
   <chr>    <chr>     <chr>     <lgl> <lgl> <lgl> <lgl> <lgl>
 1 A1_A1    A1_A1(45) A1_A1(46) FALSE TRUE  FALSE FALSE FALSE
 2 A1_A1    A1_A1(45) A1_A1(47) FALSE TRUE  FALSE FALSE FALSE
 3 A1_A1    A1_A1(45) A1_A1(48) FALSE TRUE  FALSE FALSE FALSE
 4 A1_A1    A1_A1(46) A1_A1(47) FALSE TRUE  FALSE FALSE FALSE
 5 A1_A1    A1_A1(46) A1_A1(48) FALSE TRUE  FALSE FALSE FALSE
 6 A1_A1    A1_A1(47) A1_A1(48) FALSE TRUE  FALSE FALSE FALSE
 7 A2_A1    A2_A1(45) A2_A1(46) FALSE FALSE FALSE FALSE FALSE
 8 A2_A1    A2_A1(45) A2_A1(47) FALSE FALSE FALSE FALSE FALSE
 9 A2_A1    A2_A1(45) A2_A1(48) FALSE FALSE FALSE FALSE FALSE
10 A2_A1    A2_A1(46) A2_A1(47) FALSE FALSE FALSE FALSE FALSE
# ℹ 20 more rows

Note: all of these outputs use TRUE and FALSE. TRUE and FALSE are equivalent to 1 and 0; you can check this by running TRUE + TRUE, which equals 2. I could wrap the dat2[comb[1], 2:10] > 0 & dat2[comb[2], 2:10] > 0 in as.integer, but I figured you would understand.
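
If 0/1 columns are preferred, the conversion could look like the sketch below. It is an illustration on a hypothetical one-row logical matrix m, not part of the original answer. Note that as.integer() drops the dim and dimnames attributes of a matrix, so adding 0L is used here to keep the column names.

```r
# Hypothetical 1 x 3 logical matrix, shaped like one comparison result
m <- matrix(c(FALSE, TRUE, TRUE), nrow = 1,
            dimnames = list(NULL, c("V1", "V2", "V3")))

TRUE + TRUE            # logicals behave as 1 and 0 in arithmetic

as_vector <- as.integer(m)  # plain integer vector, names and dim are lost
as_matrix <- m + 0L         # integer matrix, dim and column names kept

as_matrix
```

For a finished tibble, mutate(across(where(is.logical), as.integer)) from dplyr should achieve the same thing column by column.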

huangapple
  • Posted on August 5, 2023, 13:35:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76840286.html