Using combn to make specific functions for grouped pair-wise, row-wise comparisons
Question
This is a small section of a dataset I'm working on.
dat2 <- read.table(text = "
nodepair V1 V2 V3 V4 V5 V6 V7 V8 V9 ES
1 A1_A1 0 21 0 0 0 0 0 0 78 45
2 A2_A1 0 0 0 0 0 0 0 0 99 45
3 A2_A2 0 1 0 0 0 0 0 0 98 45
4 A3_A1 0 0 0 0 0 6 1 3 89 45
5 A3_A2 0 0 0 0 0 0 0 0 99 45
6 A1_A1 0 20 0 0 0 0 0 0 65 46
7 A2_A1 0 0 0 0 0 0 0 0 85 46
8 A2_A2 0 1 0 0 0 0 0 0 84 46
9 A3_A1 0 0 0 0 2 6 3 3 71 46
10 A3_A2 0 0 0 0 0 0 0 0 85 46
11 A1_A1 0 25 0 0 0 0 0 0 45 47
12 A2_A1 0 0 0 0 0 0 0 0 70 47
13 A2_A2 0 1 0 0 0 0 0 0 69 47
14 A3_A1 0 0 0 0 0 8 0 1 61 47
15 A3_A2 0 0 0 0 0 0 0 0 70 47
16 A1_A1 0 37 0 0 0 0 0 0 77 48
17 A2_A1 0 0 0 0 0 0 0 0 114 48
18 A2_A2 0 0 0 0 0 0 0 0 114 48
19 A3_A1 0 0 0 0 2 9 0 3 100 48
20 A3_A2 0 0 0 0 0 0 0 0 114 48
", header = TRUE)
I'm trying to write a program that will do all pairwise comparisons (grouped by nodepair) across the 'ES' groups.
I'd like to write a series of functions to specifically compare each pair of rows. For example, for each of V1:V9, the result should be 1 (indicating presence of data) when the value is > 0 for both ESs, and 0 otherwise.
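Expressed in R, the rule for a single pair of rows is an element-wise AND of two comparisons. A minimal sketch, using the A1_A1 values at ES 45 and ES 46 typed in by hand from dat2 (es45, es46 and presence are illustrative names, not part of the original):

```r
# V1:V9 for the two A1_A1 rows of dat2 (ES 45 and ES 46), typed in by hand
es45 <- c(0, 21, 0, 0, 0, 0, 0, 0, 78)
es46 <- c(0, 20, 0, 0, 0, 0, 0, 0, 65)

# element-wise: 1 where the value is > 0 in both rows, 0 otherwise
presence <- as.integer(es45 > 0 & es46 > 0)
names(presence) <- paste0("V", 1:9)
presence  # V2 and V9 are 1, everything else is 0
```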
I'm imagining the output to look something like this:
dat3 <- read.table(text = "
nodepair1 nodepair2 V1 V2 V3 V4 V5 V6 V7 V8 V9
A1_A1(45) A1_A1(46) 0 1 0 0 0 0 0 0 1
", header = TRUE)
etc.
Unfortunately, I haven't gotten very far:
dat2 <- dat2 %>%
group_by(nodepair) %>%
col2 = t(combn(nodepair,2)))
I'm pretty sure I need 'combn' here, but I'm very new to this function and can't figure it out.
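Since the question centers on combn(), here is a quick base-R sketch of what it returns for the four ES values in this data. The object names es, pairs_mat, pairs_list and labels are just for illustration.

```r
# The four ES values that occur in dat2
es <- c(45, 46, 47, 48)

# Default: a 2 x 6 matrix, one pair per column
pairs_mat <- combn(es, 2)

# simplify = FALSE: a list of length-2 vectors, convenient for lapply()/map()
pairs_list <- combn(es, 2, simplify = FALSE)

# FUN is applied to each pair; extra arguments (collapse) are passed on to FUN
labels <- combn(es, 2, FUN = paste, collapse = "-")

print(pairs_mat)
print(labels)  # "45-46" "45-47" "45-48" "46-47" "46-48" "47-48"
```

Each column of pairs_mat (or each element of pairs_list) names two ES groups to compare, which is the piece the answers below loop over.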
Answer 1
Score: 1
Now that the OP has clarified their question, I'd propose the following solution:
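library(tidyverse)  # the `_` placeholder used below needs R >= 4.2

# every pairwise combination of the ES values: c(45, 46), c(45, 47), ..., c(47, 48)
ES_combs <- combn(unique(dat2$ES), 2, simplify = FALSE)

dat2 |>
  group_split(nodepair) |>                        # list with one data frame per nodepair
  map(.x = _,
      .f = \(df)                                  # outer map: one nodepair group
        map(.x = seq_along(ES_combs),             # inner map: one ES pair
            .f = ~ df |>
              filter(ES %in% ES_combs[[.x]]) |>   # keep the two rows of this pair
              summarize(nodepair = first(nodepair),
                        ES_1 = ES[1],
                        ES_2 = ES[2],
                        # 1 if the value is > 0 in both rows, else 0
                        across(V1:V9, ~ as.numeric(all(. > 0)))))) |>
  bind_rows()                                     # flatten the nested list into one tibble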
which gives:
# A tibble: 30 × 12
nodepair ES_1 ES_2 V1 V2 V3 V4 V5 V6 V7 V8 V9
<chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A1_A1 45 46 0 1 0 0 0 0 0 0 1
2 A1_A1 45 47 0 1 0 0 0 0 0 0 1
3 A1_A1 45 48 0 1 0 0 0 0 0 0 1
4 A1_A1 46 47 0 1 0 0 0 0 0 0 1
5 A1_A1 46 48 0 1 0 0 0 0 0 0 1
6 A1_A1 47 48 0 1 0 0 0 0 0 0 1
7 A2_A1 45 46 0 0 0 0 0 0 0 0 1
8 A2_A1 45 47 0 0 0 0 0 0 0 0 1
9 A2_A1 45 48 0 0 0 0 0 0 0 0 1
10 A2_A1 46 47 0 0 0 0 0 0 0 0 1
11 A2_A1 46 48 0 0 0 0 0 0 0 0 1
12 A2_A1 47 48 0 0 0 0 0 0 0 0 1
13 A2_A2 45 46 0 1 0 0 0 0 0 0 1
14 A2_A2 45 47 0 1 0 0 0 0 0 0 1
15 A2_A2 45 48 0 0 0 0 0 0 0 0 1
16 A2_A2 46 47 0 1 0 0 0 0 0 0 1
17 A2_A2 46 48 0 0 0 0 0 0 0 0 1
18 A2_A2 47 48 0 0 0 0 0 0 0 0 1
19 A3_A1 45 46 0 0 0 0 0 1 1 1 1
20 A3_A1 45 47 0 0 0 0 0 1 0 1 1
21 A3_A1 45 48 0 0 0 0 0 1 0 1 1
22 A3_A1 46 47 0 0 0 0 0 1 0 1 1
23 A3_A1 46 48 0 0 0 0 1 1 0 1 1
24 A3_A1 47 48 0 0 0 0 0 1 0 1 1
25 A3_A2 45 46 0 0 0 0 0 0 0 0 1
26 A3_A2 45 47 0 0 0 0 0 0 0 0 1
27 A3_A2 45 48 0 0 0 0 0 0 0 0 1
28 A3_A2 46 47 0 0 0 0 0 0 0 0 1
29 A3_A2 46 48 0 0 0 0 0 0 0 0 1
30 A3_A2 47 48 0 0 0 0 0 0 0 0 1
This probably needs a bit of explanation:
- We start by creating all pairwise combinations of the ES values in your data frame and storing them in a list object, ES_combs.
- We then split your data by nodepair into a list, where each element is the data for one nodepair group.
- We then start the outer map, which goes through each group's data frame. It is important to define an anonymous function here (\(df)), because there is also an inner map, so we can't use the .x parameter twice.
- The inner map takes each combination pair from ES_combs and filters the current group's data to those two rows. We then apply the summarize step, which sets each of V1:V9 to 1 if the value is > 0 in both rows and to 0 otherwise.
- As a last step, we use bind_rows to merge everything into a single tibble instead of an annoyingly long nested list.
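For readers who would rather avoid the nested map() calls, here is a minimal base-R sketch of the same idea (split by nodepair, compare every pair of rows within a group, bind the results). It is not the answer's code: compare_group, vcols, groups and result are hypothetical names, and it uses only the A2_A2 and A3_A1 rows of dat2 so that it runs standalone. It assumes each nodepair has exactly one row per ES.

```r
# A subset of dat2 from the question (the A2_A2 and A3_A1 rows only)
dat2 <- read.table(text = "
nodepair V1 V2 V3 V4 V5 V6 V7 V8 V9 ES
A2_A2 0 1 0 0 0 0 0 0 98 45
A3_A1 0 0 0 0 0 6 1 3 89 45
A2_A2 0 1 0 0 0 0 0 0 84 46
A3_A1 0 0 0 0 2 6 3 3 71 46
A2_A2 0 1 0 0 0 0 0 0 69 47
A3_A1 0 0 0 0 0 8 0 1 61 47
A2_A2 0 0 0 0 0 0 0 0 114 48
A3_A1 0 0 0 0 2 9 0 3 100 48
", header = TRUE)

vcols <- paste0("V", 1:9)

# For one nodepair group: compare every pair of its rows (one row per ES)
compare_group <- function(df) {
  pairs <- combn(seq_len(nrow(df)), 2, simplify = FALSE)
  rows <- lapply(pairs, function(p) {
    a <- unlist(df[p[1], vcols])
    b <- unlist(df[p[2], vcols])
    both <- as.integer(a > 0 & b > 0)   # 1 if > 0 in both rows, else 0
    names(both) <- vcols
    data.frame(nodepair = df$nodepair[p[1]],
               ES_1 = df$ES[p[1]],
               ES_2 = df$ES[p[2]],
               as.list(both))
  })
  do.call(rbind, rows)
}

groups <- split(dat2, dat2$nodepair)                 # one data frame per nodepair
result <- do.call(rbind, lapply(groups, compare_group))
rownames(result) <- NULL                             # drop "A2_A2.1"-style row names
result
```

The structure mirrors the tidyverse version: split() plays the role of group_split(), the lapply() over pairs is the inner map, and do.call(rbind, ...) stands in for bind_rows().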
Answer 2
Score: 1
Here is how I would do it:
library(tidyverse)

# create a function which compares the two rows in one combination;
# comb is a pair of row indices into dat2 (column 1 = nodepair,
# columns 2:10 = V1:V9, column 11 = ES)
comparer <- function(comb) {
  bind_cols(
    nodepair1 = paste0(dat2[comb[1], 1], "(", dat2[comb[1], 11], ")"),
    nodepair2 = paste0(dat2[comb[2], 1], "(", dat2[comb[2], 11], ")"),
    # TRUE where the value is > 0 in both rows
    dat2[comb[1], 2:10] > 0 & dat2[comb[2], 2:10] > 0
  )
}

# all pairs of row indices: 20 rows give choose(20, 2) = 190 pairs
combs <- combn(seq_len(nrow(dat2)), 2, simplify = FALSE)

# then apply that to each combination of rows for the dataset
map_df(combs, comparer)
# Output:
# A tibble: 190 × 11
nodepair1 nodepair2 V1 V2 V3 V4 V5 V6 V7 V8 V9
<chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
1 A1_A1(45) A2_A1(45) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
2 A1_A1(45) A2_A2(45) FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
3 A1_A1(45) A3_A1(45) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
4 A1_A1(45) A3_A2(45) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
5 A1_A1(45) A1_A1(46) FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
6 A1_A1(45) A2_A1(46) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
7 A1_A1(45) A2_A2(46) FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
8 A1_A1(45) A3_A1(46) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
9 A1_A1(45) A3_A2(46) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
10 A1_A1(45) A1_A1(47) FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
# ℹ 180 more rows
Update: If you only want to compare rows with the same 'nodepair' value, you can do this:
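# x holds the row numbers (positions in dat2) of one nodepair group
processor <- function(x) {
  map_df(combn(x, 2, simplify = FALSE), comparer)
}

dat2 |>
  mutate(n = row_number()) |>                  # remember each row's position in dat2
  summarise(comparisons = map(list(n), processor),
            .by = nodepair) |>                 # `.by` needs dplyr >= 1.1.0
  unnest(comparisons)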
# Output:
# A tibble: 30 × 12
nodepair nodepair1 nodepair2 V1 V2 V3 V4 V5
<chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl>
1 A1_A1 A1_A1(45) A1_A1(46) FALSE TRUE FALSE FALSE FALSE
2 A1_A1 A1_A1(45) A1_A1(47) FALSE TRUE FALSE FALSE FALSE
3 A1_A1 A1_A1(45) A1_A1(48) FALSE TRUE FALSE FALSE FALSE
4 A1_A1 A1_A1(46) A1_A1(47) FALSE TRUE FALSE FALSE FALSE
5 A1_A1 A1_A1(46) A1_A1(48) FALSE TRUE FALSE FALSE FALSE
6 A1_A1 A1_A1(47) A1_A1(48) FALSE TRUE FALSE FALSE FALSE
7 A2_A1 A2_A1(45) A2_A1(46) FALSE FALSE FALSE FALSE FALSE
8 A2_A1 A2_A1(45) A2_A1(47) FALSE FALSE FALSE FALSE FALSE
9 A2_A1 A2_A1(45) A2_A1(48) FALSE FALSE FALSE FALSE FALSE
10 A2_A1 A2_A1(46) A2_A1(47) FALSE FALSE FALSE FALSE FALSE
# ℹ 20 more rows
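One caveat with processor, should the real data contain a nodepair that appears in only one row: combn(x, m) treats a single positive integer x as seq_len(x). A group consisting of, say, row 17 alone would therefore make combn(17, 2) compare rows 1 to 17 with each other instead of failing. A sketch of a guard, using a hypothetical helper safe_pairs (not part of the original answer):

```r
# Hypothetical helper: all pairs of the given row numbers, guarding against
# combn()'s "a single integer n means seq_len(n)" behaviour
safe_pairs <- function(x) {
  if (length(x) < 2) return(list())  # a single row has no partner
  combn(x, 2, simplify = FALSE)
}

length(combn(17, 2, simplify = FALSE))  # 136: combn() expanded 17 to 1:17
length(safe_pairs(17))                  # 0
length(safe_pairs(c(3, 8, 13, 18)))     # 6, as for each nodepair group in dat2
```

processor could then call safe_pairs(x) in place of combn(x, 2, simplify = FALSE).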
Note: all of these return TRUE and FALSE rather than 1 and 0. In arithmetic, TRUE and FALSE are coerced to 1 and 0; for example, TRUE + TRUE equals 2. I could have wrapped dat2[comb[1], 2:10] > 0 & dat2[comb[2], 2:10] > 0 in as.integer, but I figured the logical version would be clear enough.
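If 1/0 columns are wanted, converting after the comparison is probably simpler than changing comparer: as.integer() on the comparison matrix itself would also drop its dimensions and the V1:V9 column names. A base-R sketch on a small hand-made stand-in for the output (res is a hypothetical object, not the real result); with dplyr, mutate(across(where(is.logical), as.integer)) should do the same.

```r
# A small hand-made stand-in for the logical output of map_df(combs, comparer)
res <- data.frame(
  nodepair1 = c("A1_A1(45)", "A1_A1(45)"),
  nodepair2 = c("A2_A1(45)", "A2_A2(45)"),
  V2 = c(FALSE, TRUE),
  V9 = c(TRUE, TRUE)
)

print(TRUE + TRUE)  # 2: logicals already count as 0/1 in arithmetic
print(sum(res$V2))  # 1: number of pairs where V2 is > 0 in both rows

# Convert every logical column to integer 0/1, leaving the label columns alone
is_lgl <- vapply(res, is.logical, logical(1))
res[is_lgl] <- lapply(res[is_lgl], as.integer)
str(res)
```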