Using combn to make specific functions for grouped pair-wise, row-wise comparisons
Question
This is a small section of a dataset I'm working on.
dat2 <- read.table(text = "
nodepair V1 V2 V3 V4 V5 V6 V7 V8 V9 ES
1 A1_A1 0 21 0 0 0 0 0 0 78 45
2 A2_A1 0 0 0 0 0 0 0 0 99 45
3 A2_A2 0 1 0 0 0 0 0 0 98 45
4 A3_A1 0 0 0 0 0 6 1 3 89 45
5 A3_A2 0 0 0 0 0 0 0 0 99 45
6 A1_A1 0 20 0 0 0 0 0 0 65 46
7 A2_A1 0 0 0 0 0 0 0 0 85 46
8 A2_A2 0 1 0 0 0 0 0 0 84 46
9 A3_A1 0 0 0 0 2 6 3 3 71 46
10 A3_A2 0 0 0 0 0 0 0 0 85 46
11 A1_A1 0 25 0 0 0 0 0 0 45 47
12 A2_A1 0 0 0 0 0 0 0 0 70 47
13 A2_A2 0 1 0 0 0 0 0 0 69 47
14 A3_A1 0 0 0 0 0 8 0 1 61 47
15 A3_A2 0 0 0 0 0 0 0 0 70 47
16 A1_A1 0 37 0 0 0 0 0 0 77 48
17 A2_A1 0 0 0 0 0 0 0 0 114 48
18 A2_A2 0 0 0 0 0 0 0 0 114 48
19 A3_A1 0 0 0 0 2 9 0 3 100 48
20 A3_A2 0 0 0 0 0 0 0 0 114 48
", header = TRUE)
I'm trying to write a program that does all pairwise comparisons across the 'ES' groups, grouped by nodepair.
I'd like to write a series of functions that compare each pair of rows column by column. For example, for each of V1:V9, the result should be 1 (data present) when the value is > 0 in both ES rows, and 0 otherwise.
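The rule above reduces to an element-wise AND of two comparisons. Below is a minimal base-R sketch for a single pair of rows. The two vectors hold the V1:V9 values of A1_A1 at ES 45 and ES 46, typed in by hand from dat2. The names es45, es46 and presence are illustrative only.

```r
# V1:V9 for nodepair A1_A1 at ES 45 and ES 46, copied from dat2
es45 <- c(0, 21, 0, 0, 0, 0, 0, 0, 78)
es46 <- c(0, 20, 0, 0, 0, 0, 0, 0, 65)

# 1 where the value is > 0 in both rows, 0 otherwise
presence <- as.integer(es45 > 0 & es46 > 0)
presence
#> [1] 0 1 0 0 0 0 0 0 1
```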
I'm imagining the output to look something like this:
dat3 <- read.table(text = "
nodepair1 nodepair2 V1 V2 V3 V4 V5 V6 V7 V8 V9
A1_A1(45) A1_A1(46) 0 1 0 0 0 0 0 0 1
", header = TRUE)
etc.
Unfortunately, I haven't gotten very far:
dat2 <- dat2 %>%
group_by(nodepair) %>%
col2 = t(combn(nodepair,2)))
I'm pretty sure I need 'combn' here, but I'm very new to this function and can't figure it out.
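Since combn() is the sticking point, here is a short base-R illustration of what it returns for the four ES values in dat2. The vector es and the other variable names are illustrative only. The FUN argument lets combn() apply a function to each combination directly, which is one way to build "specific functions" per pair.

```r
es <- c(45, 46, 47, 48)

# Default: a matrix with one pair per column (2 rows x choose(4, 2) = 6 columns)
pairs_mat <- combn(es, 2)
pairs_mat
#>      [,1] [,2] [,3] [,4] [,5] [,6]
#> [1,]   45   45   45   46   46   47
#> [2,]   46   47   48   47   48   48

# t() gives one pair per row instead
t(pairs_mat)

# simplify = FALSE returns a list of pairs, which is easier to loop over
pairs_list <- combn(es, 2, simplify = FALSE)
pairs_list[[1]]
#> [1] 45 46

# FUN is applied to every combination; extra arguments are passed on to it
pair_labels <- combn(es, 2, FUN = paste, collapse = "-")
pair_labels
#> [1] "45-46" "45-47" "45-48" "46-47" "46-48" "47-48"
```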
Answer 1
Score: 1
Now that the OP has clarified the question, I'd propose the following solution:
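library(tidyverse)

# all pairwise combinations of ES values, e.g. c(45, 46), c(45, 47), ...
ES_combs <- combn(unique(dat2$ES), 2, simplify = FALSE)

# note: `.x = _` needs R >= 4.2 and `\(df)` needs R >= 4.1
dat2 |>
  group_split(nodepair) |>                      # one data frame per nodepair
  map(.x = _,                                   # outer loop over nodepair groups
      .f = \(df) map(.x = seq_along(ES_combs),  # inner loop over ES pairs
                     .f = ~ df |>
                       filter(ES %in% ES_combs[[.x]]) |>
                       summarize(nodepair = first(nodepair),
                                 ES_1 = ES[1],
                                 ES_2 = ES[2],
                                 # 1 if the value is > 0 in both rows, else 0
                                 across(V1:V9, ~ as.numeric(all(. > 0)))))) |>
  bind_rows()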
which gives:
# A tibble: 30 × 12
nodepair ES_1 ES_2 V1 V2 V3 V4 V5 V6 V7 V8 V9
<chr> <int> <int> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 A1_A1 45 46 0 1 0 0 0 0 0 0 1
2 A1_A1 45 47 0 1 0 0 0 0 0 0 1
3 A1_A1 45 48 0 1 0 0 0 0 0 0 1
4 A1_A1 46 47 0 1 0 0 0 0 0 0 1
5 A1_A1 46 48 0 1 0 0 0 0 0 0 1
6 A1_A1 47 48 0 1 0 0 0 0 0 0 1
7 A2_A1 45 46 0 0 0 0 0 0 0 0 1
8 A2_A1 45 47 0 0 0 0 0 0 0 0 1
9 A2_A1 45 48 0 0 0 0 0 0 0 0 1
10 A2_A1 46 47 0 0 0 0 0 0 0 0 1
11 A2_A1 46 48 0 0 0 0 0 0 0 0 1
12 A2_A1 47 48 0 0 0 0 0 0 0 0 1
13 A2_A2 45 46 0 1 0 0 0 0 0 0 1
14 A2_A2 45 47 0 1 0 0 0 0 0 0 1
15 A2_A2 45 48 0 0 0 0 0 0 0 0 1
16 A2_A2 46 47 0 1 0 0 0 0 0 0 1
17 A2_A2 46 48 0 0 0 0 0 0 0 0 1
18 A2_A2 47 48 0 0 0 0 0 0 0 0 1
19 A3_A1 45 46 0 0 0 0 0 1 1 1 1
20 A3_A1 45 47 0 0 0 0 0 1 0 1 1
21 A3_A1 45 48 0 0 0 0 0 1 0 1 1
22 A3_A1 46 47 0 0 0 0 0 1 0 1 1
23 A3_A1 46 48 0 0 0 0 1 1 0 1 1
24 A3_A1 47 48 0 0 0 0 0 1 0 1 1
25 A3_A2 45 46 0 0 0 0 0 0 0 0 1
26 A3_A2 45 47 0 0 0 0 0 0 0 0 1
27 A3_A2 45 48 0 0 0 0 0 0 0 0 1
28 A3_A2 46 47 0 0 0 0 0 0 0 0 1
29 A3_A2 46 48 0 0 0 0 0 0 0 0 1
30 A3_A2 47 48 0 0 0 0 0 0 0 0 1
This probably needs a bit of explanation:
- We start by creating all pairwise combinations of the ES values in your data frame and assigning them to a list object, ES_combs.
- We then split your data by nodepair into a list, where each element is the data for one nodepair group.
- The outer map then loops over each group's data frame. It is important to use an anonymous function with a named argument (df) here. The inner map uses a purrr lambda whose .x is the index into ES_combs, so .x cannot also refer to the group's data.
- The inner map takes each pair of ES values from ES_combs, filters the current group's data to those two rows, and then applies the summarize() step.
- As a last step, bind_rows() merges everything into one tibble instead of leaving you with an annoyingly long nested list.
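If you'd rather avoid purrr, the same grouped comparison can be sketched in base R with split(), combn() and lapply(). This is an alternative sketch, not the code above. It assumes exactly one row per nodepair/ES combination, and the helper name compare_group is hypothetical. To keep it self-contained, it rebuilds only the A1_A1 and A3_A1 rows of dat2 under a separate name (dat_sub), so it does not overwrite your data.

```r
# Subset of dat2 from the question: only the A1_A1 and A3_A1 rows
dat_sub <- read.table(text = "
nodepair V1 V2 V3 V4 V5 V6 V7 V8 V9 ES
A1_A1 0 21 0 0 0 0 0 0 78 45
A3_A1 0 0 0 0 0 6 1 3 89 45
A1_A1 0 20 0 0 0 0 0 0 65 46
A3_A1 0 0 0 0 2 6 3 3 71 46
A1_A1 0 25 0 0 0 0 0 0 45 47
A3_A1 0 0 0 0 0 8 0 1 61 47
A1_A1 0 37 0 0 0 0 0 0 77 48
A3_A1 0 0 0 0 2 9 0 3 100 48
", header = TRUE)

vcols <- paste0("V", 1:9)

# Hypothetical helper: compare every pair of ES rows within one nodepair group.
# Assumes exactly one row per ES value in the group.
compare_group <- function(df) {
  es_pairs <- combn(sort(unique(df$ES)), 2, simplify = FALSE)
  rows <- lapply(es_pairs, function(es) {
    a <- unlist(df[df$ES == es[1], vcols])
    b <- unlist(df[df$ES == es[2], vcols])
    hits <- as.integer(a > 0 & b > 0)   # 1 if > 0 in both rows, else 0
    names(hits) <- vcols
    data.frame(nodepair = df$nodepair[1], ES_1 = es[1], ES_2 = es[2],
               as.list(hits))
  })
  do.call(rbind, rows)
}

# split by nodepair, compare within each group, then stack the results
result <- do.call(rbind, lapply(split(dat_sub, dat_sub$nodepair), compare_group))
rownames(result) <- NULL
result
```

With the full dat2, this should give the same 30 comparisons as the tidyverse version, as a plain data frame with integer 0/1 columns.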
Answer 2
Score: 1
Here is how I would do it:
library(tidyverse)

# create a function which compares one combination of rows with each other;
# comb is a pair of row indices into dat2
# (column 1 = nodepair, columns 2:10 = V1:V9, column 11 = ES)
comparer <- function(comb) {
  bind_cols(
    nodepair1 = paste0(dat2[comb[1], 1], "(", dat2[comb[1], 11], ")"),
    nodepair2 = paste0(dat2[comb[2], 1], "(", dat2[comb[2], 11], ")"),
    # TRUE where the value is > 0 in both rows
    dat2[comb[1], 2:10] > 0 & dat2[comb[2], 2:10] > 0
  )
}

# all pairs of row indices: c(1, 2), c(1, 3), ..., c(19, 20)
combs <- combn(1:nrow(dat2), 2, simplify = FALSE)

# then apply that to each combination of rows for the dataset
map_df(combs, comparer)
# Output:
# A tibble: 190 × 11
nodepair1 nodepair2 V1 V2 V3 V4 V5 V6 V7 V8 V9
<chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl> <lgl>
1 A1_A1(45) A2_A1(45) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
2 A1_A1(45) A2_A2(45) FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
3 A1_A1(45) A3_A1(45) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
4 A1_A1(45) A3_A2(45) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
5 A1_A1(45) A1_A1(46) FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
6 A1_A1(45) A2_A1(46) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
7 A1_A1(45) A2_A2(46) FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
8 A1_A1(45) A3_A1(46) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
9 A1_A1(45) A3_A2(46) FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
10 A1_A1(45) A1_A1(47) FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE
# ℹ 180 more rows
Update: if you only want to compare rows that share the same 'nodepair' value, you can do this:
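# compare every pair of the row indices in x
processor <- function(x) {
  map_df(combn(x, 2, simplify = FALSE), comparer)
}

# note: summarise(.by = ) needs dplyr >= 1.1.0
dat2 |>
  mutate(n = row_number()) |>                 # remember each row's index
  summarise(comparisons = map(list(n), processor),
            .by = nodepair) |>                # one set of comparisons per nodepair
  unnest(comparisons)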
# Output:
# A tibble: 30 × 12
nodepair nodepair1 nodepair2 V1 V2 V3 V4 V5
<chr> <chr> <chr> <lgl> <lgl> <lgl> <lgl> <lgl>
1 A1_A1 A1_A1(45) A1_A1(46) FALSE TRUE FALSE FALSE FALSE
2 A1_A1 A1_A1(45) A1_A1(47) FALSE TRUE FALSE FALSE FALSE
3 A1_A1 A1_A1(45) A1_A1(48) FALSE TRUE FALSE FALSE FALSE
4 A1_A1 A1_A1(46) A1_A1(47) FALSE TRUE FALSE FALSE FALSE
5 A1_A1 A1_A1(46) A1_A1(48) FALSE TRUE FALSE FALSE FALSE
6 A1_A1 A1_A1(47) A1_A1(48) FALSE TRUE FALSE FALSE FALSE
7 A2_A1 A2_A1(45) A2_A1(46) FALSE FALSE FALSE FALSE FALSE
8 A2_A1 A2_A1(45) A2_A1(47) FALSE FALSE FALSE FALSE FALSE
9 A2_A1 A2_A1(45) A2_A1(48) FALSE FALSE FALSE FALSE FALSE
10 A2_A1 A2_A1(46) A2_A1(47) FALSE FALSE FALSE FALSE FALSE
# ℹ 20 more rows
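The row counts in these outputs follow directly from combn(). Choosing 2 out of n items gives choose(n, 2) = n(n - 1)/2 pairs. Below is a quick base-R check. The counts of rows, nodepairs and ES values are read off dat2 by hand, and the variable names are illustrative.

```r
n_rows      <- 20  # rows in dat2
n_nodepairs <- 5   # A1_A1, A2_A1, A2_A2, A3_A1, A3_A2
n_es        <- 4   # ES values 45, 46, 47, 48

# every pair of rows, ignoring nodepair: the map_df(combs, comparer) version
n_all <- length(combn(seq_len(n_rows), 2, simplify = FALSE))
n_all
#> [1] 190

# pairs of ES values within each nodepair: the grouped versions
n_grouped <- n_nodepairs * choose(n_es, 2)
n_grouped
#> [1] 30
```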
Note: all of these return TRUE/FALSE rather than 1/0. In arithmetic, R treats TRUE and FALSE as 1 and 0; for example, TRUE + TRUE is 2. I could have wrapped dat2[comb[1], 2:10] > 0 & dat2[comb[2], 2:10] > 0 in as.integer(), but I figured you would get the idea.
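If you do want 1/0 instead of TRUE/FALSE, as.integer() is enough. Below is a small base-R sketch using a hand-typed logical vector (the A1_A1 45 vs 46 comparison); hits and res are illustrative names. On the tibble returned by map_df(), dplyr's mutate(across(V1:V9, as.integer)) should do the same conversion column by column.

```r
# Logical result for A1_A1 at ES 45 vs 46 (V1:V9), as in the outputs above
hits <- c(FALSE, TRUE, FALSE, FALSE, FALSE, FALSE, FALSE, FALSE, TRUE)

TRUE + TRUE       # logicals are coerced to 0/1 in arithmetic
#> [1] 2
as.integer(hits)  # explicit 0/1 coding
#> [1] 0 1 0 0 0 0 0 0 1
sum(hits)         # number of V columns with data in both rows
#> [1] 2

# converting every column of a logical data frame at once
res <- data.frame(V1 = c(FALSE, FALSE), V2 = c(TRUE, TRUE))
res[] <- lapply(res, as.integer)
res
```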