Creating Pairs by Group and Keeping the Group IDs

Question

Below is an example of what I would like.

From this data frame:

data <- data.frame(id_group = c("A", "A", "B", "B", "B", "C", "C", "C"), id_entity = c(1, 2, 2, 3, 4, 2, 5, 1), nb_members = c(2, 2, 3, 3, 3, 3, 3, 3))


  id_group id_entity nb_members
        A         1          2
        A         2          2
        B         2          3
        B         3          3
        B         4          3
        C         2          3
        C         5          3
        C         1          3

With the following code, I manage to obtain the pairwise connections:

library(dplyr)  # provides %>%
m <- crossprod(table(data[-3]))
m[upper.tri(m, diag = TRUE)] <- 0
t_data <- as.data.frame.table(m)
t_data <- t_data %>% subset(id_entity.1 != id_entity)

id_entity id_entity.1 Freq
         2           1    2
         3           1    0
         4           1    0
         5           1    1
         1           2    0
         3           2    1
         4           2    1
         5           2    1
         1           3    0
         2           3    0
         4           3    1
         5           3    0
         1           4    0
         2           4    0
         3           4    0
         5           4    0
         1           5    0
         2           5    0
         3           5    0
         4           5    0
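
To see why this counts shared groups, it may help to look at the intermediate objects. The sketch below reuses the data above (without `nb_members`, which the original code drops anyway) and only illustrates what `crossprod()` computes on the incidence table; it is not a different method.

```r
data <- data.frame(
  id_group  = c("A", "A", "B", "B", "B", "C", "C", "C"),
  id_entity = c(1, 2, 2, 3, 4, 2, 5, 1)
)

# Incidence table: rows are groups, columns are entities (1 = member).
tab <- table(data)

# crossprod(tab) is t(tab) %*% tab, so cell [i, j] counts the groups
# that contain both entity i and entity j.
m <- crossprod(tab)

m["1", "2"]  # entities 1 and 2 share groups A and C
m["3", "5"]  # entities 3 and 5 never share a group
m["2", "2"]  # the diagonal counts the groups each entity belongs to
```

The diagonal and the upper triangle carry no extra pair information, which is why the original code zeroes them out before reshaping.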

However, I would like to keep the information about the group IDs in which the connections are made:

id_entity id_entity.1 Freq   id_group
         2           1    2    A,C
         3           1    0    NA
         4           1    0    NA
         5           1    1    C
         1           2    0    NA
         3           2    1    B
         4           2    1    B
         5           2    1    C
         1           3    0    NA
         2           3    0    NA
         4           3    1    B
         5           3    0    NA
         1           4    0    NA
         2           4    0    NA
         3           4    0    NA
         5           4    0    NA
         1           5    0    NA
         2           5    0    NA
         3           5    0    NA
         4           5    0    NA

Thank you very much for your help!

Answer 1

Score: 1

I accomplished this by making a table that mirrors the table used in crossprod(), but that holds the group letter wherever the frequency table has a non-zero value. Then you can use the values of id_entity and id_entity.1 to look up the matching columns of the letter table and paste together the values those two columns have in common. Finally, replace the letter values with NA when the frequency count is zero.

library(dplyr)
d <- data.frame(id_group = c("A", "A", "B", "B", "B", "C", "C", "C"), id_entity = c(1, 2, 2, 3, 4, 2, 5, 1), nb_members = c(2, 2, 3, 3, 3, 3, 3, 3))

# Group-by-entity incidence table, plus a copy holding the group letter where an entity is a member
tab <- table(d[-3])
tab2 <- apply(tab, 2, function(x) ifelse(x == 1, rownames(tab), ""))
# Pairwise co-occurrence counts, keeping the lower triangle only
m <- crossprod(table(d[-3]))
m[upper.tri(m, diag = TRUE)] <- 0
t_data <- as.data.frame.table(m)
t_data <- t_data %>% subset(id_entity.1 != id_entity)

# For each pair, paste the group letters found in both entities' columns of tab2
t_data$pairs <- apply(t_data, 1, function(x) paste(intersect(tab2[, x[1]], tab2[, x[2]]), collapse = ","))
# Drop the leading comma left by a shared empty string, then blank out pairs with no shared group
t_data$pairs <- gsub("^\\,", "", t_data$pairs)
t_data$pairs <- ifelse(t_data$Freq == 0, NA, t_data$pairs)
t_data
#>    id_entity id_entity.1 Freq pairs
#> 2          2           1    2   A,C
#> 3          3           1    0  <NA>
#> 4          4           1    0  <NA>
#> 5          5           1    1     C
#> 6          1           2    0  <NA>
#> 8          3           2    1     B
#> 9          4           2    1     B
#> 10         5           2    1     C
#> 11         1           3    0  <NA>
#> 12         2           3    0  <NA>
#> 14         4           3    1     B
#> 15         5           3    0  <NA>
#> 16         1           4    0  <NA>
#> 17         2           4    0  <NA>
#> 18         3           4    0  <NA>
#> 20         5           4    0  <NA>
#> 21         1           5    0  <NA>
#> 22         2           5    0  <NA>
#> 23         3           5    0  <NA>
#> 24         4           5    0  <NA>

Created on 2023-05-17 with reprex v2.0.2
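
As a possible alternative, and not part of the answer above, the same group labels can be obtained in base R by joining the data to itself on id_group and collapsing the group IDs per pair. This is a minimal sketch that assumes only the pairs sharing at least one group are needed; the rows with Freq equal to 0 do not appear in the result. The names `pairs` and `shared` are chosen here for illustration.

```r
d <- data.frame(
  id_group  = c("A", "A", "B", "B", "B", "C", "C", "C"),
  id_entity = c(1, 2, 2, 3, 4, 2, 5, 1)
)

# Self-join on id_group: every ordered pair of entities within a group.
pairs <- merge(d, d, by = "id_group", suffixes = c("", ".1"))

# Keep each unordered pair once, mirroring the lower-triangle filter.
pairs <- pairs[pairs$id_entity > pairs$id_entity.1, ]

# Collapse the group ids of each pair into one comma-separated string.
shared <- aggregate(
  id_group ~ id_entity + id_entity.1,
  data = pairs,
  FUN = function(g) paste(sort(unique(as.character(g))), collapse = ",")
)

# The number of shared groups is the number of collapsed ids.
shared$Freq <- lengths(strsplit(shared$id_group, ","))
shared
```

If the zero-frequency rows are also wanted, one option would be to left-join `shared` onto the full t_data grid from the answer, which leaves id_group as NA for pairs that never meet.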

huangapple
  • Posted on 2023-05-17 18:31:41
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76271136.html