Contract a dataframe of an edge list by summing the contracted edge weights from/to two nodes
Question
I have a dataframe `df` that contains edge weights between pairs of nodes:
df <- data.frame(c("A","A","B","B","C","C"),
c("B","C","A","C","A","B"),
c(2,3,6,4,9,1))
colnames(df) <- c("node_from", "node_to", "weight")
print(df)
# Output:
node_from node_to weight
1 A B 2
2 A C 3
3 B A 6
4 B C 4
5 C A 9
6 C B 1
I would like to contract this dataframe by merging nodes A and B and summing all edge weights to and from these nodes with any other node, in this case C only. The result should be an edge list where the edges between A and B have disappeared and AB is now one node:
# some code to merge nodes A and B
print(df_contracted)
# Output:
node_from node_to weight
1 AB C 7
3 C AB 10
Is there a way to do this efficiently for larger dataframes?
I could convert the dataframe to an actual graph using `graph_from_data_frame` from the `igraph` package and then apply the `contract` function, but since I have to do this operation many times, I'd rather not convert it and then reconvert it every time.
Answer 1
Score: 4
base R approach
With base R we can use `aggregate` + `subset` as below:
aggregate(
  weight ~ .,
  subset(
    transform(
      df,
      node_from = gsub("A|B", "AB", node_from),
      node_to = gsub("A|B", "AB", node_to)
    ),
    node_from != node_to
  ),
  sum
)
which gives
node_from node_to weight
1 C AB 10
2 AB C 7
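Note that `gsub("A|B", ...)` matches substrings, so it is safe for the single-letter names here but would also rewrite part of a longer node name such as "Bob". A minimal sketch of the same idea with exact matching, wrapped in a function so it can be reused, might look like the following. The name `contract_nodes` and its arguments are hypothetical and not part of the answer above.

```r
# Hypothetical helper (not from the answer): contract the nodes listed in
# `to_merge` into a single node called `merged_name`.
contract_nodes <- function(df, to_merge,
                           merged_name = paste(to_merge, collapse = "")) {
  from <- as.character(df$node_from)
  to   <- as.character(df$node_to)
  # Exact matching with %in%, so only whole node names are replaced
  df$node_from <- ifelse(from %in% to_merge, merged_name, from)
  df$node_to   <- ifelse(to %in% to_merge, merged_name, to)
  # Drop edges that became self-loops after the merge
  df <- df[df$node_from != df$node_to, ]
  # Sum the weights of edges that now share the same endpoints
  aggregate(weight ~ node_from + node_to, data = df, FUN = sum)
}

df <- data.frame(
  node_from = c("A", "A", "B", "B", "C", "C"),
  node_to   = c("B", "C", "A", "C", "A", "B"),
  weight    = c(2, 3, 6, 4, 9, 1)
)
res <- contract_nodes(df, c("A", "B"))
print(res)
```

For the example data this should again give the two edges AB -> C (weight 7) and C -> AB (weight 10).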
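igraph approach
Here is an option using `contract` from `igraph`:
library(igraph)  # also provides the %>% pipe
df %>%
  graph_from_data_frame() %>%
  # mapping c(1, 1, 2): vertices A and B go to new vertex 1, C to new vertex 2
  contract(c(1, 1, 2), function(v) paste0(v, collapse = "")) %>%
  # simplify() removes the self-loop and merges the parallel edges
  simplify() %>%
  get.data.frame()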
which gives
from to weight
1 AB C 7
2 C AB 10
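The mapping vector `c(1, 1, 2)` is hard-coded for this three-node example. As a rough sketch, it could instead be derived from the node names. This assumes that `nodes` lists the vertex names in the graph's vertex order (for instance as returned by `V(g)$name`); the variable names below are illustrative only.

```r
# Assumed vertex order of the graph, e.g. taken from V(g)$name in igraph
nodes    <- c("A", "B", "C")
to_merge <- c("A", "B")

# Give every merged node the same label and keep the others unchanged
labels <- ifelse(nodes %in% to_merge, paste(to_merge, collapse = ""), nodes)

# Map each vertex to the position of its label among the unique labels
mapping <- match(labels, unique(labels))
print(mapping)
```

The resulting `mapping` can then be passed to `contract` in place of the literal vector.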
Answer 2
Score: 2
Here's a `dplyr` solution:
library(dplyr)
to.merge <- c('A', 'B')
merged.name <- paste(to.merge, collapse='')
df %>%
mutate(across(c(node_from, node_to),
~ if_else(.x %in% to.merge, merged.name, .x))) %>%
group_by(node_from, node_to) %>%
summarise(weight = sum(weight), .groups = "drop") %>%
filter(node_from != node_to)
# # A tibble: 2 × 3
# node_from node_to weight
# <chr> <chr> <dbl>
# 1 AB C 7
# 2 C AB 10
It changes all from and to node names that are "A" or "B" to "AB", groups rows with the same combination of `node_from` and `node_to`, sums the weights within these groups, and finally removes the AB<->AB self-loop.
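Since the question mentions repeating the operation many times, the steps just described could be wrapped in a function and applied to a list of merges with `Reduce`. The following is only a sketch under that assumption: the helper name `contract_pair`, the extra node D, and its weights are made up for illustration, and the self-loop filter is applied before summarising, which gives the same result.

```r
library(dplyr)

# Hypothetical helper wrapping the rename / filter / group / sum steps
contract_pair <- function(edges, to_merge) {
  merged_name <- paste(to_merge, collapse = "")
  edges %>%
    mutate(across(c(node_from, node_to),
                  ~ if_else(.x %in% to_merge, merged_name, .x))) %>%
    filter(node_from != node_to) %>%
    group_by(node_from, node_to) %>%
    summarise(weight = sum(weight), .groups = "drop")
}

# Illustrative data: the original edges plus a made-up node D
edges <- data.frame(
  node_from = c("A", "A", "B", "B", "C", "C", "C", "D"),
  node_to   = c("B", "C", "A", "C", "A", "B", "D", "C"),
  weight    = c(2, 3, 6, 4, 9, 1, 5, 7)
)

# Merge A with B first, then the resulting AB with C
merges <- list(c("A", "B"), c("AB", "C"))
result <- Reduce(contract_pair, merges, edges)
print(result)
```

Each step works directly on the edge list, so no conversion to a graph object and back is needed between contractions.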
Answer 3
Score: 2
You may `subset` the A-B and B-A rows away, next `sum`marize `by` a `grepl` on `'C'`, and `rbind`.
subset(df, rowSums(sapply(df[1:2], grepl, pat='A|B')) != 2) |>
{\(.) by(., grepl('C', .$node_from), \(x) {
data.frame(t(sapply(x[1:2], \(z) paste(unique(z), collapse=''))), weight=sum(x$weight))
})}() |> unname() |> do.call(what='rbind')
# node_from node_to weight
# 1 AB C 7
# 2 C AB 10
Data:
df <- structure(list(node_from = c("A", "A", "B", "B", "C", "C"), node_to = c("B",
"C", "A", "C", "A", "B"), weight = c(2, 3, 6, 4, 9, 1)), class = "data.frame", row.names = c(NA,
-6L))