Summarise node table
Question
I have a table like the one below:
x <- data.frame(
old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)
old new
start1 final1
start2 final1
start3 inter1
start4 inter3
inter1 inter2
inter2 final2
inter3 final3
I would like to get the "final node" directly for each row.
For the example above, that would be:
res <- data.frame(
old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
new = c("final1", "final1", "final2", "final3", "final2", "final2", "final3")
)
old new
start1 final1
start2 final1
start3 final2
start4 final3
inter1 final2
inter2 final2
inter3 final3
I guess something recursive is needed (since there can be multiple levels), but I can't work it out.
Answer 1
Score: 3
<details>
<summary>English:</summary>
You could use a loop here:
while(length(toupdate <- which(x$new %in% x$old))>0) {
x$new[toupdate] <- x$new[match(x$new[toupdate], x$old)]
}
x
old new
1 start1 final1
2 start2 final1
3 start3 final2
4 start4 final3
5 inter1 final2
6 inter2 final2
7 inter3 final3
Here we iterate as long as any of the "new" values appear in the "old" column. We find those rows, then use `match` to look up the "new" value for each of those intermediate values. We loop until there is nothing left to update.
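One caveat with the loop as written is that it would never terminate if the table contained a cycle (for example `a -> b` and `b -> a`). The sketch below wraps the same update step in a function with an iteration cap. The name `resolve_final` and the `max_iter` argument are hypothetical and not part of the original answer, and the columns are assumed to be character vectors rather than factors.

```r
# Hypothetical helper (not from the original answer): the same loop,
# but with an iteration cap so that a cyclic table raises an error
# instead of looping forever. Assumes character columns `old` and `new`.
resolve_final <- function(x, max_iter = nrow(x) + 1L) {
  for (i in seq_len(max_iter)) {
    # rows whose "new" value is itself an "old" value still need resolving
    toupdate <- which(x$new %in% x$old)
    if (length(toupdate) == 0L) return(x)
    # replace each intermediate value by the "new" value of its own row
    x$new[toupdate] <- x$new[match(x$new[toupdate], x$old)]
  }
  stop("no fixed point after ", max_iter,
       " iterations; the table may contain a cycle")
}

x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)
res <- resolve_final(x)
res
```

Because the function works on a copy, the original `x` is left unchanged and can still be passed to other approaches.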
If you are working with graph-like data, the `igraph` package can be helpful. Here is how you might use it in this case:
library(igraph)
gg <- graph_from_data_frame(x)
plot(gg)
lapply(decompose(gg), function(sg) {
av <- adjacent_vertices(sg, V(sg), "out")
cbind(V(sg)$name[lengths(av)!=0], names(av[lengths(av)==0]))
}) |> do.call("rbind", args=_) |> data.frame()
X1 X2
1 start1 final1
2 start2 final1
3 start3 final2
4 inter1 final2
5 inter2 final2
6 start4 final3
7 inter3 final3
It works by treating the input as a directed graph. We then decompose it into non-overlapping components and, for each component, find the node that has no outgoing connections. This assumes each component has exactly one such node, as in the example.
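The same idea (one terminal node per connected component) can also be sketched without `decompose`, using the component membership and the out-degree of each vertex. This is only an illustrative variant, not the answer's method. The names `memb`, `sinks` and `final_of_component` are made up for the example, and the `stopifnot` check makes the "at most one terminal node per component" assumption explicit.

```r
library(igraph)

x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)
gg <- graph_from_data_frame(x)

# named vector: vertex name -> id of its (weakly) connected component
memb <- components(gg)$membership
# vertices without outgoing edges, i.e. the "final" nodes
sinks <- names(which(degree(gg, mode = "out") == 0))
# assumption: at most one final node per component
stopifnot(!anyDuplicated(memb[sinks]))

# lookup table: component id -> name of its final node
final_of_component <- setNames(sinks, memb[sinks])
res <- data.frame(
  old = x$old,
  new = unname(final_of_component[as.character(memb[x$old])])
)
res
```

Unlike the `decompose` version, this keeps the rows in the order of the input table.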
[![enter image description here][1]][1]
[1]: https://i.stack.imgur.com/HYbzm.png
</details>
# Answer 2
**Score**: 1
If you would like to use `igraph`, you can try `subcomponent` as shown below (but this might be inefficient due to row-wise operations, e.g., `sapply`):
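```R
library(igraph)
library(dplyr)

g <- graph_from_data_frame(x)
x %>%
  mutate(new = sapply(
    old,
    # last vertex reachable from v along outgoing edges, i.e. the end of its chain
    function(v) tail(names(subcomponent(g, v, "out")), 1)
  ))
```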
which gives
old new
1 start1 final1
2 start2 final1
3 start3 final2
4 start4 final3
5 inter1 final2
6 inter2 final2
7 inter3 final3
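To see why taking the last element works here, it may help to inspect what `subcomponent` returns for a single vertex. The snippet below is a small illustration on the example data, not part of the original answer. It relies on every node in this table having at most one successor, so the vertices reachable from a start node form a single chain. With branching paths, the last vertex visited would not necessarily be the only end point.

```r
library(igraph)

x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)
g <- graph_from_data_frame(x)

# all vertices reachable from "start3" when following edge direction
reach <- names(subcomponent(g, "start3", mode = "out"))
reach

# the last vertex visited is the end of the chain
tail(reach, 1)
```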
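A probably more efficient approach is to use `membership` + `left_join` (this needs `enframe` from the `tibble` package, and the `grep("^final", ...)` step relies on the final nodes being named with the prefix `final`):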
x %>%
left_join(
{.} %>%
graph_from_data_frame() %>%
components() %>%
membership() %>%
enframe(),
by = join_by(old == name)
) %>%
group_by(value) %>%
mutate(new = grep("^final", new, value = TRUE)) %>%
ungroup() %>%
select(-value)
which gives
# A tibble: 7 × 2
old new
<chr> <chr>
1 start1 final1
2 start2 final1
3 start3 final2
4 start4 final3
5 inter1 final2
6 inter2 final2
7 inter3 final3