April 13, 2023 22:41:43

Summarise node table

Question

I've got a table such as the one below:

```R
x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)
```

```
    old    new
 start1 final1
 start2 final1
 start3 inter1
 start4 inter3
 inter1 inter2
 inter2 final2
 inter3 final3
```

I would like to get the "final node" for each row directly.
For the example above, it would be:

```R
res <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "final2", "final3", "final2", "final2", "final3")
)
```

```
    old    new
 start1 final1
 start2 final1
 start3 final2
 start4 final3
 inter1 final2
 inter2 final2
 inter3 final3
```

I guess something recursive must be done (knowing there can be multiple levels), but I can't get it to work.
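The recursion the question has in mind can be written directly in base R. A minimal sketch, assuming each `old` value appears at most once and the links contain no cycle; `final_node()` is a hypothetical helper, not part of any package:

```R
# Hypothetical helper: follow the `old -> new` links recursively until reaching
# a node that never appears in `old`, i.e. a node with no outgoing link.
# Assumes each `old` value appears at most once and the links contain no cycle.
final_node <- function(node, links) {
  nxt <- links[node]              # named lookup; NA when `node` is not an `old` value
  if (is.na(nxt)) return(node)    # no outgoing link: this is the final node
  final_node(unname(nxt), links)  # otherwise keep following the chain
}

x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)

links <- setNames(x$new, x$old)   # character vector used as an old -> new map
res <- data.frame(
  old = x$old,
  new = vapply(x$new, final_node, character(1), links = links, USE.NAMES = FALSE)
)
print(res)
```

For `inter1` this follows `inter1 -> inter2 -> final2`. Each extra level adds one nested call, so for very long chains an iterative loop avoids R's limit on nested evaluation.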

Answer 1

Score: 3

You could use a loop here:

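```R
# repeat while some `new` values are still intermediate nodes (they also appear in `old`)
while (length(toupdate <- which(x$new %in% x$old)) > 0) {
  # replace each intermediate value by the `new` value it maps to
  x$new[toupdate] <- x$new[match(x$new[toupdate], x$old)]
}
x
```

```
     old    new
1 start1 final1
2 start2 final1
3 start3 final2
4 start4 final3
5 inter1 final2
6 inter2 final2
7 inter3 final3
```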

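Here we iterate while any of the "new" values also appear in the "old" column. We find them, then use `match` to look up the "new" value for each of those intermediate values. We loop until there are no more to update.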
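The loop assumes the links eventually end: if they contain a cycle (for example `a -> b` and `b -> a`), the `while` condition never becomes false. A minimal sketch of the same loop with an iteration cap; the wrapper name `resolve_final()` and its `max_iter` argument are hypothetical. Each pass moves every unresolved row at least one step further along its chain, and a chain built from `nrow(x)` links has fewer than `nrow(x)` intermediate nodes, so `nrow(x)` passes are enough whenever there is no cycle.

```R
# Hypothetical wrapper around the loop: same update rule, but it stops
# with an error instead of looping forever when the links contain a cycle.
resolve_final <- function(x, max_iter = nrow(x)) {
  for (i in seq_len(max_iter)) {
    # rows whose current `new` value is also an `old` value (an intermediate node)
    toupdate <- which(x$new %in% x$old)
    if (length(toupdate) == 0) return(x)
    # replace each intermediate value by the `new` value it maps to
    x$new[toupdate] <- x$new[match(x$new[toupdate], x$old)]
  }
  if (any(x$new %in% x$old)) {
    stop("links still unresolved after ", max_iter, " passes; is there a cycle?")
  }
  x
}

x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)
print(resolve_final(x))

cyclic <- data.frame(old = c("a", "b"), new = c("b", "a"))
try(resolve_final(cyclic))  # reports the error instead of hanging
```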
And if you are working with graph-like data, the `igraph` library can be helpful. Here's how you might use it in this case:

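```R
library(igraph)

gg <- graph_from_data_frame(x)
plot(gg)

# for each connected component, pair every node that has outgoing edges
# with the node that has none (the component's final node);
# the `_` placeholder in the native pipe needs R >= 4.2
lapply(decompose(gg), function(sg) {
  av <- adjacent_vertices(sg, V(sg), "out")
  cbind(V(sg)$name[lengths(av) != 0], names(av[lengths(av) == 0]))
}) |> do.call("rbind", args = _) |> data.frame()
```

```
      X1     X2
1 start1 final1
2 start2 final1
3 start3 final2
4 inter1 final2
5 inter2 final2
6 start4 final3
7 inter3 final3
```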

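It works by treating the input as a directed graph. We then decompose it into non-overlapping parts (connected components) and, for each part, find the node that has no outgoing connections.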
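The core step, finding the nodes that have no outgoing connections, does not itself need `igraph`. A minimal base-R sketch, using the `x` from the question, computes each node's out-degree and keeps the nodes where it is zero:

```R
x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)

# every node in the graph, whether it appears as a source or a target
nodes <- unique(c(x$old, x$new))
# out-degree: how many edges leave each node (how often it appears in `old`)
out_degree <- table(factor(x$old, levels = nodes))
# nodes with no outgoing connection are the final nodes
sinks <- names(out_degree)[out_degree == 0]
print(sinks)
# [1] "final1" "final2" "final3"
```

This only identifies the final nodes; `decompose()` is what ties each start or intermediate node to the final node of its own component.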
# Answer 2
**Score**: 1
If you would like to use `igraph`, you can try `subcomponent` as below (but this might not be efficient, since it works row by row via `sapply`):

```R
library(igraph)
library(dplyr)

g <- graph_from_data_frame(x)
x %>%
  mutate(new = sapply(
    old,
    # vertices reachable from `v` along outgoing edges; in a simple chain
    # the last vertex returned is its end, i.e. the final node
    function(v) tail(names(subcomponent(g, v, "out")), 1)
  ))
```

which gives

```
     old    new
1 start1 final1
2 start2 final1
3 start3 final2
4 start4 final3
5 inter1 final2
6 inter2 final2
7 inter3 final3
```

Probably a more efficient way is to use `membership` + `left_join`:

```R
library(tibble)  # enframe()

x %>%
  left_join(
    # component id of every node, as a two-column tibble (name, value)
    {.} %>%
      graph_from_data_frame() %>%
      components() %>%
      membership() %>%
      enframe(),
    by = join_by(old == name)  # join_by() needs dplyr >= 1.1.0
  ) %>%
  group_by(value) %>%
  # relies on every final node's name starting with "final"
  mutate(new = grep("^final", new, value = TRUE)) %>%
  ungroup() %>%
  select(-value)
```

which gives

```
# A tibble: 7 × 2
  old    new
  <chr>  <chr>
1 start1 final1
2 start2 final1
3 start3 final2
4 start4 final3
5 inter1 final2
6 inter2 final2
7 inter3 final3
```
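The membership approach picks each group's final node with `grep("^final", ...)`, so it depends on the final nodes being named `final*`. A hypothetical base-R variant of the same idea (group rows by component, then pick that component's final node) instead takes the node in each component that never appears in `old`. It swaps `igraph::components()` for a simple label-propagation helper, `component_labels()`, which is made up for illustration, and it assumes every component has exactly one final node.

```R
x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)

# Hypothetical helper: label weakly connected components by label propagation.
# Every node starts with its own number; on each pass, each node takes the
# smallest label found on any edge touching it, until nothing changes.
component_labels <- function(from, to) {
  nodes <- unique(c(from, to))
  label <- setNames(seq_along(nodes), nodes)
  repeat {
    edge_min <- pmin(label[from], label[to])                 # smaller label per edge
    best <- tapply(c(edge_min, edge_min), c(from, to), min)  # smallest label per node
    updated <- label
    updated[names(best)] <- pmin(label[names(best)], best)
    if (identical(updated, label)) return(label)
    label <- updated
  }
}

comp <- component_labels(x$old, x$new)
# final nodes: targets that never appear as a source
sinks <- setdiff(x$new, x$old)
# map each component label to its final node (assumes one final node per component)
final_of_comp <- setNames(sinks, comp[sinks])
res <- data.frame(
  old = x$old,
  new = unname(final_of_comp[as.character(comp[x$old])])
)
print(res)
```

Label propagation may need several passes on long chains, so for large inputs `igraph::components()` remains the better tool; the point of the sketch is only the name-independent choice of the final node.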

