Summarise node table


# Question


I've got a table like the one below:

```R
x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)
```

```
    old    new
 start1 final1
 start2 final1
 start3 inter1
 start4 inter3
 inter1 inter2
 inter2 final2
 inter3 final3
```

I would like to get the "final node" of each row directly, i.e. the node reached by following the `old` → `new` links until there is nowhere left to go.
For the example above, it would be:

```R
res <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "final2", "final3", "final2", "final2", "final3")
)
```

```
    old    new
 start1 final1
 start2 final1
 start3 final2
 start4 final3
 inter1 final2
 inter2 final2
 inter3 final3
```

I guess something recursive is needed (since there can be multiple levels), but I can't work out how to do it.
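The recursive idea can be sketched in base R: look up each `new` value in the `old` column and keep following the link until the value no longer appears there. This is a minimal sketch, assuming the links contain no cycles (a cycle would make the recursion run forever); `follow_chain` is a hypothetical helper name, not part of any package.

```R
# Hypothetical helper: follow the old -> new links recursively until
# reaching a node that never appears in `old` (i.e. a final node).
# Assumes the links contain no cycles; a cycle would recurse forever.
follow_chain <- function(node, old, new) {
  i <- match(node, old)
  if (is.na(i)) {
    node                             # nothing points onward: final node
  } else {
    follow_chain(new[i], old, new)   # go one more level down the chain
  }
}

x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3"),
  stringsAsFactors = FALSE
)

# start from each row's `new` value and resolve it to the end of its chain
res <- data.frame(
  old = x$old,
  new = vapply(x$new, follow_chain, character(1),
               old = x$old, new = x$new, USE.NAMES = FALSE)
)
res
```

With this data, `start3` and `inter1` both resolve to `final2` (inter1 → inter2 → final2).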

# Answer 1
**Score**: 3


You could use a loop here:

```R
while (length(toupdate <- which(x$new %in% x$old)) > 0) {
  x$new[toupdate] <- x$new[match(x$new[toupdate], x$old)]
}
x
```

```
     old    new
1 start1 final1
2 start2 final1
3 start3 final2
4 start4 final3
5 inter1 final2
6 inter2 final2
7 inter3 final3
```

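Here we iterate while any of the "new" values are in the "old" column. We find them, then use `match` to look up the "new" value for each of those intermediate values. We loop until there are no more to update.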
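To see how many passes the loop needs, here is the same update step wrapped in a function that records `new` after every pass. This is a sketch: `resolve_with_trace` is a hypothetical name, and `max_iter` is an added safeguard (not part of the answer) that stops with an error if the links contain a cycle, a case in which the plain `while` loop would never finish.

```R
# The answer's update step, wrapped in a hypothetical helper that also
# records the state of `new` after each pass.
resolve_with_trace <- function(old, new, max_iter = length(old)) {
  history <- list()
  iter <- 0
  while (length(toupdate <- which(new %in% old)) > 0) {
    iter <- iter + 1
    # an acyclic chain can never need more passes than there are rows
    if (iter > max_iter) stop("the links seem to contain a cycle")
    # replace each intermediate value by the value it points to
    new[toupdate] <- new[match(new[toupdate], old)]
    history[[iter]] <- new
  }
  list(final = new, history = history)
}

x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3"),
  stringsAsFactors = FALSE
)

out <- resolve_with_trace(x$old, x$new)
out$history
```

On this data the loop needs two passes: after the first one, `start3` still points to `inter2`, and the second pass resolves it to `final2`.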

And if you are working with graph-like data, the `igraph` library can be helpful. Here's how you might use it in this case:

```R
library(igraph)

# uses the original `x` from the question (not the one modified by the loop above)
gg <- graph_from_data_frame(x)
plot(gg)

# the `_` placeholder requires R >= 4.2
lapply(decompose(gg), function(sg) {
  av <- adjacent_vertices(sg, V(sg), "out")
  cbind(V(sg)$name[lengths(av) != 0], names(av[lengths(av) == 0]))
}) |> do.call("rbind", args = _) |> data.frame()
```

```
      X1     X2
1 start1 final1
2 start2 final1
3 start3 final2
4 inter1 final2
5 inter2 final2
6 start4 final3
7 inter3 final3
```

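It works by treating the input as a directed graph. We then decompose it into non-overlapping parts (its connected components), and for each part we find the node that has no outgoing connections.

[![Plot of the graph gg][1]][1]

  [1]: https://i.stack.imgur.com/HYbzm.png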
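A node with no outgoing connections is a sink. As a base R illustration of that idea without igraph, you can count how often each node appears in `old` (its out-degree) and keep the nodes whose count is zero. This sketch only identifies the sinks; it does not reproduce the component decomposition that pairs each row with its sink.

```R
x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3"),
  stringsAsFactors = FALSE
)

# every node that appears anywhere in the edge list
nodes <- unique(c(x$old, x$new))

# out-degree = number of rows in which a node is the source (`old`);
# using all nodes as factor levels gives pure endpoints a count of 0
out_degree <- table(factor(x$old, levels = nodes))

# sinks: nodes with no outgoing connection
sinks <- names(out_degree)[out_degree == 0]
sinks
```

Here the sinks are `final1`, `final2` and `final3`, one per connected component, which is why the igraph version can pair every other node with the single sink of its component.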




# Answer 2
**Score**: 1

If you would like to use `igraph`, you can try `subcomponent` like below (but this might not be efficient, since it works row by row, e.g., via `sapply`):

```R
library(igraph)
library(dplyr)

g <- graph_from_data_frame(x)
x %>%
  mutate(new = sapply(
    old,
    # subcomponent() lists v and every vertex reachable from it;
    # for a simple chain the last one is the final node
    function(v) tail(names(subcomponent(g, v, "out")), 1)
  ))
```

which gives

```
     old    new
1 start1 final1
2 start2 final1
3 start3 final2
4 start4 final3
5 inter1 final2
6 inter2 final2
7 inter3 final3
```

Probably a more efficient way is to use `membership` + `left_join` (note that this relies on the final nodes' names starting with `final`):

```R
library(tibble) # for enframe()

x %>%
  left_join(
    {.} %>%
      graph_from_data_frame() %>%
      components() %>%
      membership() %>%
      enframe(),
    by = join_by(old == name)
  ) %>%
  group_by(value) %>%
  mutate(new = grep("^final", new, value = TRUE)) %>%
  ungroup() %>%
  select(-value)
```

which gives

```
# A tibble: 7 × 2
  old    new
  <chr>  <chr>
1 start1 final1
2 start2 final1
3 start3 final2
4 start4 final3
5 inter1 final2
6 inter2 final2
7 inter3 final3
```

huangapple
  • Published on 13 April 2023, 22:41:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76006775.html