Summarise node table

Question

I have a table like the one below:

    x <- data.frame(
      old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
      new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
    )

       old    new
    start1 final1
    start2 final1
    start3 inter1
    start4 inter3
    inter1 inter2
    inter2 final2
    inter3 final3

I would like to get the "final node" for each row directly.
For the example above, that would be:

    res <- data.frame(
      old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
      new = c("final1", "final1", "final2", "final3", "final2", "final2", "final3")
    )

       old    new
    start1 final1
    start2 final1
    start3 final2
    start4 final3
    inter1 final2
    inter2 final2
    inter3 final3

I guess something recursive is needed (there can be multiple levels), but I can't work it out.

Answer 1

Score: 3

<details>
<summary>English:</summary>

You could use a loop here:

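    # keep going while some `new` value is itself an `old` value
    while (length(toupdate <- which(x$new %in% x$old)) > 0) {
      # replace each intermediate value with the value it points to
      x$new[toupdate] <- x$new[match(x$new[toupdate], x$old)]
    }
    x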

         old    new
    1 start1 final1
    2 start2 final1
    3 start3 final2
    4 start4 final3
    5 inter1 final2
    6 inter2 final2
    7 inter3 final3

Here we iterate as long as any of the "new" values also appears in the "old" column. We find those rows, then use `match` to look up the "new" value for each of those intermediate values. We loop until there is nothing left to update.
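
The same loop can be wrapped in a function, which makes it reusable and lets it stop on cyclic input (for example a -> b -> a), where a bare `while` loop would never terminate. This is only a sketch: the name `resolve_final` and the `max_iter` cap are assumptions for illustration, not part of the answer.

```r
# Hypothetical helper: follow old -> new links until every `new` value is terminal.
# `max_iter` is an assumed safety cap so that cyclic input raises an error
# instead of looping forever.
resolve_final <- function(df, max_iter = nrow(df)) {
  for (i in seq_len(max_iter)) {
    # rows whose `new` value is itself an `old` value need another hop
    toupdate <- which(df$new %in% df$old)
    if (length(toupdate) == 0) return(df)
    df$new[toupdate] <- df$new[match(df$new[toupdate], df$old)]
  }
  # final check, in case the last pass resolved everything
  if (!any(df$new %in% df$old)) return(df)
  stop("links did not resolve within max_iter passes; is there a cycle?")
}

x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)
res <- resolve_final(x)
res
```

Because the function works on a copy, the original `x` stays unchanged, which matters if you want to build a graph from it afterwards.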
And if you are working with graph-like data, the `igraph` library can be helpful. Here's how you might use it in this case:

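    library(igraph)

    gg <- graph_from_data_frame(x)
    plot(gg)

    # for each connected component, pair every non-terminal node
    # with the node that has no outgoing edges
    lapply(decompose(gg), function(sg) {
      av <- adjacent_vertices(sg, V(sg), "out")
      cbind(V(sg)$name[lengths(av) != 0], names(av[lengths(av) == 0]))
    }) |> do.call("rbind", args = _) |> data.frame()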

          X1     X2
    1 start1 final1
    2 start2 final1
    3 start3 final2
    4 inter1 final2
    5 inter2 final2
    6 start4 final3
    7 inter3 final3

It works by treating the input as a directed graph. We then decompose it into non-overlapping components and, for each component, find the node that has no outgoing connections.
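
The "node with no outgoing connections" idea can be checked without `igraph`: in an edge list, such a node appears in `new` but never in `old`. A minimal base R sketch, assuming the same `x` as in the question (the variable name `sinks` is illustrative):

```r
x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)

# terminal nodes: they occur as a target but never as a source
sinks <- setdiff(x$new, x$old)
sinks
```

Any correct result should contain only these values in its `new` column, which makes this a cheap sanity check for either approach.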
[![enter image description here][1]][1]

[1]: https://i.stack.imgur.com/HYbzm.png

</details>
Answer 2

Score: 1

If you would like to use `igraph`, you can try `subcomponent` as below (but this might not be efficient because of the row-wise `sapply` call):

    library(igraph)
    library(dplyr)

    g <- graph_from_data_frame(x)
    # for a simple chain, the last vertex reachable from `v` is its final node
    x %>%
      mutate(new = sapply(
        old,
        function(v) tail(names(subcomponent(g, v, "out")), 1)
      ))

which gives

         old    new
    1 start1 final1
    2 start2 final1
    3 start3 final2
    4 start4 final3
    5 inter1 final2
    6 inter2 final2
    7 inter3 final3
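
For comparison, the same per-row lookup can be sketched in base R as a small recursive function, which is roughly what the question had in mind. The names `lookup` and `follow` are assumptions for illustration, and the sketch assumes that each `old` value appears only once and that the links contain no cycles.

```r
x <- data.frame(
  old = c("start1", "start2", "start3", "start4", "inter1", "inter2", "inter3"),
  new = c("final1", "final1", "inter1", "inter3", "inter2", "final2", "final3")
)

# named vector: names are source nodes, values are their direct targets
lookup <- setNames(x$new, x$old)

# Hypothetical helper: keep following links until a node has no outgoing link
follow <- function(v) {
  nxt <- lookup[v]  # NA when `v` is not a source node
  if (is.na(nxt)) v else follow(unname(nxt))
}

x$new <- vapply(x$old, follow, character(1), USE.NAMES = FALSE)
x
```

Like the `sapply` version, this does one walk per row, so it is easy to read but not the fastest option for long chains.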

A probably more efficient way is to use `membership` + `left_join`:

    library(tibble)  # enframe() comes from tibble (igraph and dplyr are also required)

    x %>%
      left_join(
        {.} %>%
          graph_from_data_frame() %>%
          components() %>%
          membership() %>%
          enframe(),
        by = join_by(old == name)
      ) %>%
      group_by(value) %>%
      mutate(new = grep("^final", new, value = TRUE)) %>%
      ungroup() %>%
      select(-value)

which gives

    # A tibble: 7 × 2
      old    new
      <chr>  <chr>
    1 start1 final1
    2 start2 final1
    3 start3 final2
    4 start4 final3
    5 inter1 final2
    6 inter2 final2
    7 inter3 final3

huangapple
  • Published on April 13, 2023, 22:41:43
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76006775.html