Summarising and keeping the original values

Question

I have the following tibble and want to replace the NA values with the sum of the "children" (e.g. the value for "c" is equal to the sum of "d" and "e"; the value for "a" is then the sum of "b" and "c"). So the problem is how to summarise while keeping the original values.

library(tibble)

mydata <- tibble(id = c("a", "b", "c", "d", "e"),
                 value = c(NA, 1, NA, 2, 3),
                 parent = c(NA, "a", "a", "c", "c"),
                 level = c(1, 2, 2, 3, 3))

# A tibble: 5 x 4
  id    value parent level
  <chr> <dbl> <chr>  <dbl>
1 a        NA NA         1
2 b         1 a          2
3 c        NA a          2
4 d         2 c          3
5 e         3 c          3

Final result should be

  id    value parent level
  <chr> <dbl> <chr>  <dbl>
1 a         6 NA         1
2 b         1 a          2
3 c         5 a          2
4 d         2 c          3
5 e         3 c          3

I have tried several approaches, but the only one that works is lengthy and rather clumsy. I have the feeling there should be an easy solution in the tidyverse. Any ideas? (In a loop? The original problem has 5 levels.)

Cheers

Renger

Answer 1

Score: 3

You can use a while() loop to calculate value iteratively: each pass fills the missing values whose children are all known, until no NA is left.

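library(dplyr)

mydata %>%
  mutate(value = {
    # Repeat until every missing value has been filled
    while (anyNA(value)) {
      # ids whose value is still missing
      sub_id <- id[is.na(value)]
      # rows that are children of those ids
      ind <- parent %in% sub_id
      # sum the children per parent; a sum stays NA while a child is still missing
      value[is.na(value)] <- tapply(value[ind], parent[ind], sum)[sub_id]
    }
    value
  })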

# # A tibble: 5 × 4
#   id    value parent level
#   <chr> <dbl> <chr>  <dbl>
# 1 a         6 NA         1
# 2 b         1 a          2
# 3 c         5 a          2
# 4 d         2 c          3
# 5 e         3 c          3
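
One caveat with the while() loop above: it only ends once no NA is left, so a missing value with no children (or with a child that can never be filled) would keep it running. A minimal sketch of a guarded variant follows, in base R so it runs without extra packages. The helper name fill_from_children() is hypothetical and not part of the original answer. It stops as soon as a pass fills nothing new.

```r
# Hypothetical helper (an assumption, not from the original answer):
# same idea as the while() loop, but it stops when a pass makes no progress.
fill_from_children <- function(id, value, parent) {
  repeat {
    missing <- is.na(value)
    if (!any(missing)) break
    # Sum of children per parent; stays NA while any child is still unknown
    sums <- tapply(value, parent, sum)
    candidate <- as.vector(sums[match(id[missing], names(sums))])
    # No progress possible: leave the remaining NAs as they are
    if (all(is.na(candidate))) break
    value[missing] <- candidate
  }
  value
}

mydata <- data.frame(
  id     = c("a", "b", "c", "d", "e"),
  value  = c(NA, 1, NA, 2, 3),
  parent = c(NA, "a", "a", "c", "c")
)
mydata$value <- fill_from_children(mydata$id, mydata$value, mydata$parent)
mydata$value
# 6 1 5 2 3
```

Because the helper takes plain vectors, it should also be usable inside mutate() on the tibble from the question, e.g. mutate(value = fill_from_children(id, value, parent)).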

Answer 2

Score: 1

Thanks, that does the job.
Here is another example that works using the trick by Maël:

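# One pass per level below the root
loop_level <- max(mydata$level) - 1
mydata0 <- mydata
for (i in seq_len(loop_level)) {
  # Sum the children per parent on the working copy; without na.rm,
  # a sum stays NA until all children of that parent are known
  v <- with(mydata0, tapply(value, parent, sum))
  # Fill only the values that are still missing
  mydata0$value <- dplyr::coalesce(mydata0$value, as.vector(v[match(mydata0$id, names(v))]))
}
mydata0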
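
Since the data already carries a level column, the passes can also be driven by it directly: working from the deepest level towards the root, every child has been handled by the time its parent is reached. The following is only a sketch in base R, under the assumption that a child always has a larger level than its parent. The names result, todo and children are illustrative.

```r
mydata <- data.frame(
  id     = c("a", "b", "c", "d", "e"),
  value  = c(NA, 1, NA, 2, 3),
  parent = c(NA, "a", "a", "c", "c"),
  level  = c(1, 2, 2, 3, 3)
)

result <- mydata
# Visit the levels from the deepest one up to the root
for (lv in sort(unique(result$level), decreasing = TRUE)) {
  # Rows on this level whose value still has to be derived
  todo <- which(result$level == lv & is.na(result$value))
  for (i in todo) {
    # which() drops the NA comparisons produced by the root's missing parent
    children <- which(result$parent == result$id[i])
    if (length(children) > 0) {
      result$value[i] <- sum(result$value[children])
    }
  }
}
result$value
# 6 1 5 2 3
```

This visits each level once, so the number of levels does not have to be known in advance; a missing value without children is simply left as NA.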

Answer 3

Score: 1

Please also try 


``` r
library(dplyr)
library(tidyr)  # for fill()

mydata %>%
  mutate(sum = sum(value), .by = level) %>%
  fill(sum, .direction = 'up') %>%
  mutate(value = ifelse(parent != '' & is.na(value), sum, value),
         sum2 = sum(value), .by = level) %>%
  fill(sum2, .direction = 'up') %>%
  mutate(value = ifelse(is.na(parent) & is.na(value), sum2, value)) %>%
  select(-c(sum, sum2))

# # A tibble: 5 × 4
#   id    value parent level
#   <chr> <dbl> <chr>  <dbl>
# 1 a         6 <NA>       1
# 2 b         1 a          2
# 3 c         5 a          2
# 4 d         2 c          3
# 5 e         3 c          3
```

Created on 2023-07-24 with reprex v2.0.2
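
Whichever approach is used, the result can be checked against the rule stated in the question: every row that has children should equal the sum of those children. A small sketch of such a check in base R follows. The helper name check_parent_sums() is hypothetical, and the data frame is typed in from the expected result shown in the question.

```r
# Hypothetical helper (an assumption for illustration):
# TRUE if every id that appears as a parent equals the sum of its children
check_parent_sums <- function(df) {
  child_sums <- tapply(df$value, df$parent, sum)
  parents <- names(child_sums)
  # The parents' own values, in the same order as child_sums
  own <- df$value[match(parents, df$id)]
  isTRUE(all.equal(as.vector(child_sums), own))
}

result <- data.frame(
  id     = c("a", "b", "c", "d", "e"),
  value  = c(6, 1, 5, 2, 3),
  parent = c(NA, "a", "a", "c", "c")
)
check_parent_sums(result)
# TRUE
```

The check returns FALSE when a parent is still NA or when a parent id is missing from the id column, so it also flags incomplete results.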

huangapple
  • Posted on 2023-07-24 19:12:16
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76753894.html