Summarising and keeping the original values
Question
I have the following tibble and want to replace the NA values with the sum of the "children" (e.g. the value for "c" is the sum of "d" and "e", and the value for "a" is then the sum of "b" and "c"). So the problem is how to summarise while keeping the original values.
library(tibble)
mydata <- tibble(id = c("a", "b", "c", "d", "e"),
                 value = c(NA, 1, NA, 2, 3),
                 parent = c(NA, "a", "a", "c", "c"),
                 level = c(1, 2, 2, 3, 3))
mydata
# A tibble: 5 x 4
id value parent level
<chr> <dbl> <chr> <dbl>
1 a NA NA 1
2 b 1 a 2
3 c NA a 2
4 d 2 c 3
5 e 3 c 3
The final result should be:
id value parent level
<chr> <dbl> <chr> <dbl>
1 a 6 NA 1
2 b 1 a 2
3 c 5 a 2
4 d 2 c 3
5 e 3 c 3
I have tried several approaches, but the only one that works is lengthy and rather clumsy. I have the feeling there should be an easy solution in the tidyverse. Any ideas? (In a loop? The original problem has 5 levels.)
Cheers
Renger
Answer 1
Score: 3
You can use a while() loop to calculate value iteratively.
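library(dplyr)
mydata %>%
  mutate(value = {
    # Repeat until every NA has been replaced by the sum of its children
    while (anyNA(value)) {
      sub_id <- id[is.na(value)]     # ids whose value is still missing
      ind <- parent %in% sub_id      # rows that are children of those ids
      # Per-parent sum of the children; stays NA while any child is still NA
      value[is.na(value)] <- tapply(value[ind], parent[ind], sum)[sub_id]
    }
    value
  })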
# # A tibble: 5 × 4
# id value parent level
# <chr> <dbl> <chr> <dbl>
# 1 a 6 NA 1
# 2 b 1 a 2
# 3 c 5 a 2
# 4 d 2 c 3
# 5 e 3 c 3
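One caveat with the loop above: if a row has an NA value but no children with known values (for example an NA leaf), `anyNA(value)` never becomes FALSE and the loop does not terminate. The following is a minimal sketch of the same idea with a progress check added. The helper name `fill_from_children()` is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): same iterative idea,
# but it stops with an error when a pass cannot fill any remaining NA.
fill_from_children <- function(id, value, parent) {
  repeat {
    missing <- is.na(value)
    if (!any(missing)) break
    # Sum of children per parent; NA while any child is still unknown
    child_sum <- tapply(value, parent, sum)
    filled <- as.vector(child_sum[match(id[missing], names(child_sum))])
    if (all(is.na(filled))) {
      stop("cannot fill: some NA rows have no children with known values")
    }
    value[missing] <- filled
  }
  value
}

# Example data from the question, as plain vectors
id     <- c("a", "b", "c", "d", "e")
value  <- c(NA, 1, NA, 2, 3)
parent <- c(NA, "a", "a", "c", "c")

res <- fill_from_children(id, value, parent)
res
```

Inside a pipeline it could be called as `mutate(value = fill_from_children(id, value, parent))`.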
Answer 2
Score: 1
Thanks, that does the job.
Here is another example that uses the trick by Maël:
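# One pass per level above the leaves
loop_level <- max(mydata$level) - 1
mydata0 <- mydata
for (i in seq_len(loop_level)) {
  # Sum the children from the *updated* data. Without na.rm, a parent whose
  # children are not all known yet stays NA and is filled in a later pass.
  v <- with(mydata0, tapply(value, parent, sum))
  mydata0$value <- dplyr::coalesce(mydata0$value, as.vector(v[match(mydata0$id, names(v))]))
}
mydata0  # the filled result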
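Since the data already carries a `level` column, the same result can be reached by walking the levels from the bottom up, so that each parent is filled only after its children. This is a sketch in base R, not the original author's code. It assumes that every child sits at a deeper level than its parent, and note that an NA row without any children would be set to 0 because `sum()` of an empty vector is 0.

```r
# Example data from the question, as a plain data frame
mydata <- data.frame(id     = c("a", "b", "c", "d", "e"),
                     value  = c(NA, 1, NA, 2, 3),
                     parent = c(NA, "a", "a", "c", "c"),
                     level  = c(1, 2, 2, 3, 3))

filled <- mydata
# Walk from the second-deepest level up to the root
for (lvl in rev(seq_len(max(filled$level) - 1))) {
  # Rows at this level whose value is still missing
  for (i in which(filled$level == lvl & is.na(filled$value))) {
    # which() drops the NA comparison produced by the root's NA parent
    children <- which(filled$parent == filled$id[i])
    filled$value[i] <- sum(filled$value[children])
  }
}
filled
```

Because the number of passes comes from `max(level)`, the same loop should also cover the five-level case mentioned in the question, under the assumptions stated above.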
Answer 3
Score: 1
Please also try:
```r
library(dplyr)  # the .by argument requires dplyr >= 1.1.0
library(tidyr)  # for fill()
mydata %>%
  mutate(sum = sum(value), .by = level) %>% fill(sum, .direction = "up") %>%
  mutate(value = ifelse(!is.na(parent) & is.na(value), sum, value),
         sum2 = sum(value), .by = level) %>% fill(sum2, .direction = "up") %>%
  mutate(value = ifelse(is.na(parent) & is.na(value), sum2, value)) %>%
  select(-c(sum, sum2))
```
Created on 2023-07-24 with reprex v2.0.2
# A tibble: 5 × 4
id value parent level
<chr> <dbl> <chr> <dbl>
1 a 6 <NA> 1
2 b 1 a 2
3 c 5 a 2
4 d 2 c 3
5 e 3 c 3
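Whichever approach is used, the result can be sanity-checked by confirming that every parent's value equals the sum of its children. The snippet below is a small sketch of such a check in base R. The `result` data frame simply restates the expected output shown above, and the variable names are illustrative.

```r
# Expected result from the question, restated as a plain data frame
result <- data.frame(id     = c("a", "b", "c", "d", "e"),
                     value  = c(6, 1, 5, 2, 3),
                     parent = c(NA, "a", "a", "c", "c"),
                     level  = c(1, 2, 2, 3, 3))

# Sum of the children for each id that appears as a parent
child_sum <- tapply(result$value, result$parent, sum)
is_parent <- result$id %in% names(child_sum)

# TRUE when each parent's value matches the sum of its children
consistent <- isTRUE(all.equal(
  result$value[is_parent],
  as.vector(child_sum[result$id[is_parent]])
))
consistent
```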