Adding values by row and have them apply to a single row variable while preserving other variables and rows

# Question

I have a data frame that looks like this:

```R
df <- data.frame(num1 = c('a', 'b', 'c', 'd'),
                 num2 = c(1, 2, 3, 4),
                 num3 = c(5, 6, 7, 8),
                 num4 = c('x', 'y', 'b', 'd'))
```

I would like the output to go from

```
num1 num2 num3 num4
a    1    5    x
b    2    6    y
c    3    7    b
d    4    8    d
```

to

```
num1 num2 num3 num4
a    10   26   x
b    2    6    y
c    3    7    b
d    4    8    d
```

Here is my own attempt. Note that its `filter()` restricts the sum to rows a, b, and c, so as written it yields 6 and 18 for row a rather than 10 and 26:

```R
library(dplyr)

# Sum num2 and num3 over the rows whose num1 matches a, b, or c
sum_summarised <- df %>%
  filter(grepl('a|b|c', num1)) %>%
  summarise(num2 = sum(num2), num3 = sum(num3))

# Write the sums into row a, leaving every other row unchanged
df <- df %>%
  mutate(num2 = if_else(num1 == 'a', sum_summarised$num2, num2))
df <- df %>%
  mutate(num3 = if_else(num1 == 'a', sum_summarised$num3, num3))
```

Essentially, I am summing the num2 and num3 columns and writing the sums into row a, while preserving the original values of rows b, c, and d and of the num4 column.

I would prefer to use dplyr. I have tried various combinations of `group_by`, `slice`, and `filter` to no avail. Any help would be greatly appreciated. Thank you!
# Answer 1

**Score**: 2

Since you have a preference for `dplyr`, we could use `across` and `if_else`:

```R
library(dplyr)

df |>
  mutate(across(num2:num3, ~ if_else(num1 == "a", sum(.), .)))
```

Output:

```
  num1 num2 num3 num4
1    a   10   26    x
2    b    2    6    y
3    c    3    7    b
4    d    4    8    d
```

Update: to sum only rows a, b, and c (rather than all rows) and assign the result to row a:

```R
library(dplyr)

df |>
  mutate(across(num2:num3, ~ if_else(num1 == "a", sum(.[num1 %in% c("a", "b", "c")]), .)))
```

Output:

```
  num1 num2 num3 num4
1    a    6   18    x
2    b    2    6    y
3    c    3    7    b
4    d    4    8    d
```
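The two variants above differ only in which rows feed the sum, so they can be folded into one function. The following is a minimal sketch under that assumption; `sum_into_row()` and its arguments (`target`, `from`, `key`) are hypothetical names introduced here for illustration and are not part of dplyr.

```r
library(dplyr)

df <- data.frame(num1 = c("a", "b", "c", "d"),
                 num2 = c(1, 2, 3, 4),
                 num3 = c(5, 6, 7, 8),
                 num4 = c("x", "y", "b", "d"))

# Hypothetical helper: for each column in `cols`, write the sum of the rows
# whose key is in `from` into the row whose key equals `target`.
sum_into_row <- function(data, cols, target, from = NULL, key = "num1") {
  keys <- data[[key]]
  is_target <- keys == target
  # With no `from` given, every row contributes to the sum.
  in_from <- if (is.null(from)) rep(TRUE, length(keys)) else keys %in% from
  mutate(data, across({{ cols }}, ~ if_else(is_target, sum(.x[in_from]), .x)))
}

# Sum all rows into row a (num2 = 10, num3 = 26 for row a)
all_rows <- sum_into_row(df, num2:num3, target = "a")

# Sum only rows a, b, and c into row a (num2 = 6, num3 = 18 for row a)
abc_rows <- sum_into_row(df, num2:num3, target = "a", from = c("a", "b", "c"))
```

The logical vectors are computed once outside `across()`, so the lambda stays short. The sketch assumes an ungrouped data frame.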
# Answer 2

**Score**: 2

Another approach is to use `rows_update()`. This is a bit more verbose, but I'd say it pays off if we want to construct more complex operations. Below we call `rows_update()`, and inside it we use `summarise()`, first defining the id column we want to join by and then the columns we want to update with `across()`; everything else is left untouched.

```R
library(dplyr)

df %>%
  rows_update(
    df %>%
      summarise(num1 = "a",
                across(num2:num3, sum)),
    by = "num1")
#>   num1 num2 num3 num4
#> 1    a   10   26    x
#> 2    b    2    6    y
#> 3    c    3    7    b
#> 4    d    4    8    d
```

We can also perform more complex operations on the data frame we pass to `rows_update()`. For example, if we don't want row `c` to contribute to the sum, we can `filter()` it out first:

```R
df %>%
  rows_update(
    df %>%
      filter(num1 != "c") %>%
      summarise(num1 = "a",
                across(num2:num3, sum)),
    by = "num1")
#>   num1 num2 num3 num4
#> 1    a    7   19    x
#> 2    b    2    6    y
#> 3    c    3    7    b
#> 4    d    4    8    d
```

Data from OP:

```R
df <- data.frame(num1 = c('a', 'b', 'c', 'd'),
                 num2 = c(1, 2, 3, 4),
                 num3 = c(5, 6, 7, 8),
                 num4 = c('x', 'y', 'b', 'd'))
```

<sup>Created on 2023-03-03 by the reprex package (v2.0.1)</sup>
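To see what `rows_update()` actually receives, it can help to build the one-row update table as a separate step. This is a minimal sketch using the OP's data; `patch` and `updated` are illustrative variable names chosen here, not anything defined by dplyr.

```r
library(dplyr)

df <- data.frame(num1 = c("a", "b", "c", "d"),
                 num2 = c(1, 2, 3, 4),
                 num3 = c(5, 6, 7, 8),
                 num4 = c("x", "y", "b", "d"))

# One-row table: the key of the row to overwrite plus the new column values.
# It has no num4 column, so num4 is not touched by the update.
patch <- df %>%
  summarise(num1 = "a",
            across(num2:num3, sum))

# Rows of df whose num1 matches a key in patch get the values from patch.
updated <- rows_update(df, patch, by = "num1")
```

Splitting the step out this way makes it easy to inspect `patch` before applying it, which is useful once the summarising logic grows.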
# Answer 3

**Score**: 0

One way could be to use `transmute()` from the `dplyr` package. Note that this replaces the first element of each column by position, so it relies on `a` being the first row:

```R
library(dplyr)

df %>%
  transmute(num1,
            num2 = c(sum(num2), num2[-1]),
            num3 = c(sum(num3), num3[-1]),
            num4)
```

Output:

```
  num1 num2 num3 num4
1    a   10   26    x
2    b    2    6    y
3    c    3    7    b
4    d    4    8    d
```
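If the target row is not guaranteed to come first, the same idea can be expressed by matching on `num1` instead of on position. The sketch below is one possible variant, not the answerer's code: it uses base R's `replace()` inside `across()`, and `shuffled` is an illustrative reordering of the OP's data.

```r
library(dplyr)

df <- data.frame(num1 = c("a", "b", "c", "d"),
                 num2 = c(1, 2, 3, 4),
                 num3 = c(5, 6, 7, 8),
                 num4 = c("x", "y", "b", "d"))

# Reorder the rows so that "a" is no longer first.
shuffled <- df[c(2, 4, 1, 3), ]

# replace(x, index, value) returns x with x[index] set to value;
# here the index is the logical vector num1 == "a".
res <- shuffled %>%
  mutate(across(num2:num3, ~ replace(.x, num1 == "a", sum(.x))))
```

Row `a` ends up with `num2 = 10` and `num3 = 26` wherever it sits, and all other rows keep their original values.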