英文:
Add column in R that is dependent on several variables
问题
以下是您提供的代码的翻译:
这是我之前问题的后续[这里][1]。
我有一个如下的数据框:
ID <- c('A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A')
EXP_P <- c(62,62,62,62,62,62,62,64,64,64,67,67,67,67,67,67)
BRR <- c(61,57,66,53,54,50,55,65,71,53,51,50,58,54,55,57)
val1 <- c(1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3)
CURR_PR <- c(1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3)
df <- data.frame(ID, EXP_P, BRR, val1, CURR_PR)
输出:
ID EXP_P BRR val1 CURR_PR
1 A 62 61 1 1
2 A 62 57 1 1
3 A 62 66 1 1
4 A 62 53 1 1
5 A 62 54 1 1
6 A 62 50 1 1
7 A 62 55 1 1
8 A 64 65 2 2
9 A 64 71 2 2
10 A 64 53 2 2
11 A 67 51 3 2
12 A 67 50 3 3
13 A 67 58 3 3
14 A 67 54 3 3
15 A 67 55 3 3
16 A 67 57 3 3
现在我已经添加了新的列`BPP`,它依赖于我的初始问题中指定的规则,感谢那个回答:
df %>%
group_by(ID, val1) %>%
mutate(occ = row_number()) %>%
group_by(ID) %>%
mutate(
BPP = case_when(
occ == 1 ~ EXP_P,
occ == 2 ~ lag(BRR),
occ == 3 ~ pmax(lag(BRR, 1), lag(BRR, 2), na.rm = TRUE),
TRUE ~ pmax(lag(BRR, 1), lag(BRR, 2), lag(BRR, 3), na.rm = TRUE)
)
) %>%
ungroup()
# # A tibble: 16 × 6
# ID EXP_P BRR val1 occ BPP
# <chr> <dbl> <dbl> <dbl> <int> <dbl>
# 1 A 62 61 1 1 62
# 2 A 62 57 1 2 61
# 3 A 62 66 1 3 61
# 4 A 62 53 1 4 66
# 5 A 62 54 1 5 66
# 6 A 62 50 1 6 66
# 7 A 62 55 1 7 54
# 8 A 64 65 2 1 64
# 9 A 64 71 2 2 65
# 10 A 64 53 2 3 71
# 11 A 67 51 3 1 67
# 12 A 67 50 3 2 51
# 13 A 67 58 3 3 51
# 14 A 67 54 3 4 58
# 15 A 67 55 3 5 58
# 16 A 67 57 3 6 58
现在我有另一个变量`(CURR_PR)`,当这个变量改变时,我需要`BRR`的值与之前`BRR`中的值相同。
例如,当`CURR_PR`为`1`时,`BRR`的第一个值为`61`。当`CURR_PR`变为`2`时,我希望`BRR`的值与`CURR_PR`为`1`时的第一个`BRR`值相同`(61)`。
当`CURR_PR`为`2`时,更新后的第一个`BRR`值将是`61`。当`CURR_PR`变为`3`时,我希望`BRR`的值与`CURR_PR`为`2`时的第一个`BRR`值相同`(61)`。
请注意,这个值将通过上面的`mutate`和`case_when`计算更新,因此它不会始终为`61`。
期望的输出:
ID EXP_P BRR val1 CURR_PR
1 A 62 61 1 1
2 A 62 57 1 1
3 A 62 66 1 1
4 A 62 53 1 1
5 A 62 54 1 1
6 A 62 50 1 1
7 A 62 55 1 1
8 A 64 61 2 2
9 A 64
<details>
<summary>英文:</summary>
This is a follow up to my question [here][1].
I have a dataframe like so:
ID <- c('A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A' )
EXP_P <- c(62,62,62,62,62,62,62,64,64,64,67,67,67,67,67,67)
BRR <- c(61,57,66,53,54,50,55,65,71,53,51,50,58,54,55,57)
val1 <- c(1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3)
CURR_PR <- c(1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3)
df <- data.frame(ID, EXP_P, BRR, val1, CURR_PR)
Output:
ID EXP_P BRR val1 CURR_PR
1 A 62 61 1 1
2 A 62 57 1 1
3 A 62 66 1 1
4 A 62 53 1 1
5 A 62 54 1 1
6 A 62 50 1 1
7 A 62 55 1 1
8 A 64 65 2 2
9 A 64 71 2 2
10 A 64 53 2 2
11 A 67 51 3 2
12 A 67 50 3 3
13 A 67 58 3 3
14 A 67 54 3 3
15 A 67 55 3 3
16 A 67 57 3 3
I have now added the new column `BPP` that is dependent on each of the other columns using the rules specified in my initial question thanks to that response:
df %>%
group_by(ID, val1) %>%
mutate(occ = row_number()) %>%
group_by(ID) %>%
mutate(
BPP = case_when(
occ == 1 ~ EXP_P,
occ == 2 ~ lag(BRR),
occ == 3 ~ pmax(lag(BRR, 1), lag(BRR, 2), na.rm = TRUE),
TRUE ~ pmax(lag(BRR, 1), lag(BRR, 2), lag(BRR, 3), na.rm = TRUE)
)
) %>%
ungroup()
# # A tibble: 16 × 6
# ID EXP_P BRR val1 occ BPP
# <chr> <dbl> <dbl> <dbl> <int> <dbl>
# 1 A 62 61 1 1 62
# 2 A 62 57 1 2 61
# 3 A 62 66 1 3 61
# 4 A 62 53 1 4 66
# 5 A 62 54 1 5 66
# 6 A 62 50 1 6 66
# 7 A 62 55 1 7 54
# 8 A 64 65 2 1 64
# 9 A 64 71 2 2 65
# 10 A 64 53 2 3 71
# 11 A 67 51 3 1 67
# 12 A 67 50 3 2 51
# 13 A 67 58 3 3 51
# 14 A 67 54 3 4 58
# 15 A 67 55 3 5 58
# 16 A 67 57 3 6 58
Now I have another variable `(CURR_PR)` and when this changes, I need the value in `BRR` to be the same as it was under the previous value in `BRR`.
For example, when `CURR_PR` is `1`, the first value for `BRR` is `61`. When `CURR_PR` changes to `2`, I want `BRR` to be the same value as it was for the first `BRR` value when `CURR_PR` is `1` `(61)`.
When `CURR_PR` is `2`, the first value for `BRR` after the update will be `61`. When `CURR_PR` changes to `3`, I want `BRR` to be the same value as it was for the first `BRR` value when `CURR_PR` is `2` `(61)`.
Note that this value will update through the calcs shown above in the `mutate` and `case_when` values, so it won't always be `61`.
Desired output:
ID EXP_P BRR val1 CURR_PR
1 A 62 61 1 1
2 A 62 57 1 1
3 A 62 66 1 1
4 A 62 53 1 1
5 A 62 54 1 1
6 A 62 50 1 1
7 A 62 55 1 1
8 A 64 61 2 2
9 A 64 71 2 2
10 A 64 53 2 2
11 A 67 51 3 2
12 A 67 61 3 3
13 A 67 58 3 3
14 A 67 54 3 3
15 A 67 55 3 3
16 A 67 57 3 3
[1]: https://stackoverflow.com/questions/74621754/add-column-in-r-that-is-dependent-on-multiple-variables
</details>
# 答案1
**得分**: 3
每次`CURR_PR`更改时,`BRR`基本上会接收`BRR`的第一个值,其中`CURR_PR`为1。如果是这样的话:
### 代码
df %>%
group_by(CURR_PR) %>%
mutate(aux = row_number()) %>%
ungroup() %>%
mutate(BRR = if_else(aux == 1 & CURR_PR > 1, BRR[aux == 1 & CURR_PR == 1],BRR))
### 输出
# 一个tibble: 16 x 6
ID EXP_P BRR val1 CURR_PR aux
<chr> <dbl> <dbl> <dbl> <dbl> <int>
1 A 62 61 1 1 1
2 A 62 57 1 1 2
3 A 62 66 1 1 3
4 A 62 53 1 1 4
5 A 62 54 1 1 5
6 A 62 50 1 1 6
7 A 62 55 1 1 7
8 A 64 61 2 2 1
9 A 64 71 2 2 2
10 A 64 53 2 2 3
11 A 67 51 3 2 4
12 A 67 61 3 3 1
13 A 67 58 3 3 2
14 A 67 54 3 3 3
15 A 67 55 3 3 4
16 A 67 57 3 3 5
<details>
<summary>英文:</summary>
I was a little confused, basically every time that `CURR_PR` change, `BRR` will receive the first value of `BRR` where `CURR_PR` was 1? If so:
### Code
df %>%
group_by(CURR_PR) %>%
mutate(aux = row_number()) %>%
ungroup() %>%
mutate(BRR = if_else(aux == 1 & CURR_PR > 1, BRR[aux == 1 & CURR_PR == 1],BRR))
### Output
# A tibble: 16 x 6
ID EXP_P BRR val1 CURR_PR aux
<chr> <dbl> <dbl> <dbl> <dbl> <int>
1 A 62 61 1 1 1
2 A 62 57 1 1 2
3 A 62 66 1 1 3
4 A 62 53 1 1 4
5 A 62 54 1 1 5
6 A 62 50 1 1 6
7 A 62 55 1 1 7
8 A 64 61 2 2 1
9 A 64 71 2 2 2
10 A 64 53 2 2 3
11 A 67 51 3 2 4
12 A 67 61 3 3 1
13 A 67 58 3 3 2
14 A 67 54 3 3 3
15 A 67 55 3 3 4
16 A 67 57 3 3 5
</details>
# 答案2
**得分**: 1
We may group by 'ID' and use `replace`
```R
library(dplyr)
df %>%
group_by(ID) %>%
mutate(BRR = replace(BRR, !duplicated(CURR_PR), first(BRR))) %>%
ungroup
-output
# A tibble: 16 × 5
ID EXP_P BRR val1 CURR_PR
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 62 61 1 1
2 A 62 57 1 1
3 A 62 66 1 1
4 A 62 53 1 1
5 A 62 54 1 1
6 A 62 50 1 1
7 A 62 55 1 1
8 A 64 61 2 2
9 A 64 71 2 2
10 A 64 53 2 2
11 A 67 51 3 2
12 A 67 61 3 3
13 A 67 58 3 3
14 A 67 54 3 3
15 A 67 55 3 3
16 A 67 57 3 3
If we are using the devel
version of dplyr
, we can use .by
in mutate
df %>%
mutate(BRR = replace(BRR, !duplicated(CURR_PR), first(BRR)), .by = "ID")
-output
ID EXP_P BRR val1 CURR_PR
1 A 62 61 1 1
2 A 62 57 1 1
3 A 62 66 1 1
4 A 62 53 1 1
5 A 62 54 1 1
6 A 62 50 1 1
7 A 62 55 1 1
8 A 64 61 2 2
9 A 64 71 2 2
10 A 64 53 2 2
11 A 67 51 3 2
12 A 67 61 3 3
13 A 67 58 3 3
14 A 67 54 3 3
15 A 67 55 3 3
16 A 67 57 3 3
英文:
We may group by 'ID' and use replace
library(dplyr)
df %>%
group_by(ID) %>%
mutate(BRR = replace(BRR, !duplicated(CURR_PR), first(BRR))) %>%
ungroup
-output
# A tibble: 16 × 5
ID EXP_P BRR val1 CURR_PR
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 62 61 1 1
2 A 62 57 1 1
3 A 62 66 1 1
4 A 62 53 1 1
5 A 62 54 1 1
6 A 62 50 1 1
7 A 62 55 1 1
8 A 64 61 2 2
9 A 64 71 2 2
10 A 64 53 2 2
11 A 67 51 3 2
12 A 67 61 3 3
13 A 67 58 3 3
14 A 67 54 3 3
15 A 67 55 3 3
16 A 67 57 3 3
If we are using the devel
version of dplyr
, can use .by
in mutate
df %>%
mutate(BRR = replace(BRR, !duplicated(CURR_PR), first(BRR)), .by = "ID")
-output
ID EXP_P BRR val1 CURR_PR
1 A 62 61 1 1
2 A 62 57 1 1
3 A 62 66 1 1
4 A 62 53 1 1
5 A 62 54 1 1
6 A 62 50 1 1
7 A 62 55 1 1
8 A 64 61 2 2
9 A 64 71 2 2
10 A 64 53 2 2
11 A 67 51 3 2
12 A 67 61 3 3
13 A 67 58 3 3
14 A 67 54 3 3
15 A 67 55 3 3
16 A 67 57 3 3
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论