Add column in R that is dependent on several variables

Question

This is a follow-up to my earlier question [here][1].

I have a data frame like this:

    ID <- c('A','A','A','A','A','A','A','A','A','A','A','A','A','A','A','A')
    EXP_P <- c(62,62,62,62,62,62,62,64,64,64,67,67,67,67,67,67)
    BRR <- c(61,57,66,53,54,50,55,65,71,53,51,50,58,54,55,57)
    val1 <- c(1,1,1,1,1,1,1,2,2,2,3,3,3,3,3,3)
    CURR_PR <- c(1,1,1,1,1,1,1,2,2,2,2,3,3,3,3,3)
    
    
    df <- data.frame(ID, EXP_P, BRR, val1, CURR_PR)

Output:

       ID EXP_P BRR val1 CURR_PR
    1   A    62  61    1       1
    2   A    62  57    1       1
    3   A    62  66    1       1
    4   A    62  53    1       1
    5   A    62  54    1       1
    6   A    62  50    1       1
    7   A    62  55    1       1
    8   A    64  65    2       2
    9   A    64  71    2       2
    10  A    64  53    2       2
    11  A    67  51    3       2
    12  A    67  50    3       3
    13  A    67  58    3       3
    14  A    67  54    3       3
    15  A    67  55    3       3
    16  A    67  57    3       3

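I have now added the new column `BPP`, which depends on the other columns according to the rules specified in my initial question, thanks to the answer there: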

    library(dplyr)

    df %>%
      group_by(ID, val1) %>%
      mutate(occ = row_number()) %>%  # position of the row within its ID/val1 group
      group_by(ID) %>%
      mutate(
        BPP = case_when(
          occ == 1 ~ EXP_P,
          occ == 2 ~ lag(BRR),
          occ == 3 ~ pmax(lag(BRR, 1), lag(BRR, 2), na.rm = TRUE),
          TRUE ~ pmax(lag(BRR, 1), lag(BRR, 2), lag(BRR, 3), na.rm = TRUE)
        )
      ) %>%
      ungroup()
    # # A tibble: 16 × 6
    #    ID    EXP_P   BRR  val1   occ   BPP
    #    <chr> <dbl> <dbl> <dbl> <int> <dbl>
    #  1 A        62    61     1     1    62
    #  2 A        62    57     1     2    61
    #  3 A        62    66     1     3    61
    #  4 A        62    53     1     4    66
    #  5 A        62    54     1     5    66
    #  6 A        62    50     1     6    66
    #  7 A        62    55     1     7    54
    #  8 A        64    65     2     1    64
    #  9 A        64    71     2     2    65
    # 10 A        64    53     2     3    71
    # 11 A        67    51     3     1    67
    # 12 A        67    50     3     2    51
    # 13 A        67    58     3     3    51
    # 14 A        67    54     3     4    58
    # 15 A        67    55     3     5    58
    # 16 A        67    57     3     6    58
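
To make the `lag()`/`pmax()` part of that rule easier to follow, here is a minimal sketch on a toy vector holding the first four `BRR` values. The names `brr` and `prev_max` are made up for illustration and are not part of the original code.

```r
library(dplyr)

# Toy vector: the first four BRR values from the sample data
brr <- c(61, 57, 66, 53)

# Largest of the (up to) three preceding values.
# Position 1 has no predecessor, so it stays NA even with na.rm = TRUE.
prev_max <- pmax(
  dplyr::lag(brr, 1),
  dplyr::lag(brr, 2),
  dplyr::lag(brr, 3),
  na.rm = TRUE
)

print(prev_max)  # NA 61 61 66
```

In the pipeline above, the `occ == 1` branch supplies `EXP_P` where this sketch yields `NA`, which is why rows 2 to 4 of `BPP` read 61, 61, 66.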

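Now I have another variable, `CURR_PR`. Whenever it changes, I need the first `BRR` value under the new `CURR_PR` to equal the first `BRR` value under the previous `CURR_PR`.

For example, when `CURR_PR` is `1`, the first value of `BRR` is `61`. When `CURR_PR` changes to `2`, I want `BRR` to take the first `BRR` value from when `CURR_PR` was `1` (`61`).

After that update, the first `BRR` value for `CURR_PR` `2` is `61`. When `CURR_PR` changes to `3`, I want `BRR` to take the first `BRR` value from when `CURR_PR` was `2` (again `61`).

Note that this value is updated through the `mutate` and `case_when` calculations shown above, so it won't always be `61`.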

Desired output:

       ID EXP_P BRR val1 CURR_PR
    1   A    62  61    1       1
    2   A    62  57    1       1
    3   A    62  66    1       1
    4   A    62  53    1       1
    5   A    62  54    1       1
    6   A    62  50    1       1
    7   A    62  55    1       1
    8   A    64  61    2       2
    9   A    64  71    2       2
    10  A    64  53    2       2
    11  A    67  51    3       2
    12  A    67  61    3       3
    13  A    67  58    3       3
    14  A    67  54    3       3
    15  A    67  55    3       3
    16  A    67  57    3       3

  [1]: https://stackoverflow.com/questions/74621754/add-column-in-r-that-is-dependent-on-multiple-variables


# Answer 1
**Score**: 3

I was a little confused: every time `CURR_PR` changes, should `BRR` receive the first value of `BRR` where `CURR_PR` was 1? If so:

### Code

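    library(dplyr)

    df %>%
      group_by(CURR_PR) %>%
      mutate(aux = row_number()) %>%  # row position within each CURR_PR value
      ungroup() %>%
      # first row of each later CURR_PR value takes the first BRR where CURR_PR == 1
      mutate(BRR = if_else(aux == 1 & CURR_PR > 1, BRR[aux == 1 & CURR_PR == 1], BRR))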

### Output

    # A tibble: 16 x 6
       ID    EXP_P   BRR  val1 CURR_PR   aux
       <chr> <dbl> <dbl> <dbl>   <dbl> <int>
     1 A        62    61     1       1     1
     2 A        62    57     1       1     2
     3 A        62    66     1       1     3
     4 A        62    53     1       1     4
     5 A        62    54     1       1     5
     6 A        62    50     1       1     6
     7 A        62    55     1       1     7
     8 A        64    61     2       2     1
     9 A        64    71     2       2     2
    10 A        64    53     2       2     3
    11 A        67    51     3       2     4
    12 A        67    61     3       3     1
    13 A        67    58     3       3     2
    14 A        67    54     3       3     3
    15 A        67    55     3       3     4
    16 A        67    57     3       3     5
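
The code above keys on the first row of each distinct `CURR_PR` value. If a `CURR_PR` value could come back later in the data (an assumption; the sample data has no such case), one possible variant is to flag every row where `CURR_PR` differs from the row before it. The toy data and the column name `changed` below are hypothetical, chosen only to illustrate the idea.

```r
library(dplyr)

# Toy data (made up): CURR_PR returns to 1 in row 5
toy <- data.frame(
  ID      = "A",
  BRR     = c(61, 57, 65, 71, 51, 50),
  CURR_PR = c(1, 1, 2, 2, 1, 1)
)

res <- toy %>%
  group_by(ID) %>%
  mutate(
    # TRUE wherever CURR_PR differs from the previous row within the same ID
    changed = CURR_PR != lag(CURR_PR, default = first(CURR_PR)),
    # on a change, take the first BRR of this ID; otherwise keep BRR as is
    BRR = if_else(changed, first(BRR), BRR)
  ) %>%
  ungroup()

print(res$BRR)  # 61 57 61 71 61 50
```

Row 5 is replaced here because `CURR_PR` changed from 2 back to 1, whereas an approach based on the first row per `CURR_PR` value would leave it untouched. Whether that is the desired behavior depends on the real data.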



# Answer 2
**Score**: 1

We may group by 'ID' and use `replace`
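
As a minimal sketch of what `replace()` combined with `!duplicated()` does, here are the two helpers applied to toy vectors in base R. The vectors `curr_pr` and `brr` and the names `is_first` and `out` are made up for illustration.

```r
# Toy vectors, made up for illustration
curr_pr <- c(1, 1, 2, 2, 3)
brr     <- c(61, 57, 65, 71, 50)

# TRUE at the first occurrence of each distinct curr_pr value
is_first <- !duplicated(curr_pr)
print(is_first)  # TRUE FALSE TRUE FALSE TRUE

# Overwrite those positions with the first element of brr
out <- replace(brr, is_first, brr[1])
print(out)  # 61 57 61 71 61
```

Position 1 is also flagged, but it is overwritten with its own value, so it does not change.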

```R
library(dplyr)
df %>%
   group_by(ID) %>%
   mutate(BRR = replace(BRR, !duplicated(CURR_PR), first(BRR))) %>%
   ungroup()
```

Output:

```
# A tibble: 16 × 5
   ID    EXP_P   BRR  val1 CURR_PR
   <chr> <dbl> <dbl> <dbl>   <dbl>
 1 A        62    61     1       1
 2 A        62    57     1       1
 3 A        62    66     1       1
 4 A        62    53     1       1
 5 A        62    54     1       1
 6 A        62    50     1       1
 7 A        62    55     1       1
 8 A        64    61     2       2
 9 A        64    71     2       2
10 A        64    53     2       2
11 A        67    51     3       2
12 A        67    61     3       3
13 A        67    58     3       3
14 A        67    54     3       3
15 A        67    55     3       3
16 A        67    57     3       3
```

If we are using the devel version of dplyr (the feature was later released in dplyr 1.1.0), we can use `.by` in `mutate`:

```R
df %>%
   mutate(BRR = replace(BRR, !duplicated(CURR_PR), first(BRR)), .by = "ID")
```

Output:

```
   ID EXP_P BRR val1 CURR_PR
1   A    62  61    1       1
2   A    62  57    1       1
3   A    62  66    1       1
4   A    62  53    1       1
5   A    62  54    1       1
6   A    62  50    1       1
7   A    62  55    1       1
8   A    64  61    2       2
9   A    64  71    2       2
10  A    64  53    2       2
11  A    67  51    3       2
12  A    67  61    3       3
13  A    67  58    3       3
14  A    67  54    3       3
15  A    67  55    3       3
16  A    67  57    3       3
```
huangapple
  • Published on 2023-01-06 10:22:38
  • When reposting, please keep this link: https://go.coder-hub.com/75026327.html