Create new variable based on outcome of other variable in group - R

Question

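This is a similar/follow-up question to https://stackoverflow.com/questions/75475934/r-how-to-code-new-variable-based-on-grouped-variable-and-conditioned-on-earlier, but it is different because within a donor there are potentially two match runs.

I have a data file with organ donors. I'm looking at donated lungs; each donor has two lungs.

If the lungs are split (L and R) and put up for donation, each one is matched against recipients separately ("matchrun"). A match run goes through eligible recipients until one matches ("sequence").

If the lung is matched to a recipient, it goes to them ("organ_placed").

If the lung doesn't match, it continues through the sequence and organ_placed just remains NA at the maximum sequence number.

I would like to create a new variable that holds the outcome of the match run, so that if one lung is placed and the other is not, it tells you that the other lung was discarded. For example, see Donor 2 in the data: the left lung is placed, but the right one doesn't match.

In Donor 3, the first match run doesn't match, but the match run for the other lung does.

I figure it will be something like group_by(donorid, matchrun), but then how do you make a condition based on the match run?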

library(tibble)  # tribble() is provided by tibble; there is no "tribble" package
library(dplyr)

data <- tribble(
  ~donorid, ~matchrun, ~sequence, ~organ_placed,
  2, 3, 1, NA,
  2, 3, 2, NA,
  2, 3, 3, "L",
  2, 4, 1, NA,
  2, 4, 2, NA,
  2, 4, 3, NA,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 6, 1, NA,
  3, 6, 2, NA,
  3, 6, 3, "L"
)

desired_outcome <- tribble(
  ~donorid, ~matchrun, ~sequence, ~organ_placed, ~organ,
  2, 3, 1, NA, NA, 
  2, 3, 2, NA, NA, 
  2, 3, 3, "L", "Left Single",
  2, 4, 1, NA, NA,
  2, 4, 2, NA, NA, 
  2, 4, 3, NA, "Right Discarded",
  3, 5, 1, NA, NA,
  3, 5, 1, NA, NA,
  3, 5, 1, NA, "Right Discarded",
  3, 6, 1, NA, NA,
  3, 6, 2, NA, NA,
  3, 6, 3, "L", "Left Single"
)
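Inside group_by(donorid, matchrun), summary functions such as all() and n() are evaluated once per match run and recycled to every row of that run, which is one way to express a condition "based on the match run". A minimal sketch on the question's data; run_unplaced and last_in_run are illustrative column names, not part of the question:

```r
library(tibble)
library(dplyr)

# The question's data
data <- tribble(
  ~donorid, ~matchrun, ~sequence, ~organ_placed,
  2, 3, 1, NA,
  2, 3, 2, NA,
  2, 3, 3, "L",
  2, 4, 1, NA,
  2, 4, 2, NA,
  2, 4, 3, NA,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 6, 1, NA,
  3, 6, 2, NA,
  3, 6, 3, "L"
)

run_flags <- data %>%
  group_by(donorid, matchrun) %>%
  mutate(
    # all() sees the whole run, so every row of a run that placed nothing gets TRUE
    run_unplaced = all(is.na(organ_placed)),
    # row_number() and n() are per run, so only each run's final row gets TRUE
    last_in_run  = row_number() == n()
  ) %>%
  ungroup()

run_flags
```

These flags locate the rows where a "Discarded" label would go, but deciding which side was discarded still needs the other match run of the same donor, so that part has to be computed while grouped by donorid only.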

Answer 1

Score: 1

You can try this:

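data %>%
  group_by(donorid) %>%
  # temp: the one side ("L" or "R") placed for this donor, or "B" if zero or two sides were placed
  mutate(temp = ifelse(n_distinct(organ_placed, na.rm = TRUE) == 1, unique(na.omit(organ_placed)), "B")) %>%
  group_by(matchrun, .add = TRUE) %>%
  # On the last row of a run that placed nothing, label the side opposite to temp;
  # n() is the run length, which stays correct even when sequence values repeat
  mutate(organ = case_when(organ_placed == "L" ~ "Left Single",
                           organ_placed == "R" ~ "Right Single",
                           all(is.na(organ_placed)) & row_number() == n() & temp == "L" ~ "Right Discarded",
                           all(is.na(organ_placed)) & row_number() == n() & temp == "R" ~ "Left Discarded")) %>%
  ungroup()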

Output:

   donorid matchrun sequence organ_placed temp  organ       
 1       1        1        1 NA           B     NA          
 2       1        1        2 NA           B     NA          
 3       1        1        3 L            B     Left Single 
 4       1        2        1 NA           B     NA          
 5       1        2        2 NA           B     NA          
 6       1        2        3 R            B     Right Single
 7       2        3        1 NA           L     NA          
 8       2        3        2 NA           L     NA          
 9       2        3        3 L            L     Left Single 
10       2        4        1 NA           L     NA          
11       2        4        2 NA           L     NA          
12       2        4        3 NA           L     Right Discarded
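Testing `row_number() == max(sequence)` assumes that sequence runs 1, 2, ..., n inside every match run. In the question's data, donor 3's match run 5 has sequence 1, 1, 1, so max(sequence) is 1 and the first row is flagged instead of the last; comparing against n(), the number of rows in the run, does not depend on the sequence values. A small check on just those three rows (run5 is a hypothetical name for that subset):

```r
library(tibble)
library(dplyr)

# Hypothetical subset: donor 3, match run 5 from the question, where sequence repeats
run5 <- tribble(
  ~donorid, ~matchrun, ~sequence, ~organ_placed,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 5, 1, NA
)

checks <- run5 %>%
  group_by(donorid, matchrun) %>%
  mutate(
    by_max_sequence = row_number() == max(sequence),  # max(sequence) is 1, so row 1 is flagged
    by_group_size   = row_number() == n()             # n() is 3, so the last row is flagged
  ) %>%
  ungroup()

checks$by_max_sequence  # -> TRUE FALSE FALSE
checks$by_group_size    # -> FALSE FALSE TRUE
```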

Answer 2

Score: 1

We can use

library(data.table)
library(stringr)
# seq2: row position within each donorid/matchrun group
setDT(data)[, seq2 := rowid(donorid, matchrun)]
# Recode placed lungs: "L" -> "Left Single", "R" -> "Right Single"
data[, organ := str_replace_all(organ_placed,
   setNames(c("Left Single", "Right Single"), c("L", "R")))]
# On the last row of each run, keep an existing label; otherwise, per donor,
# label the side that was never placed as discarded. max(seq2) is taken over
# the whole table, so this assumes every match run has the same number of rows.
data[seq2 == max(seq2),
  organ := fcase(!is.na(organ), organ, default =
  str_replace_all(setdiff(c("Left Single", "Right Single"), organ),
   setNames(c("Left Discarded", "Right Discarded"),
   c("Left Single", "Right Single")))), donorid
  ][, seq2 := NULL][]

-output

> data
    donorid matchrun sequence organ_placed           organ
 1:       2        3        1         <NA>            <NA>
 2:       2        3        2         <NA>            <NA>
 3:       2        3        3            L     Left Single
 4:       2        4        1         <NA>            <NA>
 5:       2        4        2         <NA>            <NA>
 6:       2        4        3         <NA> Right Discarded
 7:       3        5        1         <NA>            <NA>
 8:       3        5        1         <NA>            <NA>
 9:       3        5        1         <NA> Right Discarded
10:       3        6        1         <NA>            <NA>
11:       3        6        2         <NA>            <NA>
12:       3        6        3            L     Left Single
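The key step in this data.table answer is setdiff(): among the last-row labels of a donor's two match runs, whichever single-lung label never appears belongs to the lung that was not placed. A base-R sketch of just that step for donor 2; placed_labels is a hypothetical stand-in for the two last-row values of organ, and sub() is used here in place of stringr::str_replace_all():

```r
# Hypothetical: last-row organ values for donor 2's match runs 3 and 4 (NA = not placed)
placed_labels <- c("Left Single", NA)

# The single-lung label that was never produced identifies the unplaced side
missing_label <- setdiff(c("Left Single", "Right Single"), placed_labels)

# Turn "Right Single" into "Right Discarded"
discarded <- sub("Single", "Discarded", missing_label)
discarded  # -> "Right Discarded"
```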

Answer 3

Score: 1

Update: we have to add matchrun to the grouping. I removed the prior solution:

data %>%
  group_by(donorid, matchrun) %>%
  mutate(outcome = case_when(organ_placed == "L" ~ "Left Single",
                             organ_placed == "R" ~ "Right Single",
                             organ_placed == "B" ~ "Bilateral",
                             (is.na(organ_placed) &
                                row_number() == max(row_number())) &
                               "L" %in% organ_placed ~ "Right Discarded",
                             (is.na(organ_placed) &
                                row_number() == max(row_number())) &
                               "R" %in% organ_placed ~ "Left Discarded",
                             TRUE ~ NA_character_))
Groups:   donorid, matchrun [4]
   donorid matchrun sequence organ_placed outcome    
     <dbl>    <dbl>    <dbl> <chr>        <chr>      
 1       2        3        1 NA           NA         
 2       2        3        2 NA           NA         
 3       2        3        3 L            Left Single
 4       2        4        1 NA           NA         
 5       2        4        2 NA           NA         
 6       2        4        3 NA           NA         
 7       3        5        1 NA           NA         
 8       3        5        1 NA           NA         
 9       3        5        1 NA           NA         
10       3        6        1 NA           NA         
11       3        6        2 NA           NA         
12       3        6        3 L            Left Single
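In this answer's output, rows 6 and 9 stay NA instead of "Right Discarded": with the data grouped by donorid and matchrun, `"L" %in% organ_placed` only looks at the current match run, and a run that placed nothing never contains "L". A sketch of one way to keep this case_when() structure while giving each run the donor-level information, assuming the question's data; left_placed and right_placed are hypothetical helper columns, and the max(row_number()) check is written as n():

```r
library(tibble)
library(dplyr)

# The question's data
data <- tribble(
  ~donorid, ~matchrun, ~sequence, ~organ_placed,
  2, 3, 1, NA,
  2, 3, 2, NA,
  2, 3, 3, "L",
  2, 4, 1, NA,
  2, 4, 2, NA,
  2, 4, 3, NA,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 6, 1, NA,
  3, 6, 2, NA,
  3, 6, 3, "L"
)

result <- data %>%
  group_by(donorid) %>%
  # Evaluated once per donor, so both match runs see what the other lung did
  mutate(left_placed  = "L" %in% organ_placed,
         right_placed = "R" %in% organ_placed) %>%
  group_by(matchrun, .add = TRUE) %>%
  mutate(outcome = case_when(
    organ_placed == "L" ~ "Left Single",
    organ_placed == "R" ~ "Right Single",
    organ_placed == "B" ~ "Bilateral",
    # Last row of a run that placed nothing: the discarded lung is the opposite side
    all(is.na(organ_placed)) & row_number() == n() & left_placed  ~ "Right Discarded",
    all(is.na(organ_placed)) & row_number() == n() & right_placed ~ "Left Discarded",
    TRUE ~ NA_character_
  )) %>%
  ungroup() %>%
  select(-left_placed, -right_placed)

result$outcome
```

On the question's data this fills rows 6 and 9 with "Right Discarded", matching desired_outcome.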

huangapple
  • Posted on 18 February 2023 at 00:40:34
  • When reposting, please keep this link: https://go.coder-hub.com/75487000.html