2023-02-18 00:40:34

Create new variable based on outcome of other variable in group - R

Question

This is a similar/follow-up question to <https://stackoverflow.com/questions/75475934/r-how-to-code-new-variable-based-on-grouped-variable-and-conditioned-on-earlier>, but it differs because each donor can have up to two match runs.

I have a data file of organ donors. I'm looking at donated lungs - each donor has two.

If the lungs are split (L and R) and put up for donation, an attempt is made to match each one with a recipient ("matchrun"). Each match run goes through eligible recipients until one matches ("sequence").

If the lung is matched to a recipient, it goes to them ("organ_placed").

If the lung doesn't match, it continues in the sequence and then just remains NA at the maximum sequence number.

I would like to create a new variable that holds the outcome of the match run, such that if one lung is placed and the other is not, it tells you that the other lung was discarded. For example, see donor 2 in the data: the left lung is placed, but the right doesn't match.

In donor 3, the first match run doesn't match but the match run for the other lung does.

I figure it will be something like group_by(donorid, matchrun) but then how do you make a condition based on the match run?

library(tibble)  # tribble() is provided by the tibble package
library(dplyr)

data <- tribble(
  ~donorid, ~matchrun, ~sequence, ~organ_placed,
  2, 3, 1, NA,
  2, 3, 2, NA,
  2, 3, 3, "L",
  2, 4, 1, NA,
  2, 4, 2, NA,
  2, 4, 3, NA,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 6, 1, NA,
  3, 6, 2, NA,
  3, 6, 3, "L"
)

desired_outcome <- tribble(
  ~donorid, ~matchrun, ~sequence, ~organ_placed, ~organ,
  2, 3, 1, NA, NA,
  2, 3, 2, NA, NA,
  2, 3, 3, "L", "Left Single",
  2, 4, 1, NA, NA,
  2, 4, 2, NA, NA,
  2, 4, 3, NA, "Right Discarded",
  3, 5, 1, NA, NA,
  3, 5, 1, NA, NA,
  3, 5, 1, NA, "Right Discarded",
  3, 6, 1, NA, NA,
  3, 6, 2, NA, NA,
  3, 6, 3, "L", "Left Single"
)

Answer 1

Score: 1

You can try this:

data %>%
  # Per donor: the single side that was placed ("L" or "R"), otherwise "B"
  group_by(donorid) %>%
  mutate(temp = ifelse(n_distinct(organ_placed, na.rm = TRUE) == 1, unique(na.omit(organ_placed)), "B")) %>%
  # Per match run within each donor: label placements and the unplaced side
  group_by(matchrun, .add = TRUE) %>%
  mutate(organ = case_when(organ_placed == "L" ~ "Left Single",
                           organ_placed == "R" ~ "Right Single",
                           all(is.na(organ_placed)) & row_number() == max(sequence) & temp == "L" ~ "Right Discarded",
                           all(is.na(organ_placed)) & row_number() == max(sequence) & temp == "R" ~ "Left Discarded")) %>%
  ungroup()

Output (from example data that, unlike the question's, includes a donor 1 with both lungs placed):

   donorid matchrun sequence organ_placed temp  organ       
 1       1        1        1 NA           B     NA          
 2       1        1        2 NA           B     NA          
 3       1        1        3 L            B     Left Single 
 4       1        2        1 NA           B     NA          
 5       1        2        2 NA           B     NA          
 6       1        2        3 R            B     Right Single
 7       2        3        1 NA           L     NA          
 8       2        3        2 NA           L     NA          
 9       2        3        3 L            L     Left Single 
10       2        4        1 NA           L     NA          
11       2        4        2 NA           L     NA          
12       2        4        3 NA           L     Right Discarded
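
One caveat with the `row_number() == max(sequence)` condition: it assumes `sequence` counts 1, 2, 3, ... within each match run. In the question's data, match run 5 has `sequence` equal to 1 on all three rows, so the condition would be met on the first row of that run rather than the last. Below is a minimal sketch of a variant that uses `n()` (the group size) instead, run against the question's data. It is an adaptation, not the answerer's code, and the `if`/`else` in place of `ifelse()` is only a stylistic choice.

```r
library(tibble)
library(dplyr)

# The question's data
data <- tribble(
  ~donorid, ~matchrun, ~sequence, ~organ_placed,
  2, 3, 1, NA,
  2, 3, 2, NA,
  2, 3, 3, "L",
  2, 4, 1, NA,
  2, 4, 2, NA,
  2, 4, 3, NA,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 6, 1, NA,
  3, 6, 2, NA,
  3, 6, 3, "L"
)

result <- data %>%
  group_by(donorid) %>%
  # temp: the one side placed for this donor ("L" or "R"); "B" otherwise
  # (as in the answer above, "B" also covers donors with no lung placed)
  mutate(temp = if (n_distinct(organ_placed, na.rm = TRUE) == 1) {
    unique(organ_placed[!is.na(organ_placed)])
  } else {
    "B"
  }) %>%
  group_by(matchrun, .add = TRUE) %>%
  mutate(organ = case_when(
    organ_placed == "L" ~ "Left Single",
    organ_placed == "R" ~ "Right Single",
    # n() is the number of rows in the run, so this marks the last row
    # even when sequence does not run 1..n
    all(is.na(organ_placed)) & row_number() == n() & temp == "L" ~ "Right Discarded",
    all(is.na(organ_placed)) & row_number() == n() & temp == "R" ~ "Left Discarded"
  )) %>%
  ungroup() %>%
  select(-temp)
```

On the question's data, `result$organ` matches the `organ` column of `desired_outcome`, including "Right Discarded" on the last row of match run 5.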

Answer 2

Score: 1

We can use

library(data.table)
library(stringr)

# Row index within each donor/match run (sequence itself has repeated values)
setDT(data)[, seq2 := rowid(donorid, matchrun)]

# Label placed lungs; rows where organ_placed is NA stay NA
data[, organ := str_replace_all(organ_placed,
  setNames(c("Left Single", "Right Single"), c("L", "R")))]

# On rows whose index equals the overall maximum (the last row of each run
# here), keep an existing label; otherwise mark the donor's unplaced side
data[seq2 == max(seq2),
  organ := fcase(!is.na(organ), organ, default =
    str_replace_all(setdiff(c("Left Single", "Right Single"), organ),
      setNames(c("Left Discarded", "Right Discarded"),
        c("Left Single", "Right Single")))), donorid
  ][, seq2 := NULL][]

Output:

> data
    donorid matchrun sequence organ_placed           organ
 1:       2        3        1         <NA>            <NA>
 2:       2        3        2         <NA>            <NA>
 3:       2        3        3            L     Left Single
 4:       2        4        1         <NA>            <NA>
 5:       2        4        2         <NA>            <NA>
 6:       2        4        3         <NA> Right Discarded
 7:       3        5        1         <NA>            <NA>
 8:       3        5        1         <NA>            <NA>
 9:       3        5        1         <NA> Right Discarded
10:       3        6        1         <NA>            <NA>
11:       3        6        2         <NA>            <NA>
12:       3        6        3            L     Left Single

Answer 3

Score: 1

Update: we have to add matchrun to the group. Removed prior solution:

data %>%
  group_by(donorid, matchrun) %>%
  mutate(outcome = case_when(organ_placed == "L" ~ "Left Single",
                             organ_placed == "R" ~ "Right Single",
                             organ_placed == "B" ~ "Bilateral",
                             (is.na(organ_placed) &
                                row_number() == max(row_number())) &
                               "L" %in% organ_placed ~ "Right Discarded",
                             (is.na(organ_placed) &
                                row_number() == max(row_number())) &
                               "R" %in% organ_placed ~ "Left Discarded",
                             TRUE ~ NA_character_))

Groups:   donorid, matchrun [4]
   donorid matchrun sequence organ_placed outcome    
     <dbl>    <dbl>    <dbl> <chr>        <chr>      
 1       2        3        1 NA           NA         
 2       2        3        2 NA           NA         
 3       2        3        3 L            Left Single
 4       2        4        1 NA           NA         
 5       2        4        2 NA           NA         
 6       2        4        3 NA           NA         
 7       3        5        1 NA           NA         
 8       3        5        1 NA           NA         
 9       3        5        1 NA           NA         
10       3        6        1 NA           NA         
11       3        6        2 NA           NA         
12       3        6        3 L            Left Single
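
Note that rows 6 and 9 in the output above stay NA: once the data is grouped by both donorid and matchrun, `"L" %in% organ_placed` only looks inside the current match run, so a run that is entirely NA never sees that the donor's other lung was placed. One possible way around this, sketched here under stated assumptions, is to reduce the data to one row per match run, decide the outcome per donor, and join it back. The names `run_placed` and `run_outcome` are hypothetical, and the sketch assumes a placement, when there is one, sits on the last row of its match run, as in the question's data.

```r
library(tibble)
library(dplyr)

# The question's data
data <- tribble(
  ~donorid, ~matchrun, ~sequence, ~organ_placed,
  2, 3, 1, NA,
  2, 3, 2, NA,
  2, 3, 3, "L",
  2, 4, 1, NA,
  2, 4, 2, NA,
  2, 4, 3, NA,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 5, 1, NA,
  3, 6, 1, NA,
  3, 6, 2, NA,
  3, 6, 3, "L"
)

# One row per match run: the side that was placed, or NA if none was
run_placed <- data %>%
  group_by(donorid, matchrun) %>%
  summarise(placed = organ_placed[!is.na(organ_placed)][1], .groups = "drop")

# Decide each run's outcome by looking across all runs of the same donor
run_outcome <- run_placed %>%
  group_by(donorid) %>%
  mutate(outcome = case_when(
    placed == "L" ~ "Left Single",
    placed == "R" ~ "Right Single",
    placed == "B" ~ "Bilateral",
    is.na(placed) & "L" %in% placed ~ "Right Discarded",
    is.na(placed) & "R" %in% placed ~ "Left Discarded"
  )) %>%
  ungroup()

# Join back and keep the label only on the last row of each match run
result <- data %>%
  left_join(run_outcome, by = c("donorid", "matchrun")) %>%
  group_by(donorid, matchrun) %>%
  mutate(outcome = if_else(row_number() == n(), outcome, NA_character_)) %>%
  ungroup() %>%
  select(-placed)
```

The intermediate `run_outcome` table is also a compact per-run summary in its own right, which may be easier to check than labels scattered across the long data. Donors with no lung placed at all get NA for every run in this sketch.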
