How can I code to search across multiple columns within a group - R

Question

I have cancer data and each patient had 1-4 measurements. Some measurements had cytology done, others had pathology done, some had both.

library(dplyr)
library(tibble)

data<-tribble(
  ~record_number, ~tool, ~cytology, ~pathology,
  114, "forceps", "Indeterminate", NA,
  114, "needle", "Non-Malignant", "Malignant",
  114, "lavage", NA, "Indeterminate",
  115, "forceps", NA, "Non-Malignant",
  115, "needle", NA, "Malignant"
)

I'd like to create a Malignant variable (0/1) that is 1 if "Malignant" occurs in either column (cytology, pathology) for any of the samples (rows) of a given subject (record_number), and 0 otherwise.

Any ideas are appreciated!

desired <- tribble(
  ~record_number, ~tool, ~cytology, ~pathology, ~Malignant,
  114, "forceps", "Indeterminate", NA, 1,
  114, "needle", "Non-Malignant", "Malignant", 1,
  114, "lavage", NA, "Indeterminate", 1,
  115, "forceps", NA, "Non-Malignant", 1,
  115, "needle", NA, "Malignant", 1
)

I'm thinking it will start with group_by(record_number)...but then what?

desired<-data %>%
  group_by(record_number) %>%
  ...?

Answer 1

Score: 2

You could do:

library(tidyverse)
data %>%
  mutate(Malignant = as.numeric(any(c(cytology, pathology) == 'Malignant', na.rm = TRUE)), .by = record_number)

# A tibble: 5 x 5
  record_number tool    cytology      pathology     Malignant
          <dbl> <chr>   <chr>         <chr>             <dbl>
1           114 forceps Indeterminate <NA>                  1
2           114 needle  Non-Malignant Malignant             1
3           114 lavage  <NA>          Indeterminate         1
4           115 forceps <NA>          Non-Malignant         1
5           115 needle  <NA>          Malignant             1
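
A note on why na.rm = TRUE is there: comparing NA with == gives NA, and any() returns NA when it finds no TRUE but at least one NA. Here is a minimal base R sketch of that behaviour. The vector is made up for illustration and is not part of the question's data:

```r
# Hypothetical results for one subject with no "Malignant" finding
results <- c("Indeterminate", NA, "Non-Malignant")

# Element-wise comparison keeps the NA: FALSE NA FALSE
hits <- results == "Malignant"

# Without na.rm, any() cannot rule out a match and returns NA
without_na_rm <- any(hits)

# With na.rm = TRUE the NA is dropped first, so the result is FALSE
with_na_rm <- any(hits, na.rm = TRUE)

# as.numeric() then turns FALSE/TRUE into 0/1
flag <- as.numeric(with_na_rm)
```

In the sample data every subject has at least one "Malignant" row, so the NA case never shows up there. It would matter for a subject whose results are all non-malignant or missing.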

Answer 2

Score: 1

We can use ifelse() with any():

library(dplyr) # .by requires dplyr >= 1.1.0

data %>%
  mutate(Malignant = ifelse(any(cytology == "Malignant" | pathology == "Malignant", na.rm = TRUE), 1, 0), .by = record_number)

  record_number tool    cytology      pathology     Malignant
          <dbl> <chr>   <chr>         <chr>             <dbl>
1           114 forceps Indeterminate NA                    1
2           114 needle  Non-Malignant Malignant             1
3           114 lavage  NA            Indeterminate         1
4           115 forceps NA            Non-Malignant         1
5           115 needle  NA            Malignant             1
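
The .by argument needs dplyr 1.1.0 or later. On an older version, the group_by() route the question started with should give the same result. This is a sketch, not a tested drop-in. Subject 116 is a hypothetical extra row with no "Malignant" result, added only so that a 0 shows up:

```r
library(dplyr)

# Sample data from the question, plus one hypothetical subject (116)
# with no "Malignant" result, added only to show a 0 outcome.
data <- tibble::tribble(
  ~record_number, ~tool, ~cytology, ~pathology,
  114, "forceps", "Indeterminate", NA,
  114, "needle", "Non-Malignant", "Malignant",
  114, "lavage", NA, "Indeterminate",
  115, "forceps", NA, "Non-Malignant",
  115, "needle", NA, "Malignant",
  116, "forceps", "Indeterminate", NA
)

result <- data %>%
  group_by(record_number) %>%
  # any() yields one TRUE/FALSE per subject, recycled to every row of the group
  mutate(Malignant = as.numeric(
    any(c(cytology, pathology) == "Malignant", na.rm = TRUE)
  )) %>%
  # drop the grouping so later steps work on the whole table again
  ungroup()
```

The ungroup() at the end is the main difference from .by, which only groups for the single mutate() call and returns an ungrouped result.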

Answer 3

Score: 1

You can also use the %in% operator here. Unlike ==, it never returns NA, even when the columns contain NAs:

data |>
  mutate(malignancy = 1*("Malignant" %in% c(cytology, pathology)), .by = record_number)

Output

  record_number tool    cytology      pathology     malignancy
1           114 forceps Indeterminate NA                     1
2           114 needle  Non-Malignant Malignant              1
3           114 lavage  NA            Indeterminate          1
4           115 forceps NA            Non-Malignant          1
5           115 needle  NA            Malignant              1
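
To see the difference from == on its own, here is a small base R sketch. The vector is made up for illustration:

```r
# Hypothetical pooled results for one subject, including a missing value
x <- c("Indeterminate", NA)

# `==` compares element by element and propagates the NA: FALSE NA
eq_result <- x == "Malignant"

# %in% is built on match(), so it answers with TRUE or FALSE only
in_result <- "Malignant" %in% x

# Multiplying a logical by 1 converts it to numeric 0/1
flag <- 1 * in_result
```

Because %in% tests whether the left-hand value appears anywhere in the right-hand vector, it already does the job of any() and needs no na.rm.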

huangapple
  • Posted on May 24, 2023, 22:27:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76324621.html