How can I search across multiple columns within a group in R?

Question

I have cancer data and each patient had 1-4 measurements. Some measurements had cytology done, others had pathology done, some had both.

  library(dplyr)
  library(tibble)

  data <- tribble(
    ~record_number, ~tool, ~cytology, ~pathology,
    114, "forceps", "Indeterminate", NA,
    114, "needle", "Non-Malignant", "Malignant",
    114, "lavage", NA, "Indeterminate",
    115, "forceps", NA, "Non-Malignant",
    115, "needle", NA, "Malignant"
  )

I'd like to create a malignancy indicator (0/1) that is 1 if "Malignant" occurs in any sample (row) for a given subject (record_number), in either column (cytology, pathology), and 0 otherwise.

Any ideas are appreciated!

  desired <- tribble(
    ~record_number, ~tool, ~cytology, ~pathology, ~Malignant,
    114, "forceps", "Indeterminate", NA, 1,
    114, "needle", "Non-Malignant", "Malignant", 1,
    114, "lavage", NA, "Indeterminate", 1,
    115, "forceps", NA, "Non-Malignant", 1,
    115, "needle", NA, "Malignant", 1
  )

I'm thinking it will start with group_by(record_number)...but then what?

  desired <- data %>%
    group_by(record_number) %>%
    ...?

Answer 1

Score: 2

You could do:

  library(tidyverse)

  data %>%
    mutate(
      Malignant = as.numeric(any(c(cytology, pathology) == 'Malignant', na.rm = TRUE)),
      .by = record_number
    )

  # A tibble: 5 x 5
    record_number tool    cytology      pathology     Malignant
            <dbl> <chr>   <chr>         <chr>             <dbl>
  1           114 forceps Indeterminate <NA>                  1
  2           114 needle  Non-Malignant Malignant             1
  3           114 lavage  <NA>          Indeterminate         1
  4           115 forceps <NA>          Non-Malignant         1
  5           115 needle  <NA>          Malignant             1
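
The question asked what should follow group_by(record_number). As a minimal sketch, assuming an installation where the .by argument is not available (it was added in dplyr 1.1.0), the same idea can be written with group_by() and ungroup(). The name `result` is only illustrative.

```r
library(dplyr)
library(tibble)

# Example data copied from the question
data <- tribble(
  ~record_number, ~tool, ~cytology, ~pathology,
  114, "forceps", "Indeterminate", NA,
  114, "needle", "Non-Malignant", "Malignant",
  114, "lavage", NA, "Indeterminate",
  115, "forceps", NA, "Non-Malignant",
  115, "needle", NA, "Malignant"
)

result <- data %>%
  group_by(record_number) %>%
  # Pool both columns within each group, then check whether any value matches;
  # na.rm = TRUE stops missing values from turning the result into NA
  mutate(Malignant = as.numeric(any(c(cytology, pathology) == "Malignant", na.rm = TRUE))) %>%
  ungroup()  # drop the grouping so later steps work on the whole table
```

Calling ungroup() at the end is a design choice: it avoids carrying the grouping into later operations by accident.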

Answer 2

Score: 1

We can use ifelse() with any():

  library(dplyr) # .by requires dplyr >= 1.1.0

  data %>%
    mutate(
      Malignant = ifelse(any(cytology == "Malignant" | pathology == "Malignant", na.rm = TRUE), 1, 0),
      .by = record_number
    )

    record_number tool    cytology      pathology     Malignant
            <dbl> <chr>   <chr>         <chr>             <dbl>
  1           114 forceps Indeterminate NA                    1
  2           114 needle  Non-Malignant Malignant             1
  3           114 lavage  NA            Indeterminate         1
  4           115 forceps NA            Non-Malignant         1
  5           115 needle  NA            Malignant             1
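
If more than two result columns need to be searched, writing one comparison per column gets tedious. The following is a sketch only, assuming dplyr >= 1.1.0: it uses if_any() to test a selection of columns row by row, then any() to collapse the rows within each record. Record 116 is a hypothetical addition, not part of the original data, included to show a 0 result.

```r
library(dplyr)
library(tibble)

# Data from the question plus one hypothetical record (116)
# that has no malignant result in either column
data <- tribble(
  ~record_number, ~tool, ~cytology, ~pathology,
  114, "forceps", "Indeterminate", NA,
  114, "needle", "Non-Malignant", "Malignant",
  114, "lavage", NA, "Indeterminate",
  115, "forceps", NA, "Non-Malignant",
  115, "needle", NA, "Malignant",
  116, "forceps", "Non-Malignant", NA
)

result <- data %>%
  mutate(
    # if_any() yields one TRUE/FALSE per row across the selected columns;
    # %in% treats NA as "no match", so no na.rm handling is needed;
    # any() then collapses the rows of each record_number group
    Malignant = as.integer(any(if_any(c(cytology, pathology), ~ .x %in% "Malignant"))),
    .by = record_number
  )
```

The column selection inside if_any() accepts tidyselect helpers, so a longer list of columns could be supplied there instead of naming each one.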

Answer 3

Score: 1

You can also use the %in% operator, which returns FALSE rather than NA when the data contain missing values:

  data |>
    mutate(malignancy = 1 * ("Malignant" %in% c(cytology, pathology)), .by = record_number)

Output

    record_number tool    cytology      pathology     malignancy
  1           114 forceps Indeterminate NA                     1
  2           114 needle  Non-Malignant Malignant              1
  3           114 lavage  NA            Indeterminate          1
  4           115 forceps NA            Non-Malignant          1
  5           115 needle  NA            Malignant              1
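
To see why the earlier answers pass na.rm = TRUE while %in% needs no such argument, here is a small base R illustration. The vector `x` and the result names are hypothetical; they stand in for one group that has a missing value and no malignant result.

```r
# A group with one missing value and no "Malignant" entry
x <- c(NA, "Non-Malignant")

# == propagates NA, so any() cannot decide and returns NA
res_eq <- any(x == "Malignant")

# Dropping NA first lets any() return FALSE
res_narm <- any(x == "Malignant", na.rm = TRUE)

# %in% treats NA as an ordinary non-matching value and returns FALSE
res_in <- "Malignant" %in% x

print(list(equals = res_eq, equals_na_rm = res_narm, in_operator = res_in))
```

In the question's data every record contains a "Malignant" value, so the difference never shows up there. It matters for records with only missing or non-malignant results.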

huangapple
  • Posted on May 24, 2023 at 22:27:44
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76324621.html