How can I code to search across multiple columns within a group - R
Question
I have cancer data and each patient had 1-4 measurements. Some measurements had cytology done, others had pathology done, some had both.
library(dplyr)
library(tibble)
data<-tribble(
~record_number, ~tool, ~cytology, ~pathology,
114, "forceps", "Indeterminate", NA,
114, "needle", "Non-Malignant", "Malignant",
114, "lavage", NA, "Indeterminate",
115, "forceps", NA, "Non-Malignant",
115, "needle", NA, "Malignant"
)
I'd like to create a 0/1 variable, Malignant, that is 1 on every row of a subject (record_number) if "Malignant" appears in either column (cytology or pathology) for any of that subject's samples (rows), and 0 otherwise.
Any ideas are appreciated!
desired<-tribble(
~record_number, ~tool, ~cytology, ~pathology, ~Malignant,
114, "forceps", "Indeterminate", NA, 1,
114, "needle", "Non-Malignant", "Malignant", 1,
114, "lavage", NA, "Indeterminate", 1,
115, "forceps", NA, "Non-Malignant", 1,
115, "needle", NA, "Malignant", 1
)
I'm thinking it will start with group_by(record_number)...but then what?
desired<-data %>%
group_by(record_number) %>%
...?
Answer 1
Score: 2
You could do:
library(tidyverse)  # the .by argument requires dplyr >= 1.1.0
data %>%
  mutate(Malignant = as.numeric(any(c(cytology, pathology) == 'Malignant', na.rm = TRUE)),
         .by = record_number)
# A tibble: 5 x 5
record_number tool cytology pathology Malignant
<dbl> <chr> <chr> <chr> <dbl>
1 114 forceps Indeterminate <NA> 1
2 114 needle Non-Malignant Malignant 1
3 114 lavage <NA> Indeterminate 1
4 115 forceps <NA> Non-Malignant 1
5 115 needle <NA> Malignant 1
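The question guesses that the solution starts with group_by(record_number). The .by argument used above only exists in dplyr 1.1.0 and later, so here is a minimal sketch of the equivalent group_by() / mutate() / ungroup() pipeline, which should also work on older dplyr versions. It recreates the question's data so it runs on its own; the name result is just illustrative.

```r
library(dplyr)
library(tibble)

# Example data from the question
data <- tribble(
  ~record_number, ~tool,     ~cytology,       ~pathology,
  114,            "forceps", "Indeterminate", NA,
  114,            "needle",  "Non-Malignant", "Malignant",
  114,            "lavage",  NA,              "Indeterminate",
  115,            "forceps", NA,              "Non-Malignant",
  115,            "needle",  NA,              "Malignant"
)

# Same logic as the .by version: inside each record_number group,
# c(cytology, pathology) pools both columns, any() collapses them to one flag,
# and mutate() recycles that flag to every row of the group.
result <- data %>%
  group_by(record_number) %>%
  mutate(Malignant = as.numeric(any(c(cytology, pathology) == "Malignant", na.rm = TRUE))) %>%
  ungroup()  # drop the grouping so later steps are not silently grouped

print(result)
```

The trailing ungroup() is the main practical difference from .by, which never leaves the result grouped.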
Answer 2
Score: 1
We can use ifelse() with any():
library(dplyr)  # the .by argument requires dplyr >= 1.1.0
data %>%
  mutate(Malignant = ifelse(any(cytology == "Malignant" | pathology == "Malignant", na.rm = TRUE), 1, 0),
         .by = record_number)
record_number tool cytology pathology Malignant
<dbl> <chr> <chr> <chr> <dbl>
1 114 forceps Indeterminate NA 1
2 114 needle Non-Malignant Malignant 1
3 114 lavage NA Indeterminate 1
4 115 forceps NA Non-Malignant 1
5 115 needle NA Malignant 1
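The na.rm = TRUE in the two answers above matters because comparing NA with == gives NA, and NA | FALSE is also NA. This base R sketch shows what happens for subject 115 from the question and for a hypothetical subject (cyto_a / patho_a, invented here) that has no "Malignant" result at all.

```r
# Hypothetical subject with no "Malignant" result in either column
cyto_a  <- c("Indeterminate", NA)
patho_a <- c(NA, "Non-Malignant")
hit_a <- cyto_a == "Malignant" | patho_a == "Malignant"
hit_a                     # FALSE | NA -> NA and NA | FALSE -> NA, so c(NA, NA)
any(hit_a)                # NA: no TRUE is present, so the NAs decide the result
any(hit_a, na.rm = TRUE)  # FALSE: with the NAs dropped, nothing matched

# Subject 115 from the question
cyto_b  <- c(NA, NA)
patho_b <- c("Non-Malignant", "Malignant")
hit_b <- cyto_b == "Malignant" | patho_b == "Malignant"
hit_b                     # NA | FALSE -> NA and NA | TRUE -> TRUE, so c(NA, TRUE)
any(hit_b)                # TRUE: a single TRUE settles any() even with NAs present

# Without na.rm, ifelse() would therefore give NA instead of 0
ifelse(any(hit_a), 1, 0)  # NA
```

So na.rm = TRUE only changes the outcome for subjects without any match, turning their NA into 0.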
Answer 3
Score: 1
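You can also use the %in% operator, which returns TRUE or FALSE (never NA) even when the data contain NAs:
library(dplyr)  # .by requires dplyr >= 1.1.0; the native pipe |> requires R >= 4.1
data |>
  mutate(malignancy = 1*("Malignant" %in% c(cytology, pathology)), .by = record_number)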
Output
record_number tool cytology pathology malignancy
1 114 forceps Indeterminate NA 1
2 114 needle Non-Malignant Malignant 1
3 114 lavage NA Indeterminate 1
4 115 forceps NA Non-Malignant 1
5 115 needle NA Malignant 1
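The title asks about searching across multiple columns. With more than two result columns, listing each one inside c() or chaining | gets long. A minimal sketch that combines dplyr's if_any() with the %in% idea from the answer above; the tidyselect expression c(cytology, pathology) could be replaced by any selection (for example a column range). Subject 116 is a hypothetical row added only so that a 0 appears in the output.

```r
library(dplyr)  # .by requires dplyr >= 1.1.0
library(tibble)

# Question's data plus a hypothetical subject 116 with no "Malignant" result
data <- tribble(
  ~record_number, ~tool,     ~cytology,       ~pathology,
  114,            "forceps", "Indeterminate", NA,
  114,            "needle",  "Non-Malignant", "Malignant",
  114,            "lavage",  NA,              "Indeterminate",
  115,            "forceps", NA,              "Non-Malignant",
  115,            "needle",  NA,              "Malignant",
  116,            "forceps", "Indeterminate", NA
)

result <- data %>%
  mutate(
    # if_any() is TRUE for a row when any selected column contains "Malignant";
    # %in% gives FALSE (not NA) for missing values, so no na.rm is needed.
    # any() then collapses the rows of each record_number into one flag.
    Malignant = as.integer(any(if_any(c(cytology, pathology), ~ .x %in% "Malignant"))),
    .by = record_number
  )

print(result)
```

Adding another result column only means extending the selection inside if_any(); the rest of the expression stays the same.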