May 24, 2023 22:27:44

How can I code to search across multiple columns within a group - R

Question

I have cancer data and each patient had 1-4 measurements. Some measurements had cytology done, others had pathology done, some had both.

library(dplyr)
library(tibble)
data <- tribble(
  ~record_number, ~tool, ~cytology, ~pathology,
  114, "forceps", "Indeterminate", NA,
  114, "needle", "Non-Malignant", "Malignant",
  114, "lavage", NA, "Indeterminate",
  115, "forceps", NA, "Non-Malignant",
  115, "needle", NA, "Malignant"
)

I'd like to create a Malignant variable (0/1) that is 1 if "Malignant" occurs in any of the samples (rows) for a given subject (record_number), in either of the columns (cytology, pathology), and 0 otherwise.

Any ideas are appreciated!

desired <- tribble(
  ~record_number, ~tool, ~cytology, ~pathology, ~Malignant,
  114, "forceps", "Indeterminate", NA, 1,
  114, "needle", "Non-Malignant", "Malignant", 1,
  114, "lavage", NA, "Indeterminate", 1,
  115, "forceps", NA, "Non-Malignant", 1,
  115, "needle", NA, "Malignant", 1
)

I'm thinking it will start with group_by(record_number)...but then what?

desired <- data %>%
  group_by(record_number) %>%
  ...?

Answer 1

Score: 2

You could do:

library(tidyverse)
# .by needs dplyr >= 1.1.0; na.rm = TRUE stops NA cells from making the result NA
data %>%
  mutate(Malignant = as.numeric(any(c(cytology, pathology) == 'Malignant', na.rm = TRUE)), .by = record_number)

# A tibble: 5 x 5
  record_number tool    cytology      pathology     Malignant
          <dbl> <chr>   <chr>         <chr>             <dbl>
1           114 forceps Indeterminate <NA>                  1
2           114 needle  Non-Malignant Malignant             1
3           114 lavage  <NA>          Indeterminate         1
4           115 forceps <NA>          Non-Malignant         1
5           115 needle  <NA>          Malignant             1
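
The sample data only contains patients with at least one "Malignant" result, so every row comes out as 1. As a quick check of the 0 case, the sketch below appends a hypothetical patient 116 (not part of the original data) who has no malignant finding and one missing value, then reuses the same expression. It assumes dplyr 1.1.0 or later for `.by`; the names `data_ext` and `result` are made up for this illustration.

```r
library(dplyr)
library(tibble)

# Original sample data plus a hypothetical patient 116 (an assumption for
# illustration) with no "Malignant" result and a missing cytology value.
data_ext <- tribble(
  ~record_number, ~tool, ~cytology, ~pathology,
  114, "forceps", "Indeterminate", NA,
  114, "needle", "Non-Malignant", "Malignant",
  114, "lavage", NA, "Indeterminate",
  115, "forceps", NA, "Non-Malignant",
  115, "needle", NA, "Malignant",
  116, "forceps", NA, "Non-Malignant",
  116, "needle", "Indeterminate", "Non-Malignant"
)

result <- data_ext %>%
  mutate(
    # Pool both columns within each patient and test the pooled values once.
    Malignant = as.numeric(any(c(cytology, pathology) == "Malignant", na.rm = TRUE)),
    .by = record_number
  )

print(result)
```

Patients 114 and 115 should get 1 on every row, and the hypothetical patient 116 should get 0 on both rows.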

Answer 2

Score: 1

We can use ifelse with any:

library(dplyr) # .by requires dplyr >= 1.1.0
data %>%
  mutate(Malignant = ifelse(any(cytology == "Malignant" | pathology == "Malignant", na.rm = TRUE), 1, 0), .by = record_number)

  record_number tool    cytology      pathology     Malignant
          <dbl> <chr>   <chr>         <chr>             <dbl>
1           114 forceps Indeterminate NA                    1
2           114 needle  Non-Malignant Malignant             1
3           114 lavage  NA            Indeterminate         1
4           115 forceps NA            Non-Malignant         1
5           115 needle  NA            Malignant             1
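
On dplyr versions older than 1.1.0, where `.by` is not available, the same result can probably be reached from the `group_by(record_number)` starting point in the question, followed by `ungroup()`. A minimal sketch that reuses the question's data and this answer's `ifelse()` expression:

```r
library(dplyr)
library(tibble)

# Sample data from the question.
data <- tribble(
  ~record_number, ~tool, ~cytology, ~pathology,
  114, "forceps", "Indeterminate", NA,
  114, "needle", "Non-Malignant", "Malignant",
  114, "lavage", NA, "Indeterminate",
  115, "forceps", NA, "Non-Malignant",
  115, "needle", NA, "Malignant"
)

desired <- data %>%
  group_by(record_number) %>%
  # any() collapses each patient's rows to a single TRUE/FALSE, which
  # mutate() then recycles to every row of that patient.
  mutate(Malignant = ifelse(any(cytology == "Malignant" | pathology == "Malignant", na.rm = TRUE), 1, 0)) %>%
  ungroup()  # drop the grouping so later steps operate on the whole table

print(desired)
```

The explicit `ungroup()` matters here because `group_by()` is persistent, whereas `.by` only groups for the single `mutate()` call.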

Answer 3

Score: 1

You can also use the %in% operator, which does not return NA even when the columns contain NAs:

data |>
  mutate(malignancy = 1*("Malignant" %in% c(cytology, pathology)), .by = record_number)

Output

  record_number tool    cytology      pathology     malignancy
1           114 forceps Indeterminate NA                     1
2           114 needle  Non-Malignant Malignant              1
3           114 lavage  NA            Indeterminate          1
4           115 forceps NA            Non-Malignant          1
5           115 needle  NA            Malignant              1
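
The difference comes from how R treats missing values: `==` propagates NA, while `%in%` is built on `match()` and yields FALSE when there is no match. A small base R illustration, using a made-up vector of values for a single patient with no malignant result (the variable names are hypothetical):

```r
# Pooled values for one hypothetical patient: one missing, none "Malignant".
values <- c(NA, "Indeterminate", "Non-Malignant")

eq_result <- any(values == "Malignant")                # NA: the comparison with NA is unknown
eq_narm   <- any(values == "Malignant", na.rm = TRUE)  # FALSE: NAs are dropped first
in_result <- "Malignant" %in% values                   # FALSE: %in% does not return NA

print(c(eq = eq_result, eq_narm = eq_narm, in_op = in_result))

# Multiplying a logical by 1 converts it to a numeric 0/1 flag.
flag <- 1 * in_result
print(flag)
```

This is why the `==` based answers need `na.rm = TRUE`, while the `%in%` version works without it.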
