How to filter participants that have specific values in R?

Question

This is a follow-up to [this question][1]. How can I filter all participants (in this case, schools) that match ALL of the factor levels in my desired subset, *not just one or another*?

* Edit: the subset goes up to level 04 (01 through 04), not level 05.

* My data frame looks like this:

```R
quest40_2[1:10,]
# A tibble: 10 x 4
   SCHOOL  Q9    Q11     Q40
   <glue>  <fct> <fct>   <fct>
 1 School1 typeB level0  NA
 2 School1 typeB level01 NA
 3 School1 typeB level02 NA
 4 School1 typeB level03 NA
 5 School1 typeB NA      plan1_level0upto02
 6 School1 typeB NA      plan2_level_03
 7 School1 typeB NA      plan3_level_04
 8 School2 typeB level01 NA
 9 School2 typeB level02 NA
10 School2 typeB level03 NA
```

* Edit for clarification, the desired output:

```R
# A tibble: 12 x 4
   SCHOOL  Q9    Q11     Q40
   <glue>  <fct> <fct>   <fct>
 1 School2 typeB level01 NA
 2 School2 typeB level02 NA
 3 School2 typeB level03 NA
 4 School2 typeB level04 NA
 5 School3 typeB level01 NA
 6 School3 typeB level02 NA
 7 School3 typeB level03 NA
 8 School3 typeB level04 NA
 9 School5 typeC level01 NA
10 School5 typeC level02 NA
11 School5 typeC level03 NA
12 School5 typeC level04 NA
```

* **Question:** I want to subset all `SCHOOL`s that have *level01*, *level02*, *level03* **AND** *level04* among their `Q11` values (like the `&&` operator, for example). Hence, I cannot have schools that offer some of them but not all. Any ideas? (Preferably `tidyverse` ones.)

* The data is in the linked post above.

  [1]: https://stackoverflow.com/questions/75488236/how-to-get-right-proportions-based-on-a-subset-of-the-data-set-in-r?noredirect=1#comment133191892_75488236


# Answer 1
**Score**: 1


We can filter for the `SCHOOL`s that have all of the custom levels in `Q11`. With `dplyr` 1.1.0 or later, we can use `.by` in `filter()`:

```R
library(dplyr) # version >= 1.1.0

# keep every row of a SCHOOL whose Q11 values include all four levels
quest40_2 %>%
    filter(all(c("level01", "level02", "level04", "level05") %in% Q11),
     .by = "SCHOOL")
```

Output:

```
# A tibble: 96 × 4
   SCHOOL  Q9    Q11     Q40
   <glue>  <fct> <fct>   <fct>
 1 School3 typeB level01 <NA>
 2 School3 typeB level02 <NA>
 3 School3 typeB level03 <NA>
 4 School3 typeB level04 <NA>
 5 School3 typeB level05 <NA>
 6 School3 typeB <NA>    plan1_level0upto02
 7 School3 typeB <NA>    plan2_level_03
 8 School3 typeB <NA>    plan3_level_04
 9 School3 typeB <NA>    plan4_level_05
10 School5 typeC level01 <NA>
# … with 86 more rows
```
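To see what the grouped condition evaluates to, the per-school check can be sketched in base R. This is only an illustration on made-up data: `toy`, `wanted`, `has_all` and `kept` are hypothetical names, not part of the original data set.

```r
# Made-up data: three schools offering different sets of Q11 levels
toy <- data.frame(
  SCHOOL = c(rep("School1", 2), rep("School2", 5), rep("School3", 3)),
  Q11 = factor(c(
    "level01", "level02",                                   # School1
    "level01", "level02", "level03", "level04", "level05",  # School2
    "level01", "level02", "level04"                         # School3
  ))
)

wanted <- c("level01", "level02", "level04", "level05")

# One TRUE/FALSE per school: does it contain every wanted level?
has_all <- sapply(split(toy$Q11, toy$SCHOOL), function(q) all(wanted %in% q))
has_all
# School1: FALSE, School2: TRUE, School3: FALSE

# Keep all rows of the schools that passed the check
kept <- toy[toy$SCHOOL %in% names(has_all)[has_all], ]
```

The grouped `filter()` call performs the same check once per `SCHOOL` and recycles the single `TRUE` or `FALSE` over that school's rows.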

If we also want to keep only the rows with those levels:

```R
# first condition selects the SCHOOLs, second keeps only the matching rows
quest40_2 %>%
    filter(all(c("level01", "level02", "level04", "level05") %in%
     Q11), Q11 %in% c("level01", "level02", "level04", "level05"), .by = 'SCHOOL')
```

Output:

```
# A tibble: 44 × 4
   SCHOOL   Q9    Q11     Q40
   <glue>   <fct> <fct>   <fct>
 1 School3  typeB level01 <NA>
 2 School3  typeB level02 <NA>
 3 School3  typeB level04 <NA>
 4 School3  typeB level05 <NA>
 5 School5  typeC level01 <NA>
 6 School5  typeC level02 <NA>
 7 School5  typeC level04 <NA>
 8 School5  typeC level05 <NA>
 9 School13 typeD level01 <NA>
10 School13 typeD level02 <NA>
# … with 34 more rows
```

For earlier versions of `dplyr`, use `group_by()`:

```R
quest40_2 %>%
  group_by(SCHOOL) %>%
   filter(all(c("level01", "level02", "level04", "level05") %in% Q11)) %>%
  ungroup()
```

Output:

```
# A tibble: 96 × 4
   SCHOOL  Q9    Q11     Q40
   <glue>  <fct> <fct>   <fct>
 1 School3 typeB level01 <NA>
 2 School3 typeB level02 <NA>
 3 School3 typeB level03 <NA>
 4 School3 typeB level04 <NA>
 5 School3 typeB level05 <NA>
 6 School3 typeB <NA>    plan1_level0upto02
 7 School3 typeB <NA>    plan2_level_03
 8 School3 typeB <NA>    plan3_level_04
 9 School3 typeB <NA>    plan4_level_05
10 School5 typeC level01 <NA>
# … with 86 more rows
```

If, instead of filtering the `SCHOOL`s, we only need to keep the rows with the custom levels, just do:

```R
# row-wise filter: no grouping, so schools with only some of the levels remain
quest40_2 %>%
    filter(Q11 %in% c("level01", "level02", "level04", "level05"))
```

Output:

```
# A tibble: 115 × 4
   SCHOOL  Q9    Q11     Q40
   <glue>  <fct> <fct>   <fct>
 1 School1 typeB level01 <NA>
 2 School1 typeB level02 <NA>
 3 School2 typeB level01 <NA>
 4 School2 typeB level02 <NA>
 5 School2 typeB level04 <NA>
 6 School3 typeB level01 <NA>
 7 School3 typeB level02 <NA>
 8 School3 typeB level04 <NA>
 9 School3 typeB level05 <NA>
10 School4 typeB level02 <NA>
# … with 105 more rows
```
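The question's edit asks for level01 through level04 rather than the levels used above. A minimal sketch of the same pattern with that set, on made-up data (the `toy` data frame and the `wanted` vector are assumptions for illustration), written with `group_by()` so that it should also run on `dplyr` versions before 1.1.0:

```r
library(dplyr)

# Levels taken from the question's edit (01 through 04)
wanted <- c("level01", "level02", "level03", "level04")

# Made-up data, not the original quest40_2
toy <- data.frame(
  SCHOOL = c(rep("School1", 4), rep("School2", 5), rep("School3", 5)),
  Q11 = factor(c(
    "level0", "level01", "level02", "level03",             # School1: no level04
    "level01", "level02", "level03", "level04", NA,        # School2: complete, plus an NA row
    "level01", "level02", "level03", "level04", "level05"  # School3: complete, plus level05
  ))
)

complete_schools <- toy %>%
  group_by(SCHOOL) %>%
  # group-level check selects the schools, row-level check keeps only wanted rows
  filter(all(wanted %in% Q11), Q11 %in% wanted) %>%
  ungroup()

complete_schools
```

In this sketch, School1 is dropped entirely because it lacks level04, and the `NA` and level05 rows are dropped from the remaining schools because `NA %in% wanted` is `FALSE`.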

# Answer 2
**Score**: 1

We can make a vector of the `Q11` values to keep and use it in the `filter()` function from `dplyr`.

```R
library(dplyr)

# Q11 values to keep
tokeep <- c("level01", "level02", "level04", "level05")

quest40_2 %>%
  group_by(SCHOOL) %>%
  filter(Q11 %in% tokeep) %>%
  ungroup()
```

The first few rows of the output:

```
# A tibble: 115 × 4
   SCHOOL  Q9    Q11     Q40
   <glue>  <fct> <fct>   <fct>
 1 School1 typeB level01 NA
 2 School1 typeB level02 NA
 3 School2 typeB level01 NA
 4 School2 typeB level02 NA
 5 School2 typeB level04 NA
 6 School3 typeB level01 NA
 7 School3 typeB level02 NA
 8 School3 typeB level04 NA
 9 School3 typeB level05 NA
10 School4 typeB level02 NA
# … with 105 more rows
```
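Note that `Q11 %in% tokeep` is evaluated row by row, so grouping by `SCHOOL` does not change which rows are kept, and schools that have only some of the levels still appear in the result. A small sketch on made-up data (`toy` is a hypothetical data frame, not the original one) that shows this and adds a group-level `all()` check for comparison:

```r
library(dplyr)

tokeep <- c("level01", "level02", "level04", "level05")

# Made-up data: School1 offers only two of the four levels
toy <- data.frame(
  SCHOOL = c(rep("School1", 2), rep("School3", 5)),
  Q11 = factor(c(
    "level01", "level02",                                  # School1
    "level01", "level02", "level03", "level04", "level05"  # School3
  ))
)

# Row-wise filter without grouping
row_wise <- toy %>% filter(Q11 %in% tokeep)

# Same filter after group_by(): the condition is still checked per row
grouped <- toy %>%
  group_by(SCHOOL) %>%
  filter(Q11 %in% tokeep) %>%
  ungroup()

# Adding a per-group all() check drops schools that lack any of the levels
all_levels <- toy %>%
  group_by(SCHOOL) %>%
  filter(all(tokeep %in% Q11), Q11 %in% tokeep) %>%
  ungroup()
```

Here `row_wise` and `grouped` contain the same rows, including School1, while `all_levels` keeps only School3.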

huangapple
  • Posted on 2023-02-18 07:05:37
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75489985.html