Filter by value counts within groups
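Question

I want to filter my grouped data frame based on the number of occurrences of a specific value within each group.

Some example data:

data <- data.frame(ID = sample(c("A","B","C","D"), 100, replace = TRUE),
                 rt = runif(100, 0.2, 1),
                 lapse = sample(1:2, 100, replace = TRUE))

The "lapse" column is my filter variable in this case.
I want to exclude every "ID" group that contains more than 15 rows with lapse == 2.

library(dplyr)
# Count rows per ID, split by whether lapse == 2
data %>% group_by(ID) %>% count(lapse == 2)

So if, for example, group "A" contains 17 rows with lapse == 2, it should be removed entirely from the data frame.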


Answer 1

Score: 3

First, I created some reproducible data using set.seed() and checked the number of values per group. In this case, only group D has more than 15 rows with lapse == 2 (it has 16; group B has exactly 15, so it is kept). You can use filter() and sum the rows with lapse == 2 per group like this:

set.seed(7)
data <- data.frame(ID = sample(c("A","B","C","D"), 100, replace = TRUE),
                   rt = runif(100, 0.2, 1),
                   lapse = sample(1:2, 100, replace = TRUE))

library(dplyr)
# Check n values per group
data %>% 
  group_by(ID, lapse) %>% 
  summarise(n = n())
#> # A tibble: 8 × 3
#> # Groups:   ID [4]
#>   ID    lapse     n
#>   <chr> <int> <int>
#> 1 A         1     8
#> 2 A         2     7
#> 3 B         1    13
#> 4 B         2    15
#> 5 C         1    18
#> 6 C         2     6
#> 7 D         1    17
#> 8 D         2    16

# Keep only the groups with at most 15 rows where lapse == 2
data %>% 
  group_by(ID) %>% 
  filter(!(sum(lapse == 2) > 15))
#> # A tibble: 67 × 3
#> # Groups:   ID [3]
#>    ID       rt lapse
#>    <chr> <dbl> <int>
#>  1 B     0.517     2
#>  2 C     0.589     1
#>  3 C     0.598     2
#>  4 C     0.715     1
#>  5 B     0.475     2
#>  6 C     0.965     1
#>  7 B     0.234     1
#>  8 B     0.812     2
#>  9 C     0.517     1
#> 10 B     0.700     1
#> # … with 57 more rows

Created on 2023-01-08 with reprex v2.0.2
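For comparison, the same idea can be sketched in base R without dplyr. This is only an illustration, not part of the original answer: the data frame `toy`, the threshold `max_lapses`, and the helper names `n_lapse2` and `keep_ids` are made up for the example, and the threshold is set to 2 rather than 15 so the tiny data set is easy to check by eye.

```r
# Hypothetical toy data: group "A" has 3 rows with lapse == 2, group "B" has 1.
toy <- data.frame(
  ID    = c("A", "A", "A", "A", "B", "B", "B"),
  lapse = c(2, 2, 2, 1, 2, 1, 1)
)

# Assumed threshold for this sketch (the question uses 15).
max_lapses <- 2

# Count the rows with lapse == 2 per ID; sum() treats TRUE as 1.
n_lapse2 <- tapply(toy$lapse == 2, toy$ID, sum)

# IDs whose count does not exceed the threshold.
keep_ids <- names(n_lapse2)[n_lapse2 <= max_lapses]

# Keep only the rows belonging to those IDs.
filtered <- toy[toy$ID %in% keep_ids, ]
filtered
```

Here group "A" is dropped as a whole because it has three rows with lapse == 2, while every row of group "B" is kept, including the one where lapse is 2. That is the same behaviour as the grouped filter() call: the condition is evaluated once per group, not per row.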


huangapple
  • Posted on 2023-01-09 01:52:19
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75050097.html