Question

I want to filter my grouped data frame based on the number of occurrences of a specific value within each group.

Some example data:

data <- data.frame(ID = sample(c("A","B","C","D"),100,replace = TRUE), 
                   rt = runif(100,0.2,1),
                   lapse = sample(1:2,100,replace = TRUE))

The “lapse” column is my filter variable in this case.
I want to exclude every “ID” group that has more than 15 rows where “lapse” == 2.

data %>% group_by(ID) %>% count(lapse == 2)

So, for example, if group “A” has 17 rows with “lapse” == 2, that whole group should be filtered out of the data frame.
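The following is a minimal sketch, not from the original post, of one way to turn the count() attempt above into a filter. It counts the lapse == 2 rows per ID, collects the IDs above the threshold, and then drops every row of those IDs with anti_join(). It assumes dplyr 1.0 or later and rebuilds the example data with a fixed seed. The names lapse2_counts, too_many and result are illustrative.

```r
library(dplyr)

set.seed(7)
data <- data.frame(ID = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
                   rt = runif(100, 0.2, 1),
                   lapse = sample(1:2, 100, replace = TRUE))

# Step 1: number of rows with lapse == 2 for each ID
# (IDs with no such rows do not appear, which is fine: they are never excluded)
lapse2_counts <- data %>%
  filter(lapse == 2) %>%
  count(ID, name = "n_lapse2")

# Step 2: IDs whose count exceeds the threshold of 15
too_many <- lapse2_counts %>%
  filter(n_lapse2 > 15)

# Step 3: drop every row belonging to those IDs
result <- anti_join(data, too_many, by = "ID")
```

Unlike a grouped filter(), this leaves the result ungrouped, and the intermediate too_many table shows exactly which IDs were removed.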

Answer 1

Score: 3

First, I created some reproducible data with set.seed and checked the number of rows per group. In this case only group D has more than 15 rows with lapse == 2 (it has 16). You can filter per group by summing the rows where lapse == 2, like this:

set.seed(7)
data <- data.frame(ID = sample(c("A","B","C","D"),100,replace = TRUE), 
                   rt = runif(100,0.2,1),
                   lapse = sample(1:2,100,replace = TRUE))

library(dplyr)
# Check the number of rows per ID and lapse value
data %>%
  group_by(ID, lapse) %>%
  summarise(n = n())
#> # A tibble: 8 × 3
#> # Groups:   ID [4]
#>   ID    lapse     n
#>   <chr> <int> <int>
#> 1 A         1     8
#> 2 A         2     7
#> 3 B         1    13
#> 4 B         2    15
#> 5 C         1    18
#> 6 C         2     6
#> 7 D         1    17
#> 8 D         2    16

# Keep only the ID groups with at most 15 rows where lapse == 2
data %>%
  group_by(ID) %>%
  filter(sum(lapse == 2) <= 15)
#> # A tibble: 67 × 3
#> # Groups:   ID [3]
#>    ID       rt lapse
#>    <chr> <dbl> <int>
#>  1 B     0.517     2
#>  2 C     0.589     1
#>  3 C     0.598     2
#>  4 C     0.715     1
#>  5 B     0.475     2
#>  6 C     0.965     1
#>  7 B     0.234     1
#>  8 B     0.812     2
#>  9 C     0.517     1
#> 10 B     0.700     1
#> # … with 57 more rows

Created on 2023-01-08 with reprex v2.0.2
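For comparison, here is a base R sketch of the same rule. It is not part of the original answer and uses only the example data. ave() gives every row the lapse == 2 count of its own ID group, and a row is kept when that count is at most 15, so whole groups are dropped together. The names n_lapse2 and kept are illustrative.

```r
set.seed(7)
data <- data.frame(ID = sample(c("A", "B", "C", "D"), 100, replace = TRUE),
                   rt = runif(100, 0.2, 1),
                   lapse = sample(1:2, 100, replace = TRUE))

# For every row: the number of lapse == 2 rows in that row's ID group
n_lapse2 <- ave(as.integer(data$lapse == 2), data$ID, FUN = sum)

# Keep only rows from groups with at most 15 such rows
kept <- data[n_lapse2 <= 15, ]
```

Because ave() returns a vector aligned with the original rows, the filter is a single logical index and the row order of data is preserved.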
