Filtering rows based on counts of variables across multiple columns in R

Question

I'm working with a dataset that looks like this:

d <- read.table(text = "
X1    name   var1  var2  var3
A1    A1        0     9     0
A3    A3        0     7     0
A4    A4        0    11     0
A5    A5        0     7     0
A6    A6        0     8     0
D     D         0    11     0
IN    A5        0     0    11
IN    IN        0    11     0 ", header = TRUE)

I'd like to filter the rows, keeping only those where the count of values across X1 + name is >= 3, giving this:

X1 name var1 var2 var3
A5   A5    0    7    0
IN   A5    0    0   11
IN   IN    0   11    0

The furthest I've got is:

    d %>%
      group_by(X1, name) %>%
      filter(n() >= 3)

but I know I'm missing something here, as it doesn't work.

Thanks for your time!

Answer 1

Score: 3

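library(dplyr)

# Add a per-column count for each row's X1 value and name value, then filter on their sum
d %>%
  add_count(X1, name = "X1_count") %>%
  add_count(name, name = "name_count") %>%
  filter(X1_count + name_count >= 3)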

  X1 name var1 var2 var3 X1_count name_count
1 A5   A5    0    7    0        1          2
2 IN   A5    0    0   11        2          2
3 IN   IN    0   11    0        2          1
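
For readers wondering why the `group_by(X1, name)` attempt returns nothing: grouping by both columns counts each (X1, name) pair, and every pair in this data occurs exactly once, so `n() >= 3` is never true. The answer above counts each column separately instead. Below is a minimal base R sketch of the same idea, swapping `add_count()` for `ave()`; the names `x1_count`, `name_count`, `pair_count`, and `res` are illustrative and not part of the original answer.

```r
# Data from the question
d <- read.table(text = "
X1 name var1 var2 var3
A1 A1 0 9 0
A3 A3 0 7 0
A4 A4 0 11 0
A5 A5 0 7 0
A6 A6 0 8 0
D D 0 11 0
IN A5 0 0 11
IN IN 0 11 0", header = TRUE)

# How many rows share this row's X1 value, and how many share its name value
x1_count   <- ave(seq_along(d$X1), d$X1, FUN = length)
name_count <- ave(seq_along(d$name), d$name, FUN = length)

# How many rows share this row's (X1, name) pair: this is what n() saw
# after group_by(X1, name)
pair_count <- ave(seq_len(nrow(d)), paste(d$X1, d$name), FUN = length)

# Keep rows whose two per-column counts sum to at least 3
res <- d[x1_count + name_count >= 3, ]
print(res)
```

Here `pair_count` is 1 for every row, while the summed per-column counts reach 3 or more only for the A5/A5, IN/A5, and IN/IN rows.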

Answer 2

Score: 1

library(dplyr)

# Pool the values of X1 and name, and flag those that occur at least 3 times
x <- table(c(d$X1, d$name)) >= 3
keep <- names(x[x])

d |>
  filter(X1 %in% keep | name %in% keep)

Or in base R:

subset(d, X1 %in% keep | name %in% keep)
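
Note that the two answers apply slightly different rules, even though they return the same rows for the data in the question. Answer 1 adds the count of a row's X1 value within X1 to the count of its name value within name, whereas Answer 2 pools both columns and keeps rows containing any value that occurs at least 3 times overall. The sketch below uses a small hypothetical data frame `d2` (not from the question) where the two rules disagree; which rule is wanted depends on how the question is read.

```r
# Hypothetical data, chosen only to show where the two rules differ
d2 <- data.frame(X1   = c("A", "A", "B"),
                 name = c("C", "D", "D"),
                 stringsAsFactors = FALSE)

# Rule from Answer 1: sum of the two per-column counts (ave() stands in for add_count())
x1_count   <- ave(seq_along(d2$X1), d2$X1, FUN = length)
name_count <- ave(seq_along(d2$name), d2$name, FUN = length)
by_sum <- d2[x1_count + name_count >= 3, ]

# Rule from Answer 2: pool both columns, keep values seen at least 3 times
pooled <- table(c(d2$X1, d2$name))
keep2 <- names(pooled)[pooled >= 3]
by_pool <- d2[d2$X1 %in% keep2 | d2$name %in% keep2, ]

nrow(by_sum)   # 3: every row's two counts sum to at least 3
nrow(by_pool)  # 0: no single value occurs 3 times across both columns
```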

huangapple
  • Posted on 2023-06-13 04:42:08
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76460193.html