Filtering rows based on counts of values across multiple columns in R
Question
I'm working with a dataset that looks like this:
d <- read.table(text = "
X1 name var1 var2 var3
A1 A1 0 9 0
A3 A3 0 7 0
A4 A4 0 11 0
A5 A5 0 7 0
A6 A6 0 8 0
D D 0 11 0
IN A5 0 0 11
IN IN 0 11 0 ", header = TRUE)
I'd like to keep only the rows where the count of values across both X1 and name is >= 3, giving this:
X1 name var1 var2 var3
A5 A5 0 7 0
IN A5 0 0 11
IN IN 0 11 0
The furthest I've got is:
d %>%
group_by(X1, name) %>%
filter(n() >= 3)
but I know I'm missing something here, as it doesn't work.
Thanks for your time!
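A quick check of why the attempt above comes back empty: `group_by(X1, name)` groups on each (X1, name) *pair*, and in this data every pair occurs exactly once, so `n()` is 1 in every group and `n() >= 3` is never true. The sketch below assumes dplyr is installed and rebuilds `d` from the question so it runs on its own; `pair_counts` and `attempt` are illustrative names.

```r
library(dplyr)

# Sample data from the question
d <- read.table(text = "
X1 name var1 var2 var3
A1 A1 0 9 0
A3 A3 0 7 0
A4 A4 0 11 0
A5 A5 0 7 0
A6 A6 0 8 0
D D 0 11 0
IN A5 0 0 11
IN IN 0 11 0 ", header = TRUE)

# Size of each (X1, name) group: every pair is unique in this data
pair_counts <- d %>%
  group_by(X1, name) %>%
  mutate(pair_n = n()) %>%
  ungroup()
print(pair_counts$pair_n)  # all 1s

# Consequently the original filter keeps no rows
attempt <- d %>%
  group_by(X1, name) %>%
  filter(n() >= 3)
print(nrow(attempt))
```

Counting within each column (or across the two columns pooled), rather than per pair, is what the answers do.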
Answer 1
Score: 3
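library(dplyr)
d %>%
  add_count(X1, name = "X1_count") %>%      # how often each row's X1 value occurs in X1
  add_count(name, name = "name_count") %>%  # how often each row's name value occurs in name
  filter(X1_count + name_count >= 3)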
X1 name var1 var2 var3 X1_count name_count
1 A5 A5 0 7 0 1 2
2 IN A5 0 0 11 2 2
3 IN IN 0 11 0 2 1
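The output above keeps the helper columns `X1_count` and `name_count`. To match the desired output exactly, one option is to drop them with `select()` after filtering. This is a sketch that assumes dplyr is installed and rebuilds `d` from the question so it runs on its own; `result` is an illustrative name.

```r
library(dplyr)

# Sample data from the question
d <- read.table(text = "
X1 name var1 var2 var3
A1 A1 0 9 0
A3 A3 0 7 0
A4 A4 0 11 0
A5 A5 0 7 0
A6 A6 0 8 0
D D 0 11 0
IN A5 0 0 11
IN IN 0 11 0 ", header = TRUE)

result <- d %>%
  add_count(X1, name = "X1_count") %>%      # occurrences of each row's X1 value within X1
  add_count(name, name = "name_count") %>%  # occurrences of each row's name value within name
  filter(X1_count + name_count >= 3) %>%
  select(-X1_count, -name_count)            # drop the helper columns

print(result)
```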
Answer 2
Score: 1
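library(dplyr)
# Count each value across X1 and name pooled together; TRUE where it occurs 3+ times
x <- table(c(d$X1, d$name)) >= 3
keep <- names(x[x])  # the values that reach the threshold
d |>  # native pipe, requires R >= 4.1
  filter(X1 %in% keep | name %in% keep)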
Or in base R:
subset(d, X1 %in% keep | name %in% keep)
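The two answers interpret "count across both columns" slightly differently. Answer 1 sums how often a row's X1 value occurs in X1 and how often its name value occurs in name. Answer 2 counts each value across the two columns pooled together and keeps a row if either of its values reaches 3. Both give the same three rows on the question's data, but they can disagree. The data frame `d2` below is a made-up example (not from the question) where they differ. The sketch assumes dplyr is installed; `d2`, `ans1` and `ans2` are illustrative names.

```r
library(dplyr)

# Hypothetical toy data, constructed only to show where the two approaches differ
d2 <- data.frame(
  X1   = c("B", "B", "C"),
  name = c("C", "D", "E")
)

# Answer 1: per-column counts of the row's own X1 and name values, summed
ans1 <- d2 %>%
  add_count(X1, name = "X1_count") %>%
  add_count(name, name = "name_count") %>%
  filter(X1_count + name_count >= 3)

# Answer 2: each value counted across X1 and name pooled together
x <- table(c(d2$X1, d2$name)) >= 3
keep <- names(x[x])
ans2 <- subset(d2, X1 %in% keep | name %in% keep)

nrow(ans1)  # rows (B, C) and (B, D) both score 2 + 1 = 3, so 2 rows are kept
nrow(ans2)  # B and C occur twice in total, D and E once, so no rows are kept
```

Which behaviour is wanted depends on whether "count" refers to each column separately or to the two columns combined.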
英文:
x <- table(c(d$X1, d$name)) >= 3
keep <- names(x[x])
d |>
filter(X1 %in% keep | name %in% keep)
Or in base R:
subset(d, X1 %in% keep | name %in% keep)
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论