Selecting rows with certain values
Question
I have a data frame from which I would like to keep only certain rows.
The data frame (around 1000 similar rows in total, with more possible values in var1) looks like this:
| name | var1 |
|---|---|
| Jay | car |
| Jay | house |
| Kate | car |
| Miranda | house |
| Ashley | car |
| Ashley | house |
| Mike | car |
| Mike | yacht |
I would like to keep only the rows for people who have both 'car' and 'house' (that is, keep the rows for Jay and Ashley, and discard the rows for Kate, Miranda, and Mike). I know how to select rows based on a condition, but every method I can think of also keeps Kate and Miranda, or Mike, in the data frame. I'm not sure how to approach this problem. Thanks for the advice.
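For anyone who wants to run the answers, the example table can be recreated as follows. This is a minimal sketch containing only the eight rows shown above; the name df is an assumption, chosen because the answers refer to the data frame by that name.

```r
# Recreate the eight example rows from the question.
# `df` is an assumed name; the real data has around 1000 rows.
df <- data.frame(
  name = c("Jay", "Jay", "Kate", "Miranda", "Ashley", "Ashley", "Mike", "Mike"),
  var1 = c("car", "house", "car", "house", "car", "house", "car", "yacht"),
  stringsAsFactors = FALSE  # keep both columns as character vectors
)

print(df)
```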
Answer 1
Score: 3
With base R, we can use subset + ave, as shown below:
> subset(df, as.logical(ave(var1, name, FUN = \(x) all(c("car", "house") %in% x))))
name var1
1 Jay car
2 Jay house
5 Ashley car
6 Ashley house
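The as.logical() call is needed because ave() writes the group results back into a copy of var1. Since var1 is a character vector, the per-group TRUE/FALSE values come back as the strings "TRUE" and "FALSE". The sketch below spells out the intermediate steps. It assumes name and var1 are character columns (not factors), and the names wanted, flag and keep are introduced here for illustration only.

```r
# Example data from the question (character columns assumed).
df <- data.frame(
  name = c("Jay", "Jay", "Kate", "Miranda", "Ashley", "Ashley", "Mike", "Mike"),
  var1 = c("car", "house", "car", "house", "car", "house", "car", "yacht"),
  stringsAsFactors = FALSE
)

wanted <- c("car", "house")  # hypothetical helper: the values every kept person must have

# For each name, test whether all wanted values occur in that person's var1.
# ave() stores the result in a character vector, so we get "TRUE"/"FALSE" strings.
flag <- ave(df$var1, df$name, FUN = function(x) all(wanted %in% x))

# Convert the strings back to logicals before using them as a row index.
keep <- as.logical(flag)
result <- df[keep, ]
print(result)
```

If var1 were a factor, writing logical values back into it would not work this way, so converting it with as.character() first would be the safer route.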
Answer 2
Score: 2
Try this:
library(tidyverse)
df %>%
# for each `name`...
group_by(name) %>%
# ...paste all `var1` values together:
mutate(temp = str_c(var1, collapse = ",")) %>%
# filter rows of value combinations of interest:
filter(temp == "car,house") %>%
# remove `temp`:
select(-temp)
EDIT:
Picking up on a comment by @zx8754, you can first arrange the values in var1 in alphabetical order:
df %>%
group_by(name) %>%
arrange(var1) %>%
mutate(temp = str_c(var1, collapse = ",")) %>%
filter(temp == "car,house") %>%
select(-temp)
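Note that arrange() ignores the grouping by default, so the version above reorders the rows of the whole data frame by var1. A possible variant, sketched here and not part of the original answer, sorts the values inside the pasted string instead, which leaves the row order untouched. The comparison is still an exact match, so a person who also has a third value (for example "yacht") would still be dropped.

```r
library(dplyr)
library(stringr)

# Small example in which Jay's rows are not in alphabetical order.
df <- data.frame(
  name = c("Jay", "Jay", "Kate"),
  var1 = c("house", "car", "car"),
  stringsAsFactors = FALSE
)

result <- df %>%
  group_by(name) %>%
  # sort the values only while building the string; the rows stay where they are
  mutate(temp = str_c(sort(var1), collapse = ",")) %>%
  filter(temp == "car,house") %>%
  ungroup() %>%
  select(-temp)

print(result)
```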
Answer 3
Score: 1
Data
df=structure(list(var1 = c("car", "car", "car", "car", "house",
"house", "house"), name = c("Jay", "Kate", "Ashley", "Mike",
"Jay", "Miranda", "Ashley"), w = c(1, 1, 1, 1, 1, 1, 1)), row.names = c(NA,
-7L), class = "data.frame")
first option
do.call(
rbind,
by(df,list(df$name),function(x){
if (all(c("house","car") %in% unlist(x["var1"]))) x else NULL
})
)
name var1
Ashley.5 Ashley car
Ashley.6 Ashley house
Jay.1 Jay car
Jay.2 Jay house
second option
x=data.frame(
"var1"=c("house","car"),
"w"=1
)
# keep only var1 and name so that merge() adds a single w column
df=merge(df[c("var1","name")],x,by="var1")
df[ave(df$w,df$name,FUN=sum)==nrow(x),]
var1 name w
1 car Jay 1
3 car Ashley 1
5 house Jay 1
7 house Ashley 1
Answer 4
Score: 1
Get counts of assets per name where assets match our set, then subset:
assets <- c("car", "house")
x <- table(df[ df$var1 %in% assets, "name"])
df[ df$name %in% names(x)[ x >= length(assets) ], ]
# name var1
# 1 Jay car
# 2 Jay house
# 5 Ashley car
# 6 Ashley house
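The count-based check relies on each name/var1 pair appearing at most once: a repeated row, such as two "Kate, car" entries, would count twice and make Kate look like a match. If duplicates can occur in the real data (an assumption, since the question does not say), a hedged variant is to count distinct pairs first. The name pairs is introduced here for illustration only.

```r
# Example with a duplicated row: Kate has "car" twice but no "house".
df <- data.frame(
  name = c("Jay", "Jay", "Kate", "Kate"),
  var1 = c("car", "house", "car", "car"),
  stringsAsFactors = FALSE
)

assets <- c("car", "house")

# Keep only rows with a wanted asset, then drop repeated name/var1 pairs.
pairs <- unique(df[df$var1 %in% assets, c("name", "var1")])

# Count distinct wanted assets per name.
x <- table(pairs$name)

# Keep the rows of every name that has all wanted assets.
result <- df[df$name %in% names(x)[x >= length(assets)], ]
print(result)
```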
Answer 5
Score: 0
We can do a simple filter for groups in which all desired values are found in var1 (grouping with .by = name), then filter again on the ungrouped data to remove instances of "yacht" and other values from the selected groups:
library(dplyr) #version >= 1.1.0
df %>%
filter(all(c("car", "house") %in% var1), .by = name) %>%
filter(var1 %in% c("car", "house"))