Selecting rows with certain values

Question

I have a data frame from which I would like to keep only certain rows.

The data frame (in total around 1000 similar rows and more possibilities in var1) looks like this:

name     var1
Jay      car
Jay      house
Kate     car
Miranda  house
Ashley   car
Ashley   house
Mike     car
Mike     yacht

I would like to keep only the rows of people who have both 'car' and 'house' (so keep the rows for Jay and Ashley, and discard the rows for Kate, Miranda, and Mike). I know how to select rows based on certain conditions, but I can only think of methods that would also keep Kate, Miranda, or Mike in the data frame. I'm not sure how to approach this problem. Thanks for the advice.
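
For anyone who wants to try the answers below, the sample shown above could be rebuilt roughly like this. It is only a sketch of the eight example rows; the real data frame is said to have around 1000 rows and more values in var1, and the object name `df` is an assumption taken from the answers.

```r
# Sketch of the example data from the question (eight rows only).
# The name `df` is assumed because the answers refer to it.
df <- data.frame(
  name = c("Jay", "Jay", "Kate", "Miranda", "Ashley", "Ashley", "Mike", "Mike"),
  var1 = c("car", "house", "car", "house", "car", "house", "car", "yacht")
)
```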

Answer 1

Score: 3

With base R, we can combine subset() and ave() as shown below (the \(x) lambda shorthand needs R 4.1 or later):

> subset(df, as.logical(ave(var1, name, FUN = \(x) all(c("car", "house") %in% x))))
    name  var1
1    Jay   car
2    Jay house
5 Ashley   car
6 Ashley house
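
To see why this works, it may help to split the one-liner into steps. The sketch below assumes the eight example rows from the question; `wanted`, `has_all`, `flag`, and `keep` are illustrative names, not part of the original answer.

```r
# Example rows from the question (assumed reconstruction).
df <- data.frame(
  name = c("Jay", "Jay", "Kate", "Miranda", "Ashley", "Ashley", "Mike", "Mike"),
  var1 = c("car", "house", "car", "house", "car", "house", "car", "yacht")
)

wanted <- c("car", "house")

# TRUE when a person's values contain every wanted value.
has_all <- function(x) all(wanted %in% x)

# ave() applies has_all() per name and repeats the result for every row of
# that name. Because var1 is character, the results come back as strings,
# which is why the answer wraps the call in as.logical().
flag <- ave(df$var1, df$name, FUN = has_all)
keep <- as.logical(flag)

# Same rows as subset(df, keep).
result <- df[keep, ]
```

Splitting it up this way also makes it easy to change `wanted` without touching the rest.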

Answer 2

Score: 2

Try this:

library(tidyverse)
df %>%
  # for each `name`...
  group_by(name) %>%
  # ...paste all `var1` values together:
  mutate(temp = str_c(var1, collapse = ",")) %>%
  # filter rows of value combinations of interest:
  filter(temp == "car,house") %>%
  # remove `temp`:
  select(-temp)

EDIT:

Picking up on a comment by @zx8754, you can first sort the values in var1 alphabetically, so that the comparison with "car,house" does not depend on the order of the rows:

df %>%
  group_by(name) %>%
  arrange(var1) %>%
  mutate(temp = str_c(var1, collapse = ",")) %>%
  filter(temp == "car,house") %>%
  select(-temp)
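
Note that the string comparison keeps only people whose values are exactly car and house. If that is the intent, a possible variant is to compare sets instead of pasted strings, which removes the need to sort. This is a sketch rather than the answerer's code: it assumes dplyr is installed, and the sample data deliberately lists Ashley's rows in reverse order.

```r
library(dplyr)

# Assumed sample data; Ashley's rows are in "house, car" order on purpose.
df <- data.frame(
  name = c("Jay", "Jay", "Kate", "Ashley", "Ashley", "Mike", "Mike"),
  var1 = c("car", "house", "car", "house", "car", "car", "yacht")
)

result <- df %>%
  group_by(name) %>%
  # Keep a group only if its values are exactly the set {car, house},
  # regardless of row order.
  filter(base::setequal(var1, c("car", "house"))) %>%
  ungroup()
```

Like the pasted-string version, this drops anyone who owns something besides a car and a house.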

Answer 3

Score: 1

Data

df=structure(list(var1 = c("car", "car", "car", "car", "house", 
"house", "house"), name = c("Jay", "Kate", "Ashley", "Mike", 
"Jay", "Miranda", "Ashley"), w = c(1, 1, 1, 1, 1, 1, 1)), row.names = c(NA, 
-7L), class = "data.frame")

First option

do.call(
  rbind,
  by(df,list(df$name),function(x){
    if (all(c("house","car") %in% unlist(x["var1"]))) x else NULL
  })
)

           name  var1
Ashley.5 Ashley   car
Ashley.6 Ashley house
Jay.1       Jay   car
Jay.2       Jay house

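Second option

# Lookup table of the required values; `w` is a weight used for counting.
x=data.frame(
  "var1"=c("house","car"),
  "w"=1
)
# The data above already carries a `w` column, so merge only `name` and
# `var1`; otherwise merge() would produce `w.x` and `w.y` instead of `w`.
df=merge(df[c("name","var1")],x,by="var1")
# Keep the names whose weights sum to the number of required values.
df[ave(df$w,df$name,FUN=sum)==nrow(x),]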

   var1   name w
1   car    Jay 1
3   car Ashley 1
5 house    Jay 1
7 house Ashley 1

Answer 4

Score: 1

Get counts of assets per name where assets match our set, then subset:

assets <- c("car", "house")
x <- table(df[ df$var1 %in% assets, "name"])

df[ df$name %in% names(x)[ x >= length(assets) ], ]
#     name  var1
# 1    Jay   car
# 2    Jay house
# 5 Ashley   car
# 6 Ashley house
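
One caveat: counting rows assumes each name/value pair appears at most once. If the real data can contain repeats (say, someone listed with "car" twice), the count could reach the threshold without a "house". A possible guard, sketched below on made-up data with a duplicated Kate row, is to drop duplicate pairs before counting; `hits` is an illustrative name.

```r
# Made-up data: Kate has "car" twice but no "house".
df <- data.frame(
  name = c("Jay", "Jay", "Kate", "Kate", "Ashley", "Ashley"),
  var1 = c("car", "house", "car", "car", "car", "house")
)

assets <- c("car", "house")

# Keep one row per (name, var1) pair among the matching assets.
hits <- unique(df[df$var1 %in% assets, c("name", "var1")])

# Count distinct matching assets per name.
x <- table(hits$name)

# Keep names that have every asset in the set.
result <- df[df$name %in% names(x)[x == length(assets)], ]
```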

Answer 5

Score: 0

We can do a simple filter for groups in which all desired values are found in var1, using .by = name, then filter again on the ungrouped data to remove "yacht" and any other values from the selected groups:

library(dplyr) # version >= 1.1.0
df %>%
    filter(all(c("car", "house") %in% var1), .by = name) %>%
    filter(var1 %in% c("car", "house"))
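
If dplyr 1.1.0 is not available, the same two-step idea can be sketched in base R. This is an illustration, not part of the answer: the data includes a made-up person, Sam, who has a car, a house, and a yacht, to show what the second step removes; `wanted` and `ok` are illustrative names.

```r
# Made-up data; Sam has all wanted values plus a yacht.
df <- data.frame(
  name = c("Jay", "Jay", "Kate", "Ashley", "Ashley", "Sam", "Sam", "Sam"),
  var1 = c("car", "house", "car", "car", "house", "car", "house", "yacht")
)

wanted <- c("car", "house")

# Step 1: one TRUE/FALSE per name, TRUE if the name has every wanted value.
ok <- tapply(df$var1, df$name, function(v) all(wanted %in% v))

# Step 2: keep rows of qualifying names, then drop values outside `wanted`.
result <- df[df$name %in% names(ok)[ok] & df$var1 %in% wanted, ]
```

Here Sam's car and house rows are kept, while the yacht row is dropped.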

huangapple
  • Posted on April 19, 2023, 18:54:13
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76053665.html