April 19, 2023, 18:54:13

Selecting rows with certain values

Question

I have a data frame from which I would like to keep only certain rows.

The data frame (in total around 1000 similar rows and more possibilities in var1) looks like this:

name	var1
Jay	car
Jay	house
Kate	car
Miranda	house
Ashley	car
Ashley	house
Mike	car
Mike	yacht

I would like to keep only the rows of people who have both 'car' and 'house' (so keep the rows with Jay and Ashley, and discard the rows with Kate, Miranda, and Mike). I know how to select rows based on certain conditions, but I can only think of methods that would also keep Kate, Miranda, or Mike in the data frame. I'm not sure how to approach this problem. Thanks for the advice.
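
To try the answers below, the sample table can be rebuilt as a data frame. This is a minimal sketch that assumes both columns are plain character vectors and that the object is called df, as the answers do; the real data has around 1000 rows.

```r
# Rebuild the example table from the question as a data frame.
df <- data.frame(
  name = c("Jay", "Jay", "Kate", "Miranda", "Ashley", "Ashley", "Mike", "Mike"),
  var1 = c("car", "house", "car", "house", "car", "house", "car", "yacht"),
  stringsAsFactors = FALSE  # keep both columns as character, not factor
)

# List the values each person has, to see who owns both a car and a house.
split(df$var1, df$name)
```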

Answer 1

Score: 3

With base R, we can use subset + ave like below:

> subset(df, as.logical(ave(var1, name, FUN = \(x) all(c("car", "house") %in% x))))
    name  var1
1    Jay   car
2    Jay house
5 Ashley   car
6 Ashley house
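
The as.logical() wrapper is needed because ave() writes the per-group result back into a vector of the same type as var1, which is character here. A small sketch of the intermediate step, assuming the example data from the question and using function(x) instead of the \(x) shorthand so that it also runs on R versions before 4.1:

```r
df <- data.frame(
  name = c("Jay", "Jay", "Kate", "Miranda", "Ashley", "Ashley", "Mike", "Mike"),
  var1 = c("car", "house", "car", "house", "car", "house", "car", "yacht"),
  stringsAsFactors = FALSE
)

# Per-group flag: does this person's set of values contain both targets?
# Because var1 is character, the TRUE/FALSE results are stored as strings.
flag <- ave(df$var1, df$name, FUN = function(x) all(c("car", "house") %in% x))
flag

# Convert the strings back to logical before using them as a row filter.
keep <- as.logical(flag)
df[keep, ]
```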

Answer 2

Score: 2

Try this:

library(tidyverse)
df %>%
  # for each `name`...
  group_by(name) %>%
  # ...paste all `var1` values together:
  mutate(temp = str_c(var1, collapse = ",")) %>%
  # filter rows of value combinations of interest:
  filter(temp == "car,house") %>%
  # remove `temp`:
  select(-temp)

EDIT:

Picking up on a comment by @zx8754, you can first sort the values in var1 alphabetically, so that the pasted string does not depend on row order:

df %>%
  group_by(name) %>%
  arrange(var1) %>%
  mutate(temp = str_c(var1, collapse = ",")) %>%
  filter(temp == "car,house") %>%
  select(-temp)
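
Note that comparing the pasted string to "car,house" is an exact match: a person who has a car, a house, and something else would be dropped. Whether that is wanted depends on how the question is read. The base R sketch below contrasts the two readings; "Sam" is a made-up person added only for illustration, and tapply() stands in for the grouped mutate.

```r
df <- data.frame(
  name = c("Jay", "Jay", "Kate", "Sam", "Sam", "Sam"),  # "Sam" is hypothetical
  var1 = c("house", "car", "car", "car", "house", "yacht"),
  stringsAsFactors = FALSE
)

# Collapse each person's sorted values into one string, as in the answer above.
collapsed <- tapply(df$var1, df$name, function(x) paste(sort(x), collapse = ","))
collapsed

# Exact match: people whose full set of values is car + house and nothing else.
exact <- names(collapsed)[collapsed == "car,house"]

# Containment: people who have at least car and house.
has_both <- tapply(df$var1, df$name, function(x) all(c("car", "house") %in% x))
at_least <- names(has_both)[has_both]

exact
at_least
```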

Answer 3

Score: 1

Data

df = structure(list(var1 = c("car", "car", "car", "car", "house",
"house", "house"), name = c("Jay", "Kate", "Ashley", "Mike",
"Jay", "Miranda", "Ashley"), w = c(1, 1, 1, 1, 1, 1, 1)), row.names = c(NA,
-7L), class = "data.frame")

first option

do.call(
  rbind,
  by(df, list(df$name), function(x) {
    if (all(c("house", "car") %in% unlist(x["var1"]))) x else NULL
  })
)
           name  var1
Ashley.5 Ashley   car
Ashley.6 Ashley house
Jay.1       Jay   car
Jay.2       Jay house

second option

x = data.frame(
  "var1" = c("house", "car"),
  "w" = 1
)
# merge only name and var1, so an existing w column in df cannot clash with x$w
df = merge(df[c("name", "var1")], x, by = "var1")
df[ave(df$w, df$name, FUN = sum) == nrow(x), ]
   var1   name w
1   car    Jay 1
3   car Ashley 1
5 house    Jay 1
7 house Ashley 1

Answer 4

Score: 1

Get counts of assets per name where assets match our set, then subset:

assets <- c("car", "house")
x <- table(df[df$var1 %in% assets, "name"])
df[df$name %in% names(x)[x >= length(assets)], ]
#     name  var1
# 1    Jay   car
# 2    Jay house
# 5 Ashley   car
# 6 Ashley house
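
The count-based approach assumes each (name, var1) pair appears at most once. If the real data can contain repeated rows, a person with the same asset listed twice would reach the threshold without owning both. A hedged sketch of a guard against that, counting distinct pairs instead; the duplicated "Kate" row is hypothetical:

```r
assets <- c("car", "house")
df <- data.frame(
  name = c("Jay", "Jay", "Kate", "Kate"),
  var1 = c("car", "house", "car", "car"),  # hypothetical: Kate's car is listed twice
  stringsAsFactors = FALSE
)

# Counting rows: Kate reaches the threshold of 2 without having a house.
x <- table(df[df$var1 %in% assets, "name"])
by_rows <- names(x)[x >= length(assets)]

# Counting distinct (name, var1) pairs: only Jay qualifies.
u <- unique(df[df$var1 %in% assets, c("name", "var1")])
x2 <- table(u$name)
by_pairs <- names(x2)[x2 >= length(assets)]

df[df$name %in% by_pairs, ]
```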

Answer 5

Score: 0

We can do a simple filter for groups in which all desired values are found in var1, using .by = name, then filter again on the ungrouped data to remove instances of "yacht" and other values from the selected groups:

library(dplyr) # version >= 1.1.0
df %>%
    filter(all(c("car", "house") %in% var1), .by = name) %>%
    filter(var1 %in% c("car", "house"))
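
For readers without dplyr 1.1.0 or later, the same two-step logic can be sketched in base R. This is an illustration rather than a translation of the dplyr internals; "Sam" is a made-up person who owns a car, a house, and a yacht, added so that the second step has something to remove.

```r
df <- data.frame(
  name = c("Jay", "Jay", "Mike", "Mike", "Sam", "Sam", "Sam"),  # "Sam" is hypothetical
  var1 = c("car", "house", "car", "yacht", "car", "house", "yacht"),
  stringsAsFactors = FALSE
)
wanted <- c("car", "house")

# Step 1: flag every row of a person whose values include all wanted ones.
in_group <- as.logical(ave(df$var1, df$name, FUN = function(x) all(wanted %in% x)))

# Step 2: among those rows, keep only the wanted values (drops Sam's yacht).
result <- df[in_group & df$var1 %in% wanted, ]
result
```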
