2023-01-09 09:41:34

Filter a column in R based on another column with 2 criteria

Question

I have a column "issue_slip" in an R data frame "vouchers" with values/rows such as

Issue slip:
IS/001,
IS/001,
IS/001,
IS/002,
IS/002,
IS/002

and another column "rec_status" with values 0 or 1.
Each issue_slip row can have rec_status 0 or 1.
I would like to keep only those issue_slips whose rows either all have rec_status 0 or have a mix of 0 and 1. In other words, remove the rows of any issue_slip whose rec_status values are all 1.

For example,

IS/001 - 1,
IS/001 - 0,
IS/001 - 1

should show up and not get filtered out because at least one row has rec_status = 0.

I tried using the filter and subset functions but could not figure out how to filter on this within the same column.

Answer 1

Score: 1

Sample data

```r
quux <- data.frame(issue_slip = c("IS/001", "IS/001", "IS/001", "IS/002", "IS/002", "IS/002"), rec_status = c(0, 0, 1, 1, 1, 1))
quux
#   issue_slip rec_status
# 1     IS/001          0
# 2     IS/001          0
# 3     IS/001          1
# 4     IS/002          1
# 5     IS/002          1
# 6     IS/002          1
```

base R

```r
ind <- ave(quux$rec_status, quux$issue_slip, FUN = function(z) any(z %in% 0)) > 0
ind
# [1]  TRUE  TRUE  TRUE FALSE FALSE FALSE
quux[ind,]
#   issue_slip rec_status
# 1     IS/001          0
# 2     IS/001          0
# 3     IS/001          1
```
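
The trailing `> 0` in the base R line can look redundant, so here is a minimal sketch of the intermediate step, reusing the same sample data. The names `flag` and `keep` are introduced only for illustration. Because `ave()` writes each group's result back into a copy of its numeric input, the logical result of `any()` ends up stored as 1 or 0, and the comparison turns it back into a logical index.

```r
# Same sample data as in the answer
quux <- data.frame(
  issue_slip = c("IS/001", "IS/001", "IS/001", "IS/002", "IS/002", "IS/002"),
  rec_status = c(0, 0, 1, 1, 1, 1)
)

# ave() returns a vector of the same type as its first argument (numeric here),
# so each group's TRUE/FALSE from any() is stored as 1/0 on every row of the group
flag <- ave(quux$rec_status, quux$issue_slip, FUN = function(z) any(z %in% 0))
flag
# [1] 1 1 1 0 0 0

# Comparing with 0 converts the 1/0 flags back into a logical row index
keep <- flag > 0
quux[keep, ]
```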

dplyr

```r
library(dplyr)
quux %>%
  group_by(issue_slip) %>%
  filter(any(rec_status %in% 0)) %>%
  ungroup()
# # A tibble: 3 × 2
#   issue_slip rec_status
#   <chr>           <dbl>
# 1 IS/001              0
# 2 IS/001              0
# 3 IS/001              1
```

data.table

```r
library(data.table)
as.data.table(quux)[, .SD[any(rec_status %in% 0),], by = issue_slip]
#    issue_slip rec_status
#        <char>      <num>
# 1:     IS/001          0
# 2:     IS/001          0
# 3:     IS/001          1
```

Note that I'm using rec_status %in% 0 instead of rec_status == 0 for a reason. Since we have no sample data (and often even when we do), I have no assurance that the data contain no NAs. NA == 0 returns NA, which often breaks non-defensive code, whereas NA %in% 0 returns FALSE, which is often what we need (and I'm inferring it's what we want here).
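
As a rough illustration of that difference, the sketch below uses a hypothetical variant of the sample data, `quux_na`, with one missing status per group. This data is an assumption for demonstration and not something from the question. It computes one flag per issue_slip with each operator.

```r
# Hypothetical data: same slips as the sample, but with an NA in each group
quux_na <- data.frame(
  issue_slip = c("IS/001", "IS/001", "IS/001", "IS/002", "IS/002", "IS/002"),
  rec_status = c(0, NA, 1, 1, NA, 1)
)

NA == 0    # NA
NA %in% 0  # FALSE

groups <- split(quux_na$rec_status, quux_na$issue_slip)

# With ==, a group that has no 0 but does have an NA yields NA rather than FALSE
eq_flag <- vapply(groups, function(z) any(z == 0), logical(1))
# With %in%, every group yields a definite TRUE or FALSE
in_flag <- vapply(groups, function(z) any(z %in% 0), logical(1))

eq_flag
in_flag
```

Here IS/002 gets NA from the == version and FALSE from the %in% version, so only the latter gives a clean keep/drop decision for every group.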
