根据另一列的两个条件在R中筛选一列。

huangapple go评论104阅读模式
英文:

Filter a column in R based on another column with 2 criteria

问题

我有一个名为“vouchers”的R数据框中的列 - “issue_slip”,其中包含以下值/行:

发行单:
IS/001,
IS/001,
IS/001,
IS/002,
IS/002,
IS/002

还有另一列“rec_status”,其值为0或1。
每个issue_slip行都可以具有rec_status为0或1。
我想只保留那些具有所有rec_status为0或0和1的issue_slips --> 删除具有所有rec_status为1的issue_slip行。

例如,

  1. IS/001 - 1,
  2. IS/001 - 0,
  3. IS/001 - 1

应该显示,并且不会被过滤掉,因为至少有一行具有rec_status = 1。

我尝试使用filter和subset函数,但无法弄清楚如何在同一列中进行筛选。

英文:

I have a column - "issue_slip" in R dataframe - "vouchers" with values/rows such as

Issue slip:
IS/001,
IS/001,
IS/001,
IS/002,
IS/002,
IS/002

and another column "rec_status" with values 0 or 1.
Each issue_slip row can have rec_status 0 or 1.
I would like to keep only those issue_slips that have all rec_status as 0 OR 0 or 1 --> remove issue_slip rows that have all rec_status as 1.

For example,

  1. IS/001 - 1,
  2. IS/001 - 0 ,
  3. IS/001 - 1

should show up and not get filtered out because at least one row has rec_status = 1

I tried using the filter and subset functions but could not figure out how to go about filtering this in the same column

答案1

得分: 1

以下是翻译好的部分:

  1. 样本数据
  2. ```r
  3. quux <- data.frame(issue_slip = c("IS/001", "IS/001", "IS/001", "IS/002", "IS/002", "IS/002"), rec_status = c(0, 0, 1, 1, 1, 1))
  4. quux
  5. # issue_slip rec_status
  6. # 1 IS/001 0
  7. # 2 IS/001 0
  8. # 3 IS/001 1
  9. # 4 IS/002 1
  10. # 5 IS/002 1
  11. # 6 IS/002 1

基本 R

  1. ind <- ave(quux$rec_status, quux$issue_slip, FUN = function(z) any(z %in% 0)) > 0
  2. ind
  3. # [1] TRUE TRUE TRUE FALSE FALSE FALSE
  4. quux[ind,]
  5. # issue_slip rec_status
  6. # 1 IS/001 0
  7. # 2 IS/001 0
  8. # 3 IS/001 1

dplyr

  1. library(dplyr)
  2. quux %>%
  3. group_by(issue_slip) %>%
  4. filter(any(rec_status %in% 0)) %>%
  5. ungroup()
  6. # # A tibble: 3 × 2
  7. # issue_slip rec_status
  8. # <chr> <dbl>
  9. # 1 IS/001 0
  10. # 2 IS/001 0
  11. # 3 IS/001 1

data.table

  1. library(data.table)
  2. as.data.table(quux)[, .SD[any(rec_status %in% 0),], by = issue_slip]
  3. # issue_slip rec_status
  4. # <char> <num>
  5. # 1: IS/001 0
  6. # 2: IS/001 0
  7. # 3: IS/001 1

注意,我使用 rec_status %in% 0 而不是 rec_status == 0 有一个原因:因为我们没有示例数据(通常即使有数据,也不能保证没有任何 NA),我不能确定数据中是否存在 NA。请注意,NA == 0 将返回 NA 本身,通常会导致非防御性的代码失败,但 NA %in% 0 返回假,这通常是我们需要的(我推测这是我们想要的情况)。

英文:

Sample data

  1. quux &lt;- data.frame(issue_slip = c(&quot;IS/001&quot;, &quot;IS/001&quot;, &quot;IS/001&quot;, &quot;IS/002&quot;, &quot;IS/002&quot;, &quot;IS/002&quot;), rec_status = c(0, 0, 1, 1, 1, 1))
  2. quux
  3. # issue_slip rec_status
  4. # 1 IS/001 0
  5. # 2 IS/001 0
  6. # 3 IS/001 1
  7. # 4 IS/002 1
  8. # 5 IS/002 1
  9. # 6 IS/002 1

base R

  1. ind &lt;- ave(quux$rec_status, quux$issue_slip, FUN = function(z) any(z %in% 0)) &gt; 0
  2. ind
  3. # [1] TRUE TRUE TRUE FALSE FALSE FALSE
  4. quux[ind,]
  5. # issue_slip rec_status
  6. # 1 IS/001 0
  7. # 2 IS/001 0
  8. # 3 IS/001 1

dplyr

  1. library(dplyr)
  2. quux %&gt;%
  3. group_by(issue_slip) %&gt;%
  4. filter(any(rec_status %in% 0)) %&gt;%
  5. ungroup()
  6. # # A tibble: 3 &#215; 2
  7. # issue_slip rec_status
  8. # &lt;chr&gt; &lt;dbl&gt;
  9. # 1 IS/001 0
  10. # 2 IS/001 0
  11. # 3 IS/001 1

data.table

  1. library(data.table)
  2. as.data.table(quux)[, .SD[any(rec_status %in% 0),], by = issue_slip]
  3. # issue_slip rec_status
  4. # &lt;char&gt; &lt;num&gt;
  5. # 1: IS/001 0
  6. # 2: IS/001 0
  7. # 3: IS/001 1

Note, I'm using rec_status %in% 0 instead of rec_status == 0 for a reason: since we have no sample data (and often even when we do), I have no assurance that there are not any NAs in the data; note that NA == 0 will return NA itself and therefore often fail non-defensive code, but NA %in% 0 returns false, which is often what we need (and I'm inferring it's what we want here).

huangapple
  • 本文由 发表于 2023年1月9日 09:41:34
  • 转载请务必保留本文链接:https://go.coder-hub.com/75052507.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定