英文:
Filter only duplicated rows based on condition
问题
Sure, here's the dynamically generated code to achieve your desired output using the tidyverse approach:
library(dplyr)
df %>%
group_by(ID) %>%
filter(all(c("NoPlan", "Plan") %in% Q40) | n() == 1) %>%
ungroup()
This code groups the data by the "ID" column, then filters the groups to keep rows where either all unique values in the "Q40" column are "NoPlan" and "Plan" or there is only one row in the group. Finally, it ungroups the data to get the desired output.
英文:
In the data below, I want to filter the duplicated participants (Ana, Paul) based on their answer to Q40. Participants should have answered only "NoPlan" or "Plan", but sometimes they've answered both. I want to keep only the rows in each the duplicated participants have answered "Plan"
- The data is:
ID Q40
<chr> <chr>
1 Maria Plan
2 Joseph Plan
3 Cyntia NoPlan
4 Ana NoPlan ## first noPlan
5 Paul Plan ## first Plan
6 Ana Plan ## then, Plan
7 Paul NoPlan ## then, noPlan
- Desired Output:
ID Q40
<chr> <chr>
1 Maria Plan
2 Joseph Plan
3 Cyntia NoPlan
5 Paul Plan
6 Ana Plan
- I was doing it manually, like this:
df %>%
filter(Q40 %in% c('noPlan', 'Plan')) %>%
ungroup() %>%
distinct()
then I was removing manually the duplicated ones with (slice()) and their corresponding position.
QUESTION
1: How can I dinamically remove the rows only of the duplicated participants in ID (who answered "noPlan" and "plan" at the same time), keeping only their "Plan" row? I'm sure this is easy to do with tidyverse
, but I'm struggling with it. I really need a tidyverse
approach.
Important note:
1 I don't want to filter all "NoPlan" rows, I only want to filter the "Plan" rows of the participants who have answered both NoPlan AND Plan. Only the duplicated ones, not the whole sample.
2 I don't want to keep only the first row such as with distinc()
, I want to be able to set the condition myself (in this case, to filter "NoPlan" from the duplicated participants
3 I cannot use distinct(starwars, pick(contains("color")))
because this would remove "NoPlan" from all participants and I need to filter only the duplicated ones
- data:
structure(list(ID = c("Joseph", "Cyntia", "Paul", "Ana", "Paul",
"Maria", "Ana"), Q40 = c("Plan", "NoPlan", "Plan", "Plan", "NoPlan",
"Plan", "NoPlan")), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame"))
答案1
得分: 2
?duplicated
应该在这里有帮助:
dat %>%
filter(!((duplicated(ID) | duplicated(ID, fromLast=TRUE)) & Q40 == "NoPlan"))
或者用基本的R语言:
dat[with(dat, !((duplicated(ID) | duplicated(ID, fromLast=TRUE)) & Q40 == "NoPlan")),]
英文:
?duplicated
should aid you here:
dat %>%
filter(!((duplicated(ID) | duplicated(ID, fromLast=TRUE)) & Q40 == "NoPlan"))
Or in base R speak:
dat[with(dat, !((duplicated(ID) | duplicated(ID, fromLast=TRUE)) & Q40 == "NoPlan")),]
答案2
得分: 1
这是你想要的翻译:
听起来你更喜欢使用 `tidyverse` 的方法,但这里提供了一个使用基本的 R 语言实现的分割-操作-合并的解决方案。你通过 `ID` 进行分割;在具有多行的子集中,选择包含 `Q40` 中“Plan”的行;然后使用 `do.call("rbind", ...)` 将其余的行重新组合。
dat <- structure(list(ID = c("Joseph", "Cyntia", "Paul", "Ana", "Paul", "Maria", "Ana"), Q40 = c("Plan", "NoPlan", "Plan", "Plan", "NoPlan", "Plan", "NoPlan")), row.names = c(NA, -7L), class = c("tbl_df", "tbl", "data.frame"))
do.call("rbind", lapply(split(dat, dat$ID), function(x) {
if(nrow(x) > 1) {
x[x$Q40 == "Plan",]
} else {
x
}
}))
以下是其产生的结果:
ID Q40
* <chr> <chr>
1 Ana Plan
2 Cyntia NoPlan
3 Joseph Plan
4 Maria Plan
5 Paul Plan
英文:
It sounds like you'd prefer a tidyverse
answer, but here's a base R solution using the split-operate-recombine approach. You split by ID
; in subsets with multiple rows, select the rows with "Plan" in Q40
; then use do.call("rbind", ...)
to put the remaining rows back together.
dat <- structure(list(ID = c("Joseph", "Cyntia", "Paul", "Ana", "Paul",
"Maria", "Ana"), Q40 = c("Plan", "NoPlan", "Plan", "Plan", "NoPlan",
"Plan", "NoPlan")), row.names = c(NA, -7L), class = c("tbl_df",
"tbl", "data.frame"))
do.call("rbind", lapply(split(dat, dat$ID), function(x) {
if(nrow(x) > 1) {
x[x$Q40 == "Plan",]
} else {
x
}
}))
And here's what that produces:
ID Q40
* <chr> <chr>
1 Ana Plan
2 Cyntia NoPlan
3 Joseph Plan
4 Maria Plan
5 Paul Plan
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论