How to keep only values in one column that match the other column in R?

Question

Here is my df:

```r
df <- data.frame(
  lifetime = c(
    "tobacco,alcohol,cannabis,cocaine,stim",
    "tobacco,alcohol,cannabis,cocaine,stim,inhal",
    "tobacco,alcohol,stim,rx",
    "tobacco,alcohol,cocaine,stim",
    "tobacco,alcohol,cannabis,opioids,cocaine,stim,rx,halluc,dissoc,tranq,inhal",
    "tobacco,alcohol,cannabis,stim"
  ),
  remission = c(
    "rx",
    "cannabis,opioids,rx,halluc,dissoc,tranq,inhal",
    "tobacco,cannabis,opioids,cocaine,halluc,dissoc,tranq,inhal",
    "cannabis,opioids,cocaine,rx,halluc,dissoc,tranq,inhal",
    "alcohol,cocaine,stim,rx,halluc,dissoc,tranq,inhal",
    "cannabis,opioids,cocaine,rx,halluc,dissoc,tranq,inhal"
  )
)
```

I want to match the two columns so that:

> if a substance is present in lifetime, it should be kept in the remission column
> if a substance is not present in lifetime, it should be dropped from the remission column
> if nothing matches, the remission column should return empty.

I can get it to work when the two columns match completely, but I can't find anything about partial matches where some values are kept and others are dropped.
# Answer 1
**Score**: 1

Here is one way we could do it:

```R
library(dplyr)
library(tidyr)
library(stringr)

df %>%
  # Tag each original row so the pieces can be regrouped later
  mutate(id = row_number()) %>%
  # One row per (lifetime substance, remission substance) pair
  separate_rows(lifetime, sep = ",") %>%
  separate_rows(remission, sep = ",") %>%
  group_by(id) %>%
  # Keep a lifetime substance if it matches any remission substance; otherwise NA
  mutate(remission = case_when(
    str_detect(lifetime, paste(remission, collapse = "|")) ~ lifetime
  )) %>%
  distinct(remission, .keep_all = TRUE) %>%
  # Drop NA rows unless NA is the only row left in the group (no match at all)
  filter(!is.na(remission) | max(row_number()) == 1) %>%
  summarise(remission = toString(remission)) %>%
  # Bring back the original lifetime strings
  right_join(df %>%
               mutate(id = row_number()), by = "id") %>%
  select(lifetime, remission = remission.x)
```

```
  lifetime                                                                   remission
  <chr>                                                                      <chr>
1 tobacco,alcohol,cannabis,cocaine,stim                                      NA
2 tobacco,alcohol,cannabis,cocaine,stim,inhal                                cannabis, inhal
3 tobacco,alcohol,stim,rx                                                    tobacco
4 tobacco,alcohol,cocaine,stim                                               cocaine
5 tobacco,alcohol,cannabis,opioids,cocaine,stim,rx,halluc,dissoc,tranq,inhal alcohol, cocaine, stim, rx, halluc, dissoc, tranq, inhal
6 tobacco,alcohol,cannabis,stim                                              cannabis
```
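As a possible alternative, the same result can be sketched in base R with set operations. This is not part of the original answer. It assumes the substances are separated by plain commas with no surrounding spaces, and the column name `remission_kept` is made up for illustration. Because `str_detect` treats its pattern as a regular expression, a substance whose name contains another one as a substring would also count as a match; `intersect` compares whole tokens, and rows with no match come back as an empty string rather than `NA`.

```r
# Example data copied from the question
df <- data.frame(
  lifetime = c(
    "tobacco,alcohol,cannabis,cocaine,stim",
    "tobacco,alcohol,cannabis,cocaine,stim,inhal",
    "tobacco,alcohol,stim,rx",
    "tobacco,alcohol,cocaine,stim",
    "tobacco,alcohol,cannabis,opioids,cocaine,stim,rx,halluc,dissoc,tranq,inhal",
    "tobacco,alcohol,cannabis,stim"
  ),
  remission = c(
    "rx",
    "cannabis,opioids,rx,halluc,dissoc,tranq,inhal",
    "tobacco,cannabis,opioids,cocaine,halluc,dissoc,tranq,inhal",
    "cannabis,opioids,cocaine,rx,halluc,dissoc,tranq,inhal",
    "alcohol,cocaine,stim,rx,halluc,dissoc,tranq,inhal",
    "cannabis,opioids,cocaine,rx,halluc,dissoc,tranq,inhal"
  ),
  stringsAsFactors = FALSE  # keep strings as character in R < 4.0 as well
)

# Split each comma-separated string into a character vector (one list element per row)
lifetime_tokens  <- strsplit(df$lifetime, ",", fixed = TRUE)
remission_tokens <- strsplit(df$remission, ",", fixed = TRUE)

# Row by row, keep only the remission substances that also occur in lifetime
kept <- Map(intersect, remission_tokens, lifetime_tokens)

# Collapse back to one string per row; rows without any match become ""
# (remission_kept is a hypothetical column name)
df$remission_kept <- vapply(kept, paste, character(1), collapse = ",")

print(df$remission_kept)
```

To overwrite the original column instead of adding a new one, assign the result to `df$remission`.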