How to keep only values in one column that match the other column in R?

Question

Here is my df:

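```r
df <- data.frame(
  lifetime = c(
    "tobacco,alcohol,cannabis,cocaine,stim",
    "tobacco,alcohol,cannabis,cocaine,stim,inhal",
    "tobacco,alcohol,stim,rx",
    "tobacco,alcohol,cocaine,stim",
    "tobacco,alcohol,cannabis,opioids,cocaine,stim,rx,halluc,dissoc,tranq,inhal",
    "tobacco,alcohol,cannabis,stim"
  ),
  remission = c(
    "rx",
    "cannabis,opioids,rx,halluc,dissoc,tranq,inhal",
    "tobacco,cannabis,opioids,cocaine,halluc,dissoc,tranq,inhal",
    "cannabis,opioids,cocaine,rx,halluc,dissoc,tranq,inhal",
    "alcohol,cocaine,stim,rx,halluc,dissoc,tranq,inhal",
    "cannabis,opioids,cocaine,rx,halluc,dissoc,tranq,inhal"
  )
)
```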

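I want to match the two columns so that:

if a substance is present in lifetime, it is kept in the remission column;
if a substance is not present in lifetime, it is dropped from the remission column;
if nothing matches, the remission column should come back empty.

I can get this to work when the two columns match completely, but I can't find anything about partial matches where some values are kept and others are dropped.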


# Answer 1

**Score**: 1

Here is one way we could do it:

```r
library(dplyr)
library(tidyr)
library(stringr)

df %>%
  mutate(id = row_number()) %>%
  # One row per (lifetime item, remission item) pair within each original row
  separate_rows(lifetime, sep = ",") %>%
  separate_rows(remission, sep = ",") %>%
  group_by(id) %>%
  # Keep a lifetime item when it matches any remission item of the same row
  mutate(remission = case_when(
    str_detect(lifetime, paste(remission, collapse = "|")) ~ lifetime
  )) %>%
  distinct(remission, .keep_all = TRUE) %>%
  filter(!is.na(remission) | max(row_number()) == 1) %>%
  summarise(remission = toString(remission)) %>%
  # Re-attach the original lifetime strings
  right_join(df %>%
               mutate(id = row_number()), by = "id") %>%
  select(lifetime, remission = remission.x)

#>   lifetime                                                                   remission
#>   <chr>                                                                      <chr>
#> 1 tobacco,alcohol,cannabis,cocaine,stim                                      NA
#> 2 tobacco,alcohol,cannabis,cocaine,stim,inhal                                cannabis, inhal
#> 3 tobacco,alcohol,stim,rx                                                    tobacco
#> 4 tobacco,alcohol,cocaine,stim                                               cocaine
#> 5 tobacco,alcohol,cannabis,opioids,cocaine,stim,rx,halluc,dissoc,tranq,inhal alcohol, cocaine, stim, rx, halluc, dissoc, tranq, inhal
#> 6 tobacco,alcohol,cannabis,stim                                              cannabis
```
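
The same row-wise rule (keep a remission item only if it also appears in lifetime) can also be sketched in base R with `strsplit()` and `intersect()`. This is an alternative illustration, not part of the answer above. The helper name `keep_matching` is hypothetical, and the sketch assumes items are separated by plain commas with no surrounding spaces and that "empty" means `NA`.

```r
# Example data copied from the question.
df <- data.frame(
  lifetime = c(
    "tobacco,alcohol,cannabis,cocaine,stim",
    "tobacco,alcohol,cannabis,cocaine,stim,inhal",
    "tobacco,alcohol,stim,rx",
    "tobacco,alcohol,cocaine,stim",
    "tobacco,alcohol,cannabis,opioids,cocaine,stim,rx,halluc,dissoc,tranq,inhal",
    "tobacco,alcohol,cannabis,stim"
  ),
  remission = c(
    "rx",
    "cannabis,opioids,rx,halluc,dissoc,tranq,inhal",
    "tobacco,cannabis,opioids,cocaine,halluc,dissoc,tranq,inhal",
    "cannabis,opioids,cocaine,rx,halluc,dissoc,tranq,inhal",
    "alcohol,cocaine,stim,rx,halluc,dissoc,tranq,inhal",
    "cannabis,opioids,cocaine,rx,halluc,dissoc,tranq,inhal"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each row, keep the remission items that also
# occur as whole items in lifetime. Assumes comma-separated items with
# no spaces around the commas.
keep_matching <- function(lifetime, remission) {
  life_items <- strsplit(lifetime, ",", fixed = TRUE)
  rem_items  <- strsplit(remission, ",", fixed = TRUE)
  # intersect() keeps the order of its first argument (the remission items)
  kept <- Map(intersect, rem_items, life_items)
  vapply(
    kept,
    function(x) if (length(x) == 0) NA_character_ else paste(x, collapse = ","),
    character(1)
  )
}

df$remission <- keep_matching(df$lifetime, df$remission)
df
```

Because the comparison is on whole items rather than a regular expression, an item can never match inside a longer item name. If an empty string is preferred over `NA` for rows with no match, `NA_character_` could be swapped for `""`.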


huangapple
  • Posted on April 20, 2023, 01:36:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76057369.html