How to keep only values in one column that match the other column in R?

Question

Here is my df:

```r
df <- data.frame(
  lifetime = c(
    "tobacco,alcohol,cannabis,cocaine,stim",
    "tobacco,alcohol,cannabis,cocaine,stim,inhal",
    "tobacco,alcohol,stim,rx",
    "tobacco,alcohol,cocaine,stim",
    "tobacco,alcohol,cannabis,opioids,cocaine,stim,rx,halluc,dissoc,tranq,inhal",
    "tobacco,alcohol,cannabis,stim"
  ),
  remission = c(
    "rx",
    "cannabis,opioids,rx,halluc,dissoc,tranq,inhal",
    "tobacco,cannabis,opioids,cocaine,halluc,dissoc,tranq,inhal",
    "cannabis,opioids,cocaine,rx,halluc,dissoc,tranq,inhal",
    "alcohol,cocaine,stim,rx,halluc,dissoc,tranq,inhal",
    "cannabis,opioids,cocaine,rx,halluc,dissoc,tranq,inhal"
  )
)
```

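I want to match the two columns so that:

- if a substance is present in lifetime, it should be kept in the remission column;
- if a substance is not present in lifetime, it should be dropped from the remission column;
- if nothing matches, the remission column should be empty.

I can get this to work when the two columns match completely, but I can't find anything about partial matches and then keeping and dropping values.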

# Answer 1

**Score**: 1

Here is one way we could do it:

```R
library(dplyr)
library(tidyr)
library(stringr)

df %>%
  mutate(id = row_number()) %>%            # tag each original row
  separate_rows(lifetime, sep = ",") %>%   # one lifetime substance per row
  separate_rows(remission, sep = ",") %>%  # crossed with one remission substance per row
  group_by(id) %>%
  # keep a lifetime substance if it matches any remission substance with the same id
  mutate(remission = case_when(
    str_detect(lifetime, paste(remission, collapse = "|")) ~ lifetime
  )) %>%
  distinct(remission, .keep_all = TRUE) %>%
  # drop the NA rows unless NA is the only value left for this id
  filter(!is.na(remission) | max(row_number()) == 1) %>%
  summarise(remission = toString(remission)) %>%
  # re-attach the original lifetime strings
  right_join(df %>%
               mutate(id = row_number()), by = "id") %>%
  select(lifetime, remission = remission.x)
```

```
  lifetime                                                                    remission
  <chr>                                                                       <chr>
1 tobacco,alcohol,cannabis,cocaine,stim                                       NA
2 tobacco,alcohol,cannabis,cocaine,stim,inhal                                 cannabis, inhal
3 tobacco,alcohol,stim,rx                                                     tobacco
4 tobacco,alcohol,cocaine,stim                                                cocaine
5 tobacco,alcohol,cannabis,opioids,cocaine,stim,rx,halluc,dissoc,tranq,inhal  alcohol, cocaine, stim, rx, halluc, dissoc, tranq, inhal
6 tobacco,alcohol,cannabis,stim                                               cannabis
```
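For comparison, the same keep-or-drop rule can be sketched in base R as a per-row set intersection. This is not part of the answer above, just a minimal illustration under a few assumptions: `keep_matching`, `df_small` and `remission_kept` are hypothetical names, the values are comma-separated with no spaces around the commas (as in the question's data), and an empty string stands in for "empty". Because it compares whole tokens, it would not treat a shorter name as matching a longer one that merely contains it, which a regular-expression test built with `str_detect()` can do.

```r
# Hypothetical helper (not from the original answer): for each row, keep only
# the remission entries that also occur in lifetime, compared as whole tokens.
keep_matching <- function(lifetime, remission) {
  # Split each comma-separated string into a character vector of substances
  lifetime_items  <- strsplit(lifetime, ",", fixed = TRUE)
  remission_items <- strsplit(remission, ",", fixed = TRUE)

  # Intersect row by row; an empty intersection collapses to ""
  mapply(
    function(l, r) paste(intersect(r, l), collapse = ","),
    lifetime_items, remission_items,
    USE.NAMES = FALSE
  )
}

# First three rows of the question's data, used here as a small example
df_small <- data.frame(
  lifetime = c(
    "tobacco,alcohol,cannabis,cocaine,stim",
    "tobacco,alcohol,cannabis,cocaine,stim,inhal",
    "tobacco,alcohol,stim,rx"
  ),
  remission = c(
    "rx",
    "cannabis,opioids,rx,halluc,dissoc,tranq,inhal",
    "tobacco,cannabis,opioids,cocaine,halluc,dissoc,tranq,inhal"
  ),
  stringsAsFactors = FALSE
)

df_small$remission_kept <- keep_matching(df_small$lifetime, df_small$remission)
df_small$remission_kept
```

If `NA` is preferred over an empty string for rows with no match, the result could be post-processed with something like `ifelse(x == "", NA_character_, x)`.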

huangapple
  • Posted on April 20, 2023 at 01:36:20
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76057369.html