合并两个数据框,如果一个字符串列表匹配,则将不匹配的字符串列为NA。

huangapple go评论98阅读模式
英文:

Merge two data frames if a list of string matches and list the unmatch string as NA R

问题

我有两个数据框,并且如果Drug_NameDrugz的字符串匹配,我想要在同一行中为第二个数据框分配或粘贴字符串。

注意:
如果我使用left_joinmerge merge(df1, df2, all.x = TRUE, by.x = "Drug_Names", by.y = "Drugz") 进行操作,它会返回与DF1类似的完全相同的数据框。

以下是所需DF的示例:

DF1

  1. > Pat_ID Date Code Drug_Names
  2. > AB1 2010-12-09 1.1.1 Alpha
  3. > AB1 2010-12-15 1.1.1 Alpha
  4. > AB1 2010-12-15 1.1.1 Beta
  5. > Ax2 2010-12-09 1.1.1 Beta
  6. > Ax2 2010-12-17 1.1.1 Beta
  7. > Aq3 2011-02-09 1.1.1 Gamma
  8. > Aq3 2011-04-25 1.1.1 Gamma
  9. > Aw4 2011-04-25 1.1.1 Tango

DF2

  1. Codez Drugz
  2. 1.1.1 Alpha
  3. 1.1.3 Gamma

所需DF3

  1. Pat_ID Date Code Drug_Names Drugz Codez
  1. > AB1 2010-12-09 1.1.1 Alpha Alpha 1.1.1
  2. > AB1 2010-12-15 1.1.1 Alpha Alpha 1.1.1
  3. > AB1 2010-12-15 1.1.1 Beta NA NA
  4. > Ax2 2010-12-09 1.1.1 Beta NA NA
  5. > Ax2 2010-12-17 1.1.1 Beta NA NA
  6. > Aq3 2011-02-09 1.1.1 Gamma Gamma 1.1.3
  7. > Aq3 2011-04-25 1.1.1 Gamma Gamma 1.1.3
  8. > Aw4 2011-04-25 1.1.1 Tango NA NA
英文:

I have two data frames and I want to assign or paste the string of second data frame in the same row if the string of Drug_Name and Drugz matches.

NOTE:
If I do it by left_join or merge merge(df1, df2, all.x = TRUE, by.x = "Drug_Names", by.y = "Drugz"), it returns me the exact same dataframe that is similar to DF1.

Below is the example of with required DF

DF1

  1. > Pat_ID Date Code Drug_Names
  2. > AB1 2010-12-09 1.1.1 Alpha
  3. > AB1 2010-12-15 1.1.1 Alpha
  4. > AB1 2010-12-15 1.1.1 Beta
  5. > Ax2 2010-12-09 1.1.1 Beta
  6. > Ax2 2010-12-17 1.1.1 Beta
  7. > Aq3 2011-02-09 1.1.1 Gamma
  8. > Aq3 2011-04-25 1.1.1 Gamma
  9. > Aw4 2011-04-25 1.1.1 Tango

DF2

  1. Codez Drugz
  2. 1.1.1 Alpha
  3. 1.1.3 Gamma

Required DF3

> Pat_ID Date Code Drug_Names Drugz Codez
> AB1 2010-12-09 1.1.1 Alpha Alpha 1.1.1
> AB1 2010-12-15 1.1.1 Alpha Alpha 1.1.1
> AB1 2010-12-15 1.1.1 Beta NA NA
> Ax2 2010-12-09 1.1.1 Beta NA NA
> Ax2 2010-12-17 1.1.1 Beta NA NA
> Aq3 2011-02-09 1.1.1 Gamma Gamma 1.1.3
> Aq3 2011-04-25 1.1.1 Gamma Gamma 1.1.3
> Aw4 2011-04-25 1.1.1 Tango NA NA

答案1

得分: 1

  1. df1 <- read.table(text = "
  2. Pat_ID Date Code Drug_Names
  3. AB1 2010-12-09 1.1.1 Alpha
  4. AB1 2010-12-15 1.1.1 Alpha
  5. AB1 2010-12-15 1.1.1 Beta
  6. Ax2 2010-12-09 1.1.1 Beta
  7. Ax2 2010-12-17 1.1.1 Beta
  8. Aq3 2011-02-09 1.1.1 Gamma
  9. Aq3 2011-04-25 1.1.1 Gamma
  10. Aw4 2011-04-25 1.1.1 Tango", header = TRUE, stringsAsFactors = FALSE) %>%
  11. as_tibble() %>%
  12. mutate(Date = as.Date(Date))
  13. df2 <- read.table(text = "
  14. Codez Drugz
  15. 1.1.1 Alpha
  16. 1.1.3 Gamma", header = TRUE, stringsAsFactors = FALSE) %>%
  17. as_tibble()
  18. df1 %>%
  19. left_join(df2, by = c("Code" = "Codez", "Drug_Names" = "Drugz"), keep = TRUE)
  20. # A tibble: 8 × 6
  21. Pat_ID Date Code Drug_Names Codez Drugz
  22. <chr> <date> <chr> <chr> <chr> <chr>
  23. 1 AB1 2010-12-09 1.1.1 Alpha 1.1.1 Alpha
  24. 2 AB1 2010-12-15 1.1.1 Alpha 1.1.1 Alpha
  25. 3 AB1 2010-12-15 1.1.1 Beta NA NA
  26. 4 Ax2 2010-12-09 1.1.1 Beta NA NA
  27. 5 Ax2 2010-12-17 1.1.1 Beta NA NA
  28. 6 Aq3 2011-02-09 1.1.1 Gamma NA NA
  29. 7 Aq3 2011-04-25 1.1.1 Gamma NA NA
  30. 8 Aw4 2011-04-25 1.1.1 Tango NA NA
英文:
  1. df1 &lt;- read.table(text = &quot;
  2. Pat_ID Date Code Drug_Names
  3. AB1 2010-12-09 1.1.1 Alpha
  4. AB1 2010-12-15 1.1.1 Alpha
  5. AB1 2010-12-15 1.1.1 Beta
  6. Ax2 2010-12-09 1.1.1 Beta
  7. Ax2 2010-12-17 1.1.1 Beta
  8. Aq3 2011-02-09 1.1.1 Gamma
  9. Aq3 2011-04-25 1.1.1 Gamma
  10. Aw4 2011-04-25 1.1.1 Tango&quot;, header = TRUE, stringsAsFactors = FALSE) %&gt;%
  11. as_tibble() %&gt;%
  12. mutate(Date = as.Date(Date))
  13. df2 &lt;- read.table(text = &quot;
  14. Codez Drugz
  15. 1.1.1 Alpha
  16. 1.1.3 Gamma&quot;, header = TRUE, stringsAsFactors = FALSE) %&gt;%
  17. as_tibble()
  18. df1 %&gt;%
  19. left_join(df2, by = c(&quot;Code&quot; = &quot;Codez&quot;, &quot;Drug_Names&quot; = &quot;Drugz&quot;), keep = TRUE)
  20. # A tibble: 8 &#215; 6
  21. Pat_ID Date Code Drug_Names Codez Drugz
  22. &lt;chr&gt; &lt;date&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt; &lt;chr&gt;
  23. 1 AB1 2010-12-09 1.1.1 Alpha 1.1.1 Alpha
  24. 2 AB1 2010-12-15 1.1.1 Alpha 1.1.1 Alpha
  25. 3 AB1 2010-12-15 1.1.1 Beta NA NA
  26. 4 Ax2 2010-12-09 1.1.1 Beta NA NA
  27. 5 Ax2 2010-12-17 1.1.1 Beta NA NA
  28. 6 Aq3 2011-02-09 1.1.1 Gamma NA NA
  29. 7 Aq3 2011-04-25 1.1.1 Gamma NA NA
  30. 8 Aw4 2011-04-25 1.1.1 Tango NA NA

答案2

得分: 1

你可以使用 match

  1. cbind(DF1, DF2[match(DF1$Drug_Names, DF2$Drugz),2:1])
  2. # Pat_ID Date Code Drug_Names Drugz Codez
  3. #1 AB1 2010-12-09 1.1.1 Alpha Alpha 1.1.1
  4. #1.1 AB1 2010-12-15 1.1.1 Alpha Alpha 1.1.1
  5. #NA AB1 2010-12-15 1.1.1 Beta &lt;NA&gt; &lt;NA&gt;
  6. #NA.1 Ax2 2010-12-09 1.1.1 Beta &lt;NA&gt; &lt;NA&gt;
  7. #NA.2 Ax2 2010-12-17 1.1.1 Beta &lt;NA&gt; &lt;NA&gt;
  8. #2 Aq3 2011-02-09 1.1.1 Gamma Gamma 1.1.3
  9. #2.1 Aq3 2011-04-25 1.1.1 Gamma Gamma 1.1.3
  10. #NA.3 Aw4 2011-04-25 1.1.1 Tango &lt;NA&gt; &lt;NA&gt;

或者如果使用 merge,可以将列 Drug_Names 添加到 DF2。

  1. merge(DF1, cbind(Drug_Names = DF2$Drugz, DF2), all.x = TRUE)
  2. # Drug_Names Pat_ID Date Code Codez Drugz
  3. #1 Alpha AB1 2010-12-09 1.1.1 1.1.1 Alpha
  4. #2 Alpha AB1 2010-12-15 1.1.1 1.1.1 Alpha
  5. #3 Beta AB1 2010-12-15 1.1.1 &lt;NA&gt; &lt;NA&gt;
  6. #4 Beta Ax2 2010-12-09 1.1.1 &lt;NA&gt; &lt;NA&gt;
  7. #5 Beta Ax2 2010-12-17 1.1.1 &lt;NA&gt; &lt;NA&gt;
  8. #6 Gamma Aq3 2011-02-09 1.1.1 1.1.3 Gamma
  9. #7 Gamma Aq3 2011-04-25 1.1.1 1.1.3 Gamma
  10. #8 Tango Aw4 2011-04-25 1.1.1 &lt;NA&gt; &lt;NA&gt;

数据

  1. DF1 <- read.table(header=TRUE, text="Pat_ID Date Code Drug_Names
  2. AB1 2010-12-09 1.1.1 Alpha
  3. AB1 2010-12-15 1.1.1 Alpha
  4. AB1 2010-12-15 1.1.1 Beta
  5. Ax2 2010-12-09 1.1.1 Beta
  6. Ax2 2010-12-17 1.1.1 Beta
  7. Aq3 2011-02-09 1.1.1 Gamma
  8. Aq3 2011-04-25 1.1.1 Gamma
  9. Aw4 2011-04-25 1.1.1 Tango")
  10. DF2 <- read.table(header=TRUE, text="Codez Drugz
  11. 1.1.1 Alpha
  12. 1.1.3 Gamma")
英文:

You can use match.

  1. cbind(DF1, DF2[match(DF1$Drug_Names, DF2$Drugz),2:1])
  2. # Pat_ID Date Code Drug_Names Drugz Codez
  3. #1 AB1 2010-12-09 1.1.1 Alpha Alpha 1.1.1
  4. #1.1 AB1 2010-12-15 1.1.1 Alpha Alpha 1.1.1
  5. #NA AB1 2010-12-15 1.1.1 Beta &lt;NA&gt; &lt;NA&gt;
  6. #NA.1 Ax2 2010-12-09 1.1.1 Beta &lt;NA&gt; &lt;NA&gt;
  7. #NA.2 Ax2 2010-12-17 1.1.1 Beta &lt;NA&gt; &lt;NA&gt;
  8. #2 Aq3 2011-02-09 1.1.1 Gamma Gamma 1.1.3
  9. #2.1 Aq3 2011-04-25 1.1.1 Gamma Gamma 1.1.3
  10. #NA.3 Aw4 2011-04-25 1.1.1 Tango &lt;NA&gt; &lt;NA&gt;

Or in case using merge add the column Drug_Names to DF2.

  1. merge(DF1, cbind(Drug_Names = DF2$Drugz, DF2), all.x = TRUE)
  2. # Drug_Names Pat_ID Date Code Codez Drugz
  3. #1 Alpha AB1 2010-12-09 1.1.1 1.1.1 Alpha
  4. #2 Alpha AB1 2010-12-15 1.1.1 1.1.1 Alpha
  5. #3 Beta AB1 2010-12-15 1.1.1 &lt;NA&gt; &lt;NA&gt;
  6. #4 Beta Ax2 2010-12-09 1.1.1 &lt;NA&gt; &lt;NA&gt;
  7. #5 Beta Ax2 2010-12-17 1.1.1 &lt;NA&gt; &lt;NA&gt;
  8. #6 Gamma Aq3 2011-02-09 1.1.1 1.1.3 Gamma
  9. #7 Gamma Aq3 2011-04-25 1.1.1 1.1.3 Gamma
  10. #8 Tango Aw4 2011-04-25 1.1.1 &lt;NA&gt; &lt;NA&gt;

Data

  1. DF1 &lt;- read.table(header=TRUE, text=&quot;Pat_ID Date Code Drug_Names
  2. AB1 2010-12-09 1.1.1 Alpha
  3. AB1 2010-12-15 1.1.1 Alpha
  4. AB1 2010-12-15 1.1.1 Beta
  5. Ax2 2010-12-09 1.1.1 Beta
  6. Ax2 2010-12-17 1.1.1 Beta
  7. Aq3 2011-02-09 1.1.1 Gamma
  8. Aq3 2011-04-25 1.1.1 Gamma
  9. Aw4 2011-04-25 1.1.1 Tango&quot;)
  10. DF2 &lt;- read.table(header=TRUE, text=&quot;Codez Drugz
  11. 1.1.1 Alpha
  12. 1.1.3 Gamma&quot;)

huangapple
  • 本文由 发表于 2023年7月3日 17:12:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76603382.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定