合并两个数据框,如果一个字符串列表匹配,则将不匹配的字符串列为NA。

huangapple go评论64阅读模式
英文:

Merge two data frames if a list of string matches and list the unmatch string as NA R

问题

我有两个数据框,并且如果Drug_NameDrugz的字符串匹配,我想要在同一行中为第二个数据框分配或粘贴字符串。

注意:
如果我使用left_joinmerge merge(df1, df2, all.x = TRUE, by.x = "Drug_Names", by.y = "Drugz") 进行操作,它会返回与DF1类似的完全相同的数据框。

以下是所需DF的示例:

DF1

>     Pat_ID       Date      Code      Drug_Names
>     AB1     2010-12-09     1.1.1     Alpha
>     AB1     2010-12-15     1.1.1     Alpha
>     AB1     2010-12-15     1.1.1     Beta
>     Ax2     2010-12-09     1.1.1     Beta
>     Ax2     2010-12-17     1.1.1     Beta
>     Aq3     2011-02-09     1.1.1     Gamma
>     Aq3     2011-04-25     1.1.1     Gamma
>     Aw4     2011-04-25     1.1.1     Tango

DF2

Codez      Drugz
 1.1.1     Alpha
 1.1.3     Gamma

所需DF3

Pat_ID       Date      Code    Drug_Names    Drugz     Codez 
>     AB1     2010-12-09     1.1.1     Alpha      Alpha     1.1.1
>     AB1     2010-12-15     1.1.1     Alpha      Alpha     1.1.1
>     AB1     2010-12-15     1.1.1     Beta        NA         NA
>     Ax2     2010-12-09     1.1.1     Beta        NA         NA
>     Ax2     2010-12-17     1.1.1     Beta        NA         NA
>     Aq3     2011-02-09     1.1.1     Gamma      Gamma      1.1.3
>     Aq3     2011-04-25     1.1.1     Gamma      Gamma      1.1.3
>     Aw4     2011-04-25     1.1.1     Tango       NA         NA
英文:

I have two data frames and I want to assign or paste the string of second data frame in the same row if the string of Drug_Name and Drugz matches.

NOTE:
If I do it by left_join or merge merge(df1, df2, all.x = TRUE, by.x = "Drug_Names", by.y = "Drugz"), it returns me the exact same dataframe that is similar to DF1.

Below is the example of with required DF

DF1

>     Pat_ID       Date      Code      Drug_Names
>     AB1     2010-12-09     1.1.1     Alpha
>     AB1     2010-12-15     1.1.1     Alpha
>     AB1     2010-12-15     1.1.1     Beta
>     Ax2     2010-12-09     1.1.1     Beta
>     Ax2     2010-12-17     1.1.1     Beta
>     Aq3     2011-02-09     1.1.1     Gamma
>     Aq3     2011-04-25     1.1.1     Gamma
>     Aw4     2011-04-25     1.1.1     Tango

DF2

Codez      Drugz
 1.1.1     Alpha
 1.1.3     Gamma

Required DF3

> Pat_ID Date Code Drug_Names Drugz Codez
> AB1 2010-12-09 1.1.1 Alpha Alpha 1.1.1
> AB1 2010-12-15 1.1.1 Alpha Alpha 1.1.1
> AB1 2010-12-15 1.1.1 Beta NA NA
> Ax2 2010-12-09 1.1.1 Beta NA NA
> Ax2 2010-12-17 1.1.1 Beta NA NA
> Aq3 2011-02-09 1.1.1 Gamma Gamma 1.1.3
> Aq3 2011-04-25 1.1.1 Gamma Gamma 1.1.3
> Aw4 2011-04-25 1.1.1 Tango NA NA

答案1

得分: 1

df1 <- read.table(text = "
Pat_ID       Date      Code      Drug_Names
AB1     2010-12-09     1.1.1     Alpha
AB1     2010-12-15     1.1.1     Alpha
AB1     2010-12-15     1.1.1     Beta
Ax2     2010-12-09     1.1.1     Beta
Ax2     2010-12-17     1.1.1     Beta
Aq3     2011-02-09     1.1.1     Gamma
Aq3     2011-04-25     1.1.1     Gamma
Aw4     2011-04-25     1.1.1     Tango", header = TRUE, stringsAsFactors = FALSE)  %>%  
as_tibble() %>%
mutate(Date = as.Date(Date))

df2 <- read.table(text = "
Codez      Drugz
 1.1.1     Alpha
 1.1.3     Gamma", header = TRUE, stringsAsFactors = FALSE)  %>%
as_tibble()
df1  %>%
left_join(df2, by = c("Code" = "Codez", "Drug_Names" = "Drugz"), keep = TRUE)

# A tibble: 8 × 6
  Pat_ID Date       Code  Drug_Names Codez Drugz
  <chr>  <date>     <chr> <chr>      <chr> <chr>
1 AB1    2010-12-09 1.1.1 Alpha      1.1.1 Alpha
2 AB1    2010-12-15 1.1.1 Alpha      1.1.1 Alpha
3 AB1    2010-12-15 1.1.1 Beta       NA    NA   
4 Ax2    2010-12-09 1.1.1 Beta       NA    NA   
5 Ax2    2010-12-17 1.1.1 Beta       NA    NA   
6 Aq3    2011-02-09 1.1.1 Gamma      NA    NA   
7 Aq3    2011-04-25 1.1.1 Gamma      NA    NA   
8 Aw4    2011-04-25 1.1.1 Tango      NA    NA      
英文:
df1 &lt;- read.table(text = &quot;
Pat_ID       Date      Code      Drug_Names
AB1     2010-12-09     1.1.1     Alpha
AB1     2010-12-15     1.1.1     Alpha
AB1     2010-12-15     1.1.1     Beta
Ax2     2010-12-09     1.1.1     Beta
Ax2     2010-12-17     1.1.1     Beta
Aq3     2011-02-09     1.1.1     Gamma
Aq3     2011-04-25     1.1.1     Gamma
Aw4     2011-04-25     1.1.1     Tango&quot;, header = TRUE, stringsAsFactors = FALSE)  %&gt;%  
as_tibble() %&gt;%
mutate(Date = as.Date(Date))

df2 &lt;- read.table(text = &quot;
Codez      Drugz
 1.1.1     Alpha
 1.1.3     Gamma&quot;, header = TRUE, stringsAsFactors = FALSE)  %&gt;%
as_tibble()
df1  %&gt;%
left_join(df2, by = c(&quot;Code&quot; = &quot;Codez&quot;, &quot;Drug_Names&quot; = &quot;Drugz&quot;), keep = TRUE)

# A tibble: 8 &#215; 6
  Pat_ID Date       Code  Drug_Names Codez Drugz
  &lt;chr&gt;  &lt;date&gt;     &lt;chr&gt; &lt;chr&gt;      &lt;chr&gt; &lt;chr&gt;
1 AB1    2010-12-09 1.1.1 Alpha      1.1.1 Alpha
2 AB1    2010-12-15 1.1.1 Alpha      1.1.1 Alpha
3 AB1    2010-12-15 1.1.1 Beta       NA    NA   
4 Ax2    2010-12-09 1.1.1 Beta       NA    NA   
5 Ax2    2010-12-17 1.1.1 Beta       NA    NA   
6 Aq3    2011-02-09 1.1.1 Gamma      NA    NA   
7 Aq3    2011-04-25 1.1.1 Gamma      NA    NA   
8 Aw4    2011-04-25 1.1.1 Tango      NA    NA      

答案2

得分: 1

你可以使用 match

cbind(DF1, DF2[match(DF1$Drug_Names, DF2$Drugz),2:1])
#     Pat_ID       Date  Code Drug_Names Drugz Codez
#1       AB1 2010-12-09 1.1.1      Alpha Alpha 1.1.1
#1.1     AB1 2010-12-15 1.1.1      Alpha Alpha 1.1.1
#NA      AB1 2010-12-15 1.1.1       Beta  &lt;NA&gt;  &lt;NA&gt;
#NA.1    Ax2 2010-12-09 1.1.1       Beta  &lt;NA&gt;  &lt;NA&gt;
#NA.2    Ax2 2010-12-17 1.1.1       Beta  &lt;NA&gt;  &lt;NA&gt;
#2       Aq3 2011-02-09 1.1.1      Gamma Gamma 1.1.3
#2.1     Aq3 2011-04-25 1.1.1      Gamma Gamma 1.1.3
#NA.3    Aw4 2011-04-25 1.1.1      Tango  &lt;NA&gt;  &lt;NA&gt;

或者如果使用 merge,可以将列 Drug_Names 添加到 DF2。

merge(DF1, cbind(Drug_Names = DF2$Drugz, DF2), all.x = TRUE)
#  Drug_Names Pat_ID       Date  Code Codez Drugz
#1      Alpha    AB1 2010-12-09 1.1.1 1.1.1 Alpha
#2      Alpha    AB1 2010-12-15 1.1.1 1.1.1 Alpha
#3       Beta    AB1 2010-12-15 1.1.1  &lt;NA&gt;  &lt;NA&gt;
#4       Beta    Ax2 2010-12-09 1.1.1  &lt;NA&gt;  &lt;NA&gt;
#5       Beta    Ax2 2010-12-17 1.1.1  &lt;NA&gt;  &lt;NA&gt;
#6      Gamma    Aq3 2011-02-09 1.1.1 1.1.3 Gamma
#7      Gamma    Aq3 2011-04-25 1.1.1 1.1.3 Gamma
#8      Tango    Aw4 2011-04-25 1.1.1  &lt;NA&gt;  &lt;NA&gt;

数据

DF1 <- read.table(header=TRUE, text="Pat_ID       Date      Code      Drug_Names
     AB1     2010-12-09     1.1.1     Alpha
     AB1     2010-12-15     1.1.1     Alpha
     AB1     2010-12-15     1.1.1     Beta
     Ax2     2010-12-09     1.1.1     Beta
     Ax2     2010-12-17     1.1.1     Beta
     Aq3     2011-02-09     1.1.1     Gamma
     Aq3     2011-04-25     1.1.1     Gamma
     Aw4     2011-04-25     1.1.1     Tango")

DF2 <- read.table(header=TRUE, text="Codez      Drugz
 1.1.1     Alpha
 1.1.3     Gamma")
英文:

You can use match.

cbind(DF1, DF2[match(DF1$Drug_Names, DF2$Drugz),2:1])
#     Pat_ID       Date  Code Drug_Names Drugz Codez
#1       AB1 2010-12-09 1.1.1      Alpha Alpha 1.1.1
#1.1     AB1 2010-12-15 1.1.1      Alpha Alpha 1.1.1
#NA      AB1 2010-12-15 1.1.1       Beta  &lt;NA&gt;  &lt;NA&gt;
#NA.1    Ax2 2010-12-09 1.1.1       Beta  &lt;NA&gt;  &lt;NA&gt;
#NA.2    Ax2 2010-12-17 1.1.1       Beta  &lt;NA&gt;  &lt;NA&gt;
#2       Aq3 2011-02-09 1.1.1      Gamma Gamma 1.1.3
#2.1     Aq3 2011-04-25 1.1.1      Gamma Gamma 1.1.3
#NA.3    Aw4 2011-04-25 1.1.1      Tango  &lt;NA&gt;  &lt;NA&gt;

Or in case using merge add the column Drug_Names to DF2.

merge(DF1, cbind(Drug_Names = DF2$Drugz, DF2), all.x = TRUE)
#  Drug_Names Pat_ID       Date  Code Codez Drugz
#1      Alpha    AB1 2010-12-09 1.1.1 1.1.1 Alpha
#2      Alpha    AB1 2010-12-15 1.1.1 1.1.1 Alpha
#3       Beta    AB1 2010-12-15 1.1.1  &lt;NA&gt;  &lt;NA&gt;
#4       Beta    Ax2 2010-12-09 1.1.1  &lt;NA&gt;  &lt;NA&gt;
#5       Beta    Ax2 2010-12-17 1.1.1  &lt;NA&gt;  &lt;NA&gt;
#6      Gamma    Aq3 2011-02-09 1.1.1 1.1.3 Gamma
#7      Gamma    Aq3 2011-04-25 1.1.1 1.1.3 Gamma
#8      Tango    Aw4 2011-04-25 1.1.1  &lt;NA&gt;  &lt;NA&gt;

Data

DF1 &lt;- read.table(header=TRUE, text=&quot;Pat_ID       Date      Code      Drug_Names
     AB1     2010-12-09     1.1.1     Alpha
     AB1     2010-12-15     1.1.1     Alpha
     AB1     2010-12-15     1.1.1     Beta
     Ax2     2010-12-09     1.1.1     Beta
     Ax2     2010-12-17     1.1.1     Beta
     Aq3     2011-02-09     1.1.1     Gamma
     Aq3     2011-04-25     1.1.1     Gamma
     Aw4     2011-04-25     1.1.1     Tango&quot;)

DF2 &lt;- read.table(header=TRUE, text=&quot;Codez      Drugz
 1.1.1     Alpha
 1.1.3     Gamma&quot;)

huangapple
  • 本文由 发表于 2023年7月3日 17:12:55
  • 转载请务必保留本文链接:https://go.coder-hub.com/76603382.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定