2023-07-03 17:12:55

Merge two data frames in R where strings match and show unmatched strings as NA

Question

I have two data frames. For every row of DF1 whose Drug_Names value matches a Drugz value in DF2, I want to add the matching DF2 values (Drugz and Codez) to that same row. Rows without a match should get NA.

Note: if I use left_join or merge(df1, df2, all.x = TRUE, by.x = "Drug_Names", by.y = "Drugz"), the result I get back looks the same as DF1.

Below is an example with the required result.

DF1

Pat_ID  Date        Code   Drug_Names
AB1     2010-12-09  1.1.1  Alpha
AB1     2010-12-15  1.1.1  Alpha
AB1     2010-12-15  1.1.1  Beta
Ax2     2010-12-09  1.1.1  Beta
Ax2     2010-12-17  1.1.1  Beta
Aq3     2011-02-09  1.1.1  Gamma
Aq3     2011-04-25  1.1.1  Gamma
Aw4     2011-04-25  1.1.1  Tango

DF2

Codez  Drugz
1.1.1  Alpha
1.1.3  Gamma

Required DF3

Pat_ID  Date        Code   Drug_Names  Drugz  Codez
AB1     2010-12-09  1.1.1  Alpha       Alpha  1.1.1
AB1     2010-12-15  1.1.1  Alpha       Alpha  1.1.1
AB1     2010-12-15  1.1.1  Beta        NA     NA
Ax2     2010-12-09  1.1.1  Beta        NA     NA
Ax2     2010-12-17  1.1.1  Beta        NA     NA
Aq3     2011-02-09  1.1.1  Gamma       Gamma  1.1.3
Aq3     2011-04-25  1.1.1  Gamma       Gamma  1.1.3
Aw4     2011-04-25  1.1.1  Tango       NA     NA

Answer 1

Score: 1

library(dplyr)  # provides %>%, as_tibble(), mutate() and left_join()

# Rebuild the data from the question
df1 <- read.table(text = "
Pat_ID       Date      Code      Drug_Names
AB1     2010-12-09     1.1.1     Alpha
AB1     2010-12-15     1.1.1     Alpha
AB1     2010-12-15     1.1.1     Beta
Ax2     2010-12-09     1.1.1     Beta
Ax2     2010-12-17     1.1.1     Beta
Aq3     2011-02-09     1.1.1     Gamma
Aq3     2011-04-25     1.1.1     Gamma
Aw4     2011-04-25     1.1.1     Tango", header = TRUE, stringsAsFactors = FALSE) %>%
  as_tibble() %>%
  mutate(Date = as.Date(Date))

df2 <- read.table(text = "
Codez      Drugz
 1.1.1     Alpha
 1.1.3     Gamma", header = TRUE, stringsAsFactors = FALSE) %>%
  as_tibble()

# Left join on both the code and the drug name.
# keep = TRUE retains the key columns of df2 (Codez, Drugz) in the result.
df1 %>%
  left_join(df2, by = c("Code" = "Codez", "Drug_Names" = "Drugz"), keep = TRUE)
# A tibble: 8 × 6
  Pat_ID Date       Code  Drug_Names Codez Drugz
  <chr>  <date>     <chr> <chr>      <chr> <chr>
1 AB1    2010-12-09 1.1.1 Alpha      1.1.1 Alpha
2 AB1    2010-12-15 1.1.1 Alpha      1.1.1 Alpha
3 AB1    2010-12-15 1.1.1 Beta       NA    NA
4 Ax2    2010-12-09 1.1.1 Beta       NA    NA
5 Ax2    2010-12-17 1.1.1 Beta       NA    NA
6 Aq3    2011-02-09 1.1.1 Gamma      NA    NA
7 Aq3    2011-04-25 1.1.1 Gamma      NA    NA
8 Aw4    2011-04-25 1.1.1 Tango      NA    NA
# Note: the Gamma rows stay NA here because their Code (1.1.1) differs from Codez (1.1.3).
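
The join above matches on both Code and Drug_Names, so the Gamma rows (Code 1.1.1 versus Codez 1.1.3) come back as NA, which differs from the DF3 requested in the question. A minimal sketch, assuming the goal is that DF3 and that the drug name alone should be the key, is shown below. The name df3 and the final select() that reorders the columns are illustrative choices, not part of the original answer.

```r
library(dplyr)

# Same data as in the question, built directly so the block is self-contained
df1 <- tibble(
  Pat_ID     = c("AB1", "AB1", "AB1", "Ax2", "Ax2", "Aq3", "Aq3", "Aw4"),
  Date       = as.Date(c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-09",
                         "2010-12-17", "2011-02-09", "2011-04-25", "2011-04-25")),
  Code       = "1.1.1",
  Drug_Names = c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Gamma", "Gamma", "Tango")
)
df2 <- tibble(Codez = c("1.1.1", "1.1.3"), Drugz = c("Alpha", "Gamma"))

# Assumption: only the drug name decides whether two rows match.
# keep = TRUE keeps Drugz as its own column instead of folding it into Drug_Names.
df3 <- df1 %>%
  left_join(df2, by = c("Drug_Names" = "Drugz"), keep = TRUE) %>%
  select(Pat_ID, Date, Code, Drug_Names, Drugz, Codez)

print(df3)
```

With this key the Alpha and Gamma rows pick up their Drugz and Codez values, while the Beta and Tango rows are filled with NA.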

Answer 2

Score: 1

You can use match, which returns the position of each Drug_Names value in DF2$Drugz, or NA when there is no match.

cbind(DF1, DF2[match(DF1$Drug_Names, DF2$Drugz), 2:1])
#     Pat_ID       Date  Code Drug_Names Drugz Codez
#1       AB1 2010-12-09 1.1.1      Alpha Alpha 1.1.1
#1.1     AB1 2010-12-15 1.1.1      Alpha Alpha 1.1.1
#NA      AB1 2010-12-15 1.1.1       Beta  <NA>  <NA>
#NA.1    Ax2 2010-12-09 1.1.1       Beta  <NA>  <NA>
#NA.2    Ax2 2010-12-17 1.1.1       Beta  <NA>  <NA>
#2       Aq3 2011-02-09 1.1.1      Gamma Gamma 1.1.3
#2.1     Aq3 2011-04-25 1.1.1      Gamma Gamma 1.1.3
#NA.3    Aw4 2011-04-25 1.1.1      Tango  <NA>  <NA>
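
To make the match-based lookup easier to follow, here is a step-by-step sketch of the same idea. The intermediate names idx, looked_up and DF3, and the row-name reset, are additions for illustration and are not part of the original answer.

```r
# Data from the question, built directly so the block is self-contained
DF1 <- data.frame(
  Pat_ID     = c("AB1", "AB1", "AB1", "Ax2", "Ax2", "Aq3", "Aq3", "Aw4"),
  Date       = c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-09",
                 "2010-12-17", "2011-02-09", "2011-04-25", "2011-04-25"),
  Code       = "1.1.1",
  Drug_Names = c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Gamma", "Gamma", "Tango"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(Codez = c("1.1.1", "1.1.3"), Drugz = c("Alpha", "Gamma"),
                  stringsAsFactors = FALSE)

# Step 1: for each DF1 row, find the row number of its drug in DF2 (NA if absent)
idx <- match(DF1$Drug_Names, DF2$Drugz)

# Step 2: indexing with NA yields a row of NAs, which gives the "unmatched" rows
looked_up <- DF2[idx, c("Drugz", "Codez")]

# Step 3 (optional): drop the row names such as "1.1" and "NA.1" before binding
rownames(looked_up) <- NULL
DF3 <- cbind(DF1, looked_up)

print(DF3)
```

Because match returns only the first position of each value, this approach assumes each drug appears at most once in DF2, as it does in the question's data.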

Or, if you prefer merge, add a Drug_Names column to DF2 so that both data frames share the key and Drugz is kept as a separate column.

merge(DF1, cbind(Drug_Names = DF2$Drugz, DF2), all.x = TRUE)
#  Drug_Names Pat_ID       Date  Code Codez Drugz
#1      Alpha    AB1 2010-12-09 1.1.1 1.1.1 Alpha
#2      Alpha    AB1 2010-12-15 1.1.1 1.1.1 Alpha
#3       Beta    AB1 2010-12-15 1.1.1  <NA>  <NA>
#4       Beta    Ax2 2010-12-09 1.1.1  <NA>  <NA>
#5       Beta    Ax2 2010-12-17 1.1.1  <NA>  <NA>
#6      Gamma    Aq3 2011-02-09 1.1.1 1.1.3 Gamma
#7      Gamma    Aq3 2011-04-25 1.1.1 1.1.3 Gamma
#8      Tango    Aw4 2011-04-25 1.1.1  <NA>  <NA>
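
One likely reason the merge call in the question seemed to return DF1 unchanged is that merge keeps a single copy of the key column, so Drugz never shows up as its own column. The sketch below compares the two calls on the question's data; the names plain and with_key are illustrative, and this reading of the asker's observation is an assumption.

```r
# Data from the question, built directly so the block is self-contained
DF1 <- data.frame(
  Pat_ID     = c("AB1", "AB1", "AB1", "Ax2", "Ax2", "Aq3", "Aq3", "Aw4"),
  Date       = c("2010-12-09", "2010-12-15", "2010-12-15", "2010-12-09",
                 "2010-12-17", "2011-02-09", "2011-04-25", "2011-04-25"),
  Code       = "1.1.1",
  Drug_Names = c("Alpha", "Alpha", "Beta", "Beta", "Beta", "Gamma", "Gamma", "Tango"),
  stringsAsFactors = FALSE
)
DF2 <- data.frame(Codez = c("1.1.1", "1.1.3"), Drugz = c("Alpha", "Gamma"),
                  stringsAsFactors = FALSE)

# Call from the question: Drugz is used up as the key, only Codez is added
plain <- merge(DF1, DF2, by.x = "Drug_Names", by.y = "Drugz", all.x = TRUE)
print(names(plain))

# Call from the answer: a copy of the key is added to DF2, so Drugz survives
with_key <- merge(DF1, cbind(Drug_Names = DF2$Drugz, DF2), all.x = TRUE)
print(names(with_key))
```

Note that merge sorts the result by the key column by default; the question's rows happen to be in that order already.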

Data

DF1 <- read.table(header=TRUE, text="Pat_ID       Date      Code      Drug_Names
     AB1     2010-12-09     1.1.1     Alpha
     AB1     2010-12-15     1.1.1     Alpha
     AB1     2010-12-15     1.1.1     Beta
     Ax2     2010-12-09     1.1.1     Beta
     Ax2     2010-12-17     1.1.1     Beta
     Aq3     2011-02-09     1.1.1     Gamma
     Aq3     2011-04-25     1.1.1     Gamma
     Aw4     2011-04-25     1.1.1     Tango")
DF2 <- read.table(header=TRUE, text="Codez      Drugz
 1.1.1     Alpha
 1.1.3     Gamma")
