2023年7月27日 19:27:48go评论114阅读模式

英文:

left join and keep unique values

问题

我想将这两个数据框根据Country进行左连接，以便我可以获得每个国家的每个League。我不能在**left_join()中使用multiple="any"**参数，因为每个国家都有多个联赛。

期望的输出如下：

# A tibble: 6 × 2
  Country League 
  <chr>   <chr>  
1 France  Ligue 1
2 France  Ligue 2
3 Spain   Liga   
4 Spain   Liga 2 
5 England PL1    
6 England PL2

我可以通过以下方式获得这个结果，但会出现警告消息。而且我有一种印象，可以使用dplyr更友好的方式来完成这个任务。

dtl <- dtl %>% 
  left_join(select(t2, Country, League), by = c("Country")) %>% 
  dplyr::distinct(Country, League, .keep_all = TRUE)

谢谢你的帮助。

英文:

I have the following df :

t1&lt;-structure(list(Country = c(&quot;France&quot;, &quot;Spain&quot;, &quot;England&quot;)), class = c(&quot;tbl_df&quot;, 
&quot;tbl&quot;, &quot;data.frame&quot;), row.names = c(NA, -3L))
t2&lt;-structure(list(Country = c(&quot;France&quot;, &quot;France&quot;, &quot;France&quot;, &quot;France&quot;, 
&quot;France&quot;, &quot;France&quot;, &quot;Spain&quot;, &quot;Spain&quot;, &quot;Spain&quot;, &quot;Spain&quot;, &quot;Spain&quot;, 
&quot;Spain&quot;, &quot;Spain&quot;, &quot;England&quot;, &quot;England&quot;, &quot;England&quot;, &quot;England&quot;, 
&quot;England&quot;, &quot;England&quot;, &quot;England&quot;, &quot;England&quot;), League = c(&quot;Ligue 1&quot;, 
&quot;Ligue 1&quot;, &quot;Ligue 1&quot;, &quot;Ligue 2&quot;, &quot;Ligue 2&quot;, &quot;Ligue 2&quot;, &quot;Liga&quot;, 
&quot;Liga&quot;, &quot;Liga&quot;, &quot;Liga 2&quot;, &quot;Liga 2&quot;, &quot;Liga 2&quot;, &quot;Liga 2&quot;, &quot;PL1&quot;, 
&quot;PL1&quot;, &quot;PL1&quot;, &quot;PL1&quot;, &quot;PL2&quot;, &quot;PL2&quot;, &quot;PL2&quot;, &quot;PL2&quot;), ID = c(&quot;EUR&quot;, 
&quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, 
&quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, 
&quot;EUR&quot;, &quot;EUR&quot;)), class = c(&quot;tbl_df&quot;, &quot;tbl&quot;, &quot;data.frame&quot;), row.names = c(NA, 
-21L))

I want to left join the two based on Country so that I have every League per country. I can't use multiple="any" argument in left_join() since there are several leagues per country.
The desired output would be this:

# A tibble: 6 &#215; 2
  Country League 
  &lt;chr&gt;   &lt;chr&gt;  
1 France  Ligue 1
2 France  Ligue 2
3 Spain   Liga   
4 Spain   Liga 2 
5 England PL1    
6 England PL2

I'm able to get this result by doing the following but it throws a warning message. Also I'm under the impression that this could be done in a more R friendly way using dplyr.

dtl&lt;- dtl %&gt;% 
  left_join(select(t2,Country,League),by=c(&quot;Country&quot;)) %&gt;% 
  dplyr::distinct(Country,League,.keep_all = T)

Thank you for your help.

答案1

得分: 2

如Darren的评论所示，问题在于t2中的重复行。移除这些重复行，您将获得所需的结果：

library(dplyr)
left_join(t1, distinct(t2), by = "Country") %>%
  select(-ID)
# 输出:
# 一个 tibble: 6 × 2
  Country League 
  <chr>   <chr>  
1 France  Ligue 1
2 France  Ligue 2
3 Spain   Liga   
4 Spain   Liga 2 
5 England PL1    
6 England PL2

英文:

As Darren's comment illustrates, the issue is the duplicate rows in t2. Remove those, and you get the desired result:

library(dplyr)
left_join(t1, distinct(t2), by = &quot;Country&quot;) %&gt;% 
  select(-ID)
# Output:
# A tibble: 6 &#215; 2
  Country League 
  &lt;chr&gt;   &lt;chr&gt;  
1 France  Ligue 1
2 France  Ligue 2
3 Spain   Liga   
4 Spain   Liga 2 
5 England PL1    
6 England PL2

答案2

得分: 1

你可以使用dplyr的group_by和summarize函数来组合每个国家的联赛以获得所需的输出。以下是用于实现结果的代码，没有警告消息：

library(dplyr)
t1 <- structure(list(Country = c("France", "Spain", "England")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L))
t2 <- structure(list(Country = c("France", "France", "France", "France", "France", "France", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "England", "England", "England", "England", "England", "England", "England", "England"), League = c("Ligue 1", "Ligue 1", "Ligue 1", "Ligue 2", "Ligue 2", "Ligue 2", "Liga", "Liga", "Liga", "Liga 2", "Liga 2", "Liga 2", "Liga 2", "PL1", "PL1", "PL1", "PL1", "PL2", "PL2", "PL2", "PL2"), ID = c("EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -21L))
result <- t1 %>%
  left_join(t2, by = "Country") %>%
  group_by(Country) %>%
  summarize(League = toString(unique(League)))
print(result)

输出:

# A tibble: 3 × 2
  Country League                      
  <chr>   <chr>                       
1 England PL1, PL2                    
2 France  Ligue 1, Ligue 2            
3 Spain   Liga, Liga 2

英文:

You can achieve the desired output using dplyr's group_by and summarize functions to combine the leagues for each country. Here's the code to achieve the result without the warning message:

library(dplyr)
t1 &lt;- structure(list(Country = c(&quot;France&quot;, &quot;Spain&quot;, &quot;England&quot;)), class = c(&quot;tbl_df&quot;, &quot;tbl&quot;, &quot;data.frame&quot;), row.names = c(NA, -3L))
t2 &lt;- structure(list(Country = c(&quot;France&quot;, &quot;France&quot;, &quot;France&quot;, &quot;France&quot;, &quot;France&quot;, &quot;France&quot;, &quot;Spain&quot;, &quot;Spain&quot;, &quot;Spain&quot;, &quot;Spain&quot;, &quot;Spain&quot;, &quot;Spain&quot;, &quot;Spain&quot;, &quot;England&quot;, &quot;England&quot;, &quot;England&quot;, &quot;England&quot;, &quot;England&quot;, &quot;England&quot;, &quot;England&quot;, &quot;England&quot;), League = c(&quot;Ligue 1&quot;, &quot;Ligue 1&quot;, &quot;Ligue 1&quot;, &quot;Ligue 2&quot;, &quot;Ligue 2&quot;, &quot;Ligue 2&quot;, &quot;Liga&quot;, &quot;Liga&quot;, &quot;Liga&quot;, &quot;Liga 2&quot;, &quot;Liga 2&quot;, &quot;Liga 2&quot;, &quot;Liga 2&quot;, &quot;PL1&quot;, &quot;PL1&quot;, &quot;PL1&quot;, &quot;PL1&quot;, &quot;PL2&quot;, &quot;PL2&quot;, &quot;PL2&quot;, &quot;PL2&quot;), ID = c(&quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;, &quot;EUR&quot;)), class = c(&quot;tbl_df&quot;, &quot;tbl&quot;, &quot;data.frame&quot;), row.names = c(NA, -21L))
result &lt;- t1 %&gt;% 
  left_join(t2, by = &quot;Country&quot;) %&gt;% 
  group_by(Country) %&gt;% 
  summarize(League = toString(unique(League)))
print(result)

Output:

# A tibble: 3 &#215; 2
  Country League                      
  &lt;chr&gt;   &lt;chr&gt;                       
1 England PL1, PL2                    
2 France  Ligue 1, Ligue 2            
3 Spain   Liga, Liga 2

通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库，让每个人都能够通过互相帮助和分享经验来进步。

左连接并保留唯一值

问题

答案1

答案2

只在整数列上使用 purrr::map_df 函数。

创建一个分组箱线图，每个分组列有不同数量的行？

如何在R中对列表进行子集化？

增加一个值，如果一行中的数字发生变化。

如何在Playwright视觉比较中屏蔽多个定位器？

在C++中，可以使用可变模板参数来检索类型的内部类型。

selenium.common.exceptions.StaleElementReferenceException: Message: stale element reference: stale element not found

Creating and opening a URL to log in to Website via Basic Auth with Robot Framework/Selenium (Python)

AG Grid 在上下文菜单中以大文本形式打开

What's the correct way to type hint an empty list as a literal in python?

如何在Highcharts Gantt中更改本地化的星期名称

如何在同一个流中使用多个过滤器和映射函数？

如何使用Map/Set来将代码优化到O(n)？

.NET MAUI Android在GitHub Actions上构建失败，错误代码为1。