英文:
left join and keep unique values
问题
我想将这两个数据框根据Country进行左连接,以便我可以获得每个国家的每个League。我不能在**left_join()中使用multiple="any"**参数,因为每个国家都有多个联赛。
期望的输出如下:
# A tibble: 6 × 2
Country League
<chr> <chr>
1 France Ligue 1
2 France Ligue 2
3 Spain Liga
4 Spain Liga 2
5 England PL1
6 England PL2
我可以通过以下方式获得这个结果,但会出现警告消息。而且我有一种印象,可以使用dplyr更友好的方式来完成这个任务。
dtl <- dtl %>%
left_join(select(t2, Country, League), by = c("Country")) %>%
dplyr::distinct(Country, League, .keep_all = TRUE)
谢谢你的帮助。
英文:
I have the following df :
t1<-structure(list(Country = c("France", "Spain", "England")), class = c("tbl_df",
"tbl", "data.frame"), row.names = c(NA, -3L))
t2<-structure(list(Country = c("France", "France", "France", "France",
"France", "France", "Spain", "Spain", "Spain", "Spain", "Spain",
"Spain", "Spain", "England", "England", "England", "England",
"England", "England", "England", "England"), League = c("Ligue 1",
"Ligue 1", "Ligue 1", "Ligue 2", "Ligue 2", "Ligue 2", "Liga",
"Liga", "Liga", "Liga 2", "Liga 2", "Liga 2", "Liga 2", "PL1",
"PL1", "PL1", "PL1", "PL2", "PL2", "PL2", "PL2"), ID = c("EUR",
"EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR",
"EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR",
"EUR", "EUR")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA,
-21L))
I want to left join the two based on Country so that I have every League per country. I can't use multiple="any" argument in left_join() since there are several leagues per country.
The desired output would be this:
# A tibble: 6 × 2
Country League
<chr> <chr>
1 France Ligue 1
2 France Ligue 2
3 Spain Liga
4 Spain Liga 2
5 England PL1
6 England PL2
I'm able to get this result by doing the following but it throws a warning message. Also I'm under the impression that this could be done in a more R friendly way using dplyr.
dtl<- dtl %>%
left_join(select(t2,Country,League),by=c("Country")) %>%
dplyr::distinct(Country,League,.keep_all = T)
Thank you for your help.
答案1
得分: 2
如Darren的评论所示,问题在于t2
中的重复行。移除这些重复行,您将获得所需的结果:
library(dplyr)
left_join(t1, distinct(t2), by = "Country") %>%
select(-ID)
# 输出:
# 一个 tibble: 6 × 2
Country League
<chr> <chr>
1 France Ligue 1
2 France Ligue 2
3 Spain Liga
4 Spain Liga 2
5 England PL1
6 England PL2
英文:
As Darren's comment illustrates, the issue is the duplicate rows in t2
. Remove those, and you get the desired result:
library(dplyr)
left_join(t1, distinct(t2), by = "Country") %>%
select(-ID)
# Output:
# A tibble: 6 × 2
Country League
<chr> <chr>
1 France Ligue 1
2 France Ligue 2
3 Spain Liga
4 Spain Liga 2
5 England PL1
6 England PL2
答案2
得分: 1
你可以使用dplyr的group_by和summarize函数来组合每个国家的联赛以获得所需的输出。以下是用于实现结果的代码,没有警告消息:
library(dplyr)
t1 <- structure(list(Country = c("France", "Spain", "England")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L))
t2 <- structure(list(Country = c("France", "France", "France", "France", "France", "France", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "England", "England", "England", "England", "England", "England", "England", "England"), League = c("Ligue 1", "Ligue 1", "Ligue 1", "Ligue 2", "Ligue 2", "Ligue 2", "Liga", "Liga", "Liga", "Liga 2", "Liga 2", "Liga 2", "Liga 2", "PL1", "PL1", "PL1", "PL1", "PL2", "PL2", "PL2", "PL2"), ID = c("EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -21L))
result <- t1 %>%
left_join(t2, by = "Country") %>%
group_by(Country) %>%
summarize(League = toString(unique(League)))
print(result)
输出:
# A tibble: 3 × 2
Country League
<chr> <chr>
1 England PL1, PL2
2 France Ligue 1, Ligue 2
3 Spain Liga, Liga 2
英文:
You can achieve the desired output using dplyr's group_by and summarize functions to combine the leagues for each country. Here's the code to achieve the result without the warning message:
library(dplyr)
t1 <- structure(list(Country = c("France", "Spain", "England")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -3L))
t2 <- structure(list(Country = c("France", "France", "France", "France", "France", "France", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "Spain", "England", "England", "England", "England", "England", "England", "England", "England"), League = c("Ligue 1", "Ligue 1", "Ligue 1", "Ligue 2", "Ligue 2", "Ligue 2", "Liga", "Liga", "Liga", "Liga 2", "Liga 2", "Liga 2", "Liga 2", "PL1", "PL1", "PL1", "PL1", "PL2", "PL2", "PL2", "PL2"), ID = c("EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR", "EUR")), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, -21L))
result <- t1 %>%
left_join(t2, by = "Country") %>%
group_by(Country) %>%
summarize(League = toString(unique(League)))
print(result)
Output:
# A tibble: 3 × 2
Country League
<chr> <chr>
1 England PL1, PL2
2 France Ligue 1, Ligue 2
3 Spain Liga, Liga 2
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论