英文:
Subtracting values of a shared variable between two data frames with unequal size if their categorical variables are the same
问题
我想比较两个dataframe中年份为2020和2019的共享列value。由于表中添加了一个新的国家,2020年的数据行数更多。
我编写了下面的function,但它没有产生任何结果。如果有人能帮助我,我将不胜感激。
我的期望结果:如果dat2019中的一行与dat2020中的分类变量完全相同,则返回它们在2020年和2019年的value之间的差异。
请注意,dat2020(Banggladesh)和dat2019(Chiina)中有故意的拼写错误。
英文:
I wanted to compare the shared column, value, in two dataframe for year 2020 and 2019. The 2020 data has more rows since a new country has added to the table.
I wrote the below function but it didn’t produce any result. I would appreciate if anyone could help me on this.
dat2020 <- tribble(
~Country, ~Gender, ~Indicator, ~value,
"Bangladesh", "Male", "A", 3.7,
"Bangladesh", "Female", "A", 2.6,
"Banggladesh", "Male", "B", 6.8,
"Bangladesh", "Female", "B", 4.1,
"China", "Male", "A", 7.6,
"China", "Female", "A", 3.9,
"China", "Male", "B", 1.5,
"China", "Female", "B", 2.9,
"Laos", "Male", "A", 7.6,
"Laos", "Female", "A", 5.1,
"Laos", "Male", "B", 3.8,
"Laos", "Female", "B", 2.8,
)
dat2019 <- tribble(
~Country, ~Gender, ~Indicator, ~value,
"Bangladesh", "Male", "A", 3.6,
"Bangladesh", "Female", "A", 6.8,
"Bangladesh", "Male", "B", 9.2,
"Bangladesh", "Female", "B", 1.5,
"China", "Male", "A", 8.5,
"Chiina", "Female", "A", 3.9,
"China", "Male", "B", 4.6,
"China", "Female", "B", 5.3,
)
CheckList <- c()
checkValue <- function(data1, data2){
if(data1$Country == data2$Country & data1$Gender == data2$Gender & data1$Indicator == data2$Indicator){
CheckList$Diff = data1$value - data2$value
}
else{
CheckList$Diff = NA
}
}
checkValue(data1 = dat2019, data2 = dat2020)
My desired outcome: if a row in dat2019 has exactly same categorical variables as in dat2020, return the difference between their value in 2020 and 2019.
Note there are intentional typos in dat2020 (Banggladesh) and dat2019 (Chiina).
答案1
得分: 1
做联接并减去
library(dplyr)
left_join(dat2020, dat2019, by = names(dat2020)[1:3]) %>%
mutate(Diff = value.x - value.y, value = value.x, .keep = "unused")
英文:
Do a join and subtract
library(dplyr)
left_join(dat2020, dat2019, by = names(dat2020)[1:3]) %>%
mutate(Diff = value.x - value.y, value = value.x, .keep = "unused")
</details>
# 答案2
**得分**: 0
我会通过`join`操作来解决这个问题:
```R
library(dplyr)
dat2019 %>%
dplyr::rename(value2019 = value) %>%
dplyr::left_join(dplyr::rename(dat2020, value2020 = value)) %>%
dplyr::mutate(diff = value2020 - value2019)
#> 正在连接,按照 = c("Country", "Gender", "Indicator")
#> # A tibble: 8 x 6
#> Country Gender Indicator value2019 value2020 diff
#>
#> 1 Bangladesh 男性 A 3.6 3.7 0.1
#> 2 Bangladesh 女性 A 6.8 2.6 -4.20
#> 3 Bangladesh 男性 B 9.2 NA NA
#> 4 Bangladesh 女性 B 1.5 4.1 2.60
#> 5 中国 男性 A 8.5 7.6 -0.9
#> 6 中国 女性 A 3.9 NA NA
#> 7 中国 男性 B 4.6 1.5 -3.10
#> 8 中国 女性 B 5.3 2.9 -2.4
<details>
<summary>英文:</summary>
I would solve this via a `join` operation:
library(dplyr)
dat2019 %>%
dplyr::rename(value2019 = value) %>%
dplyr::left_join(dplyr::rename(dat2020, value2020 = value)) %>%
dplyr::mutate(diff = value2020 - value2019)
#> Joining, by = c("Country", "Gender", "Indicator")
#> # A tibble: 8 x 6
#> Country Gender Indicator value2019 value2020 diff
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Bangladesh Male A 3.6 3.7 0.1
#> 2 Bangladesh Female A 6.8 2.6 -4.20
#> 3 Bangladesh Male B 9.2 NA NA
#> 4 Bangladesh Female B 1.5 4.1 2.60
#> 5 China Male A 8.5 7.6 -0.9
#> 6 Chiina Female A 3.9 NA NA
#> 7 China Male B 4.6 1.5 -3.10
#> 8 China Female B 5.3 2.9 -2.4
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。


评论