Subtracting values of a shared variable between two data frames with unequal size if their categorical variables are the same
Question
I want to compare the shared column `value` in two data frames for the years 2020 and 2019. The 2020 data has more rows because a new country has been added to the table.
I wrote the function below, but it didn't produce any result. I would appreciate any help with this.
```R
library(tibble)

dat2020 <- tribble(
  ~Country, ~Gender, ~Indicator, ~value,
  "Bangladesh", "Male", "A", 3.7,
  "Bangladesh", "Female", "A", 2.6,
  "Banggladesh", "Male", "B", 6.8,
  "Bangladesh", "Female", "B", 4.1,
  "China", "Male", "A", 7.6,
  "China", "Female", "A", 3.9,
  "China", "Male", "B", 1.5,
  "China", "Female", "B", 2.9,
  "Laos", "Male", "A", 7.6,
  "Laos", "Female", "A", 5.1,
  "Laos", "Male", "B", 3.8,
  "Laos", "Female", "B", 2.8,
)

dat2019 <- tribble(
  ~Country, ~Gender, ~Indicator, ~value,
  "Bangladesh", "Male", "A", 3.6,
  "Bangladesh", "Female", "A", 6.8,
  "Bangladesh", "Male", "B", 9.2,
  "Bangladesh", "Female", "B", 1.5,
  "China", "Male", "A", 8.5,
  "Chiina", "Female", "A", 3.9,
  "China", "Male", "B", 4.6,
  "China", "Female", "B", 5.3,
)

CheckList <- c()

checkValue <- function(data1, data2){
  if(data1$Country == data2$Country & data1$Gender == data2$Gender & data1$Indicator == data2$Indicator){
    CheckList$Diff = data1$value - data2$value
  }
  else{
    CheckList$Diff = NA
  }
}

checkValue(data1 = dat2019, data2 = dat2020)
```
My desired outcome: if a row in `dat2019` has exactly the same categorical variables as a row in `dat2020`, return the difference between their `value` in 2020 and 2019.
Note that there are intentional typos in `dat2020` (Banggladesh) and `dat2019` (Chiina).
# Answer 1
**Score**: 1
Do a join and subtract:

```R
library(dplyr)

# Join on the three categorical columns, then subtract 2019 from 2020
left_join(dat2020, dat2019, by = names(dat2020)[1:3]) %>%
  mutate(Diff = value.x - value.y, value = value.x, .keep = "unused")
```
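If only the rows whose categorical variables match exactly in both years should be returned, an `inner_join` drops the unmatched rows instead of leaving `NA` in them. The following is a minimal sketch rather than part of the original answer. It assumes the `dat2020` and `dat2019` tibbles from the question, trimmed here to a few rows so the block is self-contained; the `suffix` values and the `matched` object are illustrative names.

```R
library(dplyr)
library(tibble)

# Trimmed copies of the question's data, including both intentional typos
dat2020 <- tribble(
  ~Country,      ~Gender,  ~Indicator, ~value,
  "Bangladesh",  "Male",   "A",        3.7,
  "Banggladesh", "Male",   "B",        6.8,
  "China",       "Female", "A",        3.9,
  "Laos",        "Male",   "A",        7.6
)
dat2019 <- tribble(
  ~Country,     ~Gender,  ~Indicator, ~value,
  "Bangladesh", "Male",   "A",        3.6,
  "Bangladesh", "Male",   "B",        9.2,
  "Chiina",     "Female", "A",        3.9
)

# Keep only the key combinations present in both years,
# then subtract the 2019 value from the 2020 value
matched <- inner_join(
  dat2020, dat2019,
  by = c("Country", "Gender", "Indicator"),
  suffix = c("_2020", "_2019")
) %>%
  mutate(Diff = value_2020 - value_2019)

matched
```

With this trimmed data only the Bangladesh / Male / A row survives, because every other key combination differs between the two tables (a typo or a country that exists in one year only).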
# Answer 2
**Score**: 0
I would solve this with a `join` operation:

```R
library(dplyr)

dat2019 %>%
  dplyr::rename(value2019 = value) %>%
  dplyr::left_join(dplyr::rename(dat2020, value2020 = value)) %>%
  dplyr::mutate(diff = value2020 - value2019)
#> Joining, by = c("Country", "Gender", "Indicator")
#> # A tibble: 8 x 6
#>   Country    Gender Indicator value2019 value2020  diff
#>   <chr>      <chr>  <chr>         <dbl>     <dbl> <dbl>
#> 1 Bangladesh Male   A               3.6       3.7  0.1
#> 2 Bangladesh Female A               6.8       2.6 -4.20
#> 3 Bangladesh Male   B               9.2      NA   NA
#> 4 Bangladesh Female B               1.5       4.1  2.60
#> 5 China      Male   A               8.5       7.6 -0.9
#> 6 Chiina     Female A               3.9      NA   NA
#> 7 China      Male   B               4.6       1.5 -3.10
#> 8 China      Female B               5.3       2.9 -2.4
```
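Because a join matches keys exactly, the misspelled rows end up as `NA` without any warning. One way to surface them is `anti_join`, which returns the rows of its first argument that have no match in the second. This is a sketch under stated assumptions, not part of the original answer: it uses trimmed copies of the question's data, and `keys`, `only_2019` and `only_2020` are illustrative names.

```R
library(dplyr)
library(tibble)

# Trimmed copies of the question's data, including both intentional typos
dat2020 <- tribble(
  ~Country,      ~Gender,  ~Indicator, ~value,
  "Bangladesh",  "Male",   "A",        3.7,
  "Banggladesh", "Male",   "B",        6.8,
  "China",       "Female", "A",        3.9,
  "Laos",        "Male",   "A",        7.6
)
dat2019 <- tribble(
  ~Country,     ~Gender,  ~Indicator, ~value,
  "Bangladesh", "Male",   "A",        3.6,
  "Bangladesh", "Male",   "B",        9.2,
  "Chiina",     "Female", "A",        3.9
)

keys <- c("Country", "Gender", "Indicator")

# Rows of 2019 with no exact counterpart in 2020
only_2019 <- anti_join(dat2019, dat2020, by = keys)

# Rows of 2020 with no exact counterpart in 2019
only_2020 <- anti_join(dat2020, dat2019, by = keys)

only_2019
only_2020
```

Rows listed on both sides with near-identical keys (for example `Chiina` against `China`) point to typos, while a country that appears only in `only_2020`, such as Laos, is simply new in that year.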