Subtracting values of a shared variable between two data frames with unequal size if their categorical variables are the same

huangapple go评论73阅读模式
英文:

Subtracting values of a shared variable between two data frames with unequal size if their categorical variables are the same

问题

我想比较两个dataframe中年份为2020和2019的共享列value。由于表中添加了一个新的国家,2020年的数据行数更多。
我编写了下面的function,但它没有产生任何结果。如果有人能帮助我,我将不胜感激。

我的期望结果:如果dat2019中的一行与dat2020中的分类变量完全相同,则返回它们在2020年和2019年的value之间的差异。

请注意,dat2020(Banggladesh)和dat2019(Chiina)中有故意的拼写错误

英文:

I wanted to compare the shared column, value, in two dataframe for year 2020 and 2019. The 2020 data has more rows since a new country has added to the table.
I wrote the below function but it didn’t produce any result. I would appreciate if anyone could help me on this.

dat2020 <- tribble(
  ~Country, ~Gender, ~Indicator, ~value,  
  "Bangladesh", "Male", "A", 3.7,
  "Bangladesh", "Female", "A", 2.6,
  "Banggladesh", "Male", "B", 6.8,
  "Bangladesh", "Female", "B", 4.1,
  "China", "Male", "A", 7.6,
  "China", "Female", "A", 3.9,
  "China", "Male", "B", 1.5,
  "China", "Female", "B", 2.9,
  "Laos", "Male", "A", 7.6,
  "Laos", "Female", "A", 5.1,
  "Laos", "Male", "B", 3.8,
  "Laos", "Female", "B", 2.8,
)

dat2019 <- tribble(
  ~Country, ~Gender, ~Indicator, ~value,  
  "Bangladesh", "Male", "A", 3.6,
  "Bangladesh", "Female", "A", 6.8,
  "Bangladesh", "Male", "B", 9.2,
  "Bangladesh", "Female", "B", 1.5,
  "China", "Male", "A", 8.5,
  "Chiina", "Female", "A", 3.9,
  "China", "Male", "B", 4.6,
  "China", "Female", "B", 5.3,
)

CheckList <- c()
checkValue <- function(data1, data2){
  if(data1$Country == data2$Country & data1$Gender == data2$Gender & data1$Indicator == data2$Indicator){
    CheckList$Diff = data1$value - data2$value
  }
  else{
    CheckList$Diff = NA
  }
}

checkValue(data1 = dat2019, data2 = dat2020)

My desired outcome: if a row in dat2019 has exactly same categorical variables as in dat2020, return the difference between their value in 2020 and 2019.

Note there are intentional typos in dat2020 (Banggladesh) and dat2019 (Chiina).

答案1

得分: 1

做联接并减去

library(dplyr)
left_join(dat2020, dat2019, by = names(dat2020)[1:3]) %>%
  mutate(Diff = value.x - value.y, value = value.x, .keep = "unused")
英文:

Do a join and subtract

library(dplyr)
left_join(dat2020, dat2019, by = names(dat2020)[1:3]) %>% 
  mutate(Diff = value.x - value.y, value = value.x, .keep = "unused")

</details>



# 答案2
**得分**: 0

我会通过`join`操作来解决这个问题:

```R
library(dplyr)

dat2019 %>%
  dplyr::rename(value2019 = value) %>%
  dplyr::left_join(dplyr::rename(dat2020, value2020 = value)) %>%
  dplyr::mutate(diff = value2020 - value2019)

#> 正在连接,按照 = c("Country", "Gender", "Indicator")
#> # A tibble: 8 x 6
#> Country Gender Indicator value2019 value2020 diff
#>
#> 1 Bangladesh 男性 A 3.6 3.7 0.1
#> 2 Bangladesh 女性 A 6.8 2.6 -4.20
#> 3 Bangladesh 男性 B 9.2 NA NA
#> 4 Bangladesh 女性 B 1.5 4.1 2.60
#> 5 中国 男性 A 8.5 7.6 -0.9
#> 6 中国 女性 A 3.9 NA NA
#> 7 中国 男性 B 4.6 1.5 -3.10
#> 8 中国 女性 B 5.3 2.9 -2.4


<details>
<summary>英文:</summary>

I would solve this via a `join` operation:

library(dplyr)

dat2019 %>%
dplyr::rename(value2019 = value) %>%
dplyr::left_join(dplyr::rename(dat2020, value2020 = value)) %>%
dplyr::mutate(diff = value2020 - value2019)

#> Joining, by = c("Country", "Gender", "Indicator")
#> # A tibble: 8 x 6
#> Country Gender Indicator value2019 value2020 diff
#> <chr> <chr> <chr> <dbl> <dbl> <dbl>
#> 1 Bangladesh Male A 3.6 3.7 0.1
#> 2 Bangladesh Female A 6.8 2.6 -4.20
#> 3 Bangladesh Male B 9.2 NA NA
#> 4 Bangladesh Female B 1.5 4.1 2.60
#> 5 China Male A 8.5 7.6 -0.9
#> 6 Chiina Female A 3.9 NA NA
#> 7 China Male B 4.6 1.5 -3.10
#> 8 China Female B 5.3 2.9 -2.4


</details>



huangapple
  • 本文由 发表于 2023年5月17日 14:53:25
  • 转载请务必保留本文链接:https://go.coder-hub.com/76269268.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定