英文:
How to create a new variable only if a condition if satisfied?
问题
我想创建一个新变量,从两个不同数据集中减去两个变量。但我需要R仅计算相同值的差异... 在我的情况下,"相同的事物" 是第三个变量 COD_PROV
,它指示城市。
我需要R计算数据集1的 Real Wage
变量和数据集2的 Real Wage
变量之间的差异,但仅当这些实际工资的 COD_PROV
相同时。
这是数据集1的示例:
COD_PROV Real wage
1 1962.18
6 1742.85
5 1541.81
96 1612.2
4 1574
3 1823.53
103 1584.49
2 1666.21
7 1747.81
10 2066.42
8 1498.01
11 1871.34
9 1770.41
15 2240.03
16 1729.17
17 1773.38
13 1832.57
数据集2具有相同的框架,但缺少一些 COD_PROV
值:
COD_PROV Real wage
1 4962.18
6 1542.85
5 3541.81
4 1564
3 1223.53
2 1446.21
7 1557.81
10 2226.42
8 1458.01
11 1843.34
16 1439.17
17 1883.38
13 1992.57
我尝试了这个:
new <- mutate(dataset1, `Wage Difference` = dataset1$`Real wage` - dataset2$`Real wage`)
但R当然会回复:
Error in mutate():
ℹ In argument: Wage difference = ... - dataset1$Real wage.
Caused by error:
! Wage difference must be size 105 or 1, not 106.
Run rlang::last_error() to see where the error occurred.
我认为原因是数据集2的观测数量比数据集1少(特别是缺少一些 COD_PROV
值)... 我如何只对相同的 COD_PROV
值应用差异?
英文:
I want to create a new variable subtracting two variables from two different datasets. But I need that R make the difference only for the values that are referred to the same thing... The "same thing" in my case is the a third variable COD_PROV
which indicates the city.
I need that R make the difference between the Real Wage
variable of dataset 1 and the Real Wage
variable of dataset 2, but only if the COD_PROV
of these real wages is the same.
This is an example of the dataset 1
COD_PROV Real wage
1 1962,18
6 1742,85
5 1541,81
96 1612,2
4 1574
3 1823,53
103 1584,49
2 1666,21
7 1747,81
10 2066,42
8 1498,01
11 1871,34
9 1770,41
15 2240,03
16 1729,17
17 1773,38
13 1832,57
Datset 2 has the same framework, but some values of COD_PROV are missing
COD_PROV Real wage
1 4962,18
6 1542,85
5 3541,81
4 1564
3 1223,53
2 1446,21
7 1557,81
10 2226,42
8 1458,01
11 1843,34
16 1439,17
17 1883,38
13 1992,57
I've tried this
new <- mutate( dataset1, `Wage Difference ` = dataset1$`Real wage` - dataset2$`Real wage` )
but R of course replies
Error in mutate():
ℹ In argument: Wage difference = ... - dataset1$Real wage.
Caused by error:
! Wage difference must be size 105 or 1, not 106.
Run rlang::last_error() to see where the error occurred.
I suppose that the reason is that dataset 2 has less observations than dataset1 ( in particular some values of COD_PROV are missing)... How can I apply the difference only for the same values of COD_PROV ?
答案1
得分: 1
我认为你应该将它们合并在一起,然后计算差异。
library(dplyr)
dataset1 %>%
left_join(dataset2, by = "COD_PROV", suffix = c("", ".y")) %>%
mutate(diff = `Real wage` - `Real wage.y`)
# COD_PROV Real wage Real wage.y diff
# 1 1 1962.18 4962.18 -3000
# 2 6 1742.85 1542.85 200
# 3 5 1541.81 3541.81 -2000
# 4 96 1612.20 NA NA
# 5 4 1574.00 1564.00 10
# 6 3 1823.53 1223.53 600
# 7 103 1584.49 NA NA
# 8 2 1666.21 1446.21 220
# 9 7 1747.81 1557.81 190
# 10 10 2066.42 2226.42 -160
# 11 8 1498.01 1458.01 40
# 12 11 1871.34 1843.34 28
# 13 9 1770.41 NA NA
# 14 15 2240.03 NA NA
# 15 16 1729.17 1439.17 290
# 16 17 1773.38 1883.38 -110
# 17 13 1832.57 1992.57 -160
有关"join/merge"的更多信息,请参见https://stackoverflow.com/q/1299871/3358272,https://stackoverflow.com/q/5706437/3358272。
数据:
dataset1 <- structure(list(COD_PROV = c(1L, 6L, 5L, 96L, 4L, 3L, 103L, 2L, 7L, 10L, 8L, 11L, 9L, 15L, 16L, 17L, 13L), "Real wage" = c(1962.18, 1742.85, 1541.81, 1612.2, 1574, 1823.53, 1584.49, 1666.21, 1747.81, 2066.42, 1498.01, 1871.34, 1770.41, 2240.03, 1729.17, 1773.38, 1832.57)), row.names = c(NA, -17L), class = "data.frame")
dataset2 <- structure(list(COD_PROV = c(1L, 6L, 5L, 4L, 3L, 2L, 7L, 10L, 8L, 11L, 16L, 17L, 13L), "Real wage" = c(4962.18, 1542.85, 3541.81, 1564, 1223.53, 1446.21, 1557.81, 2226.42, 1458.01, 1843.34, 1439.17, 1883.38, 1992.57)), row.names = c(NA, -13L), class = "data.frame")
英文:
I think you should join/merge these together and then calculate the difference.
library(dplyr)
dataset1 %>%
left_join(dataset2, by = "COD_PROV", suffix = c("", ".y")) %>%
mutate(diff = `Real wage` - `Real wage.y`)
# COD_PROV Real wage Real wage.y diff
# 1 1 1962.18 4962.18 -3000
# 2 6 1742.85 1542.85 200
# 3 5 1541.81 3541.81 -2000
# 4 96 1612.20 NA NA
# 5 4 1574.00 1564.00 10
# 6 3 1823.53 1223.53 600
# 7 103 1584.49 NA NA
# 8 2 1666.21 1446.21 220
# 9 7 1747.81 1557.81 190
# 10 10 2066.42 2226.42 -160
# 11 8 1498.01 1458.01 40
# 12 11 1871.34 1843.34 28
# 13 9 1770.41 NA NA
# 14 15 2240.03 NA NA
# 15 16 1729.17 1439.17 290
# 16 17 1773.38 1883.38 -110
# 17 13 1832.57 1992.57 -160
For more information on what join/merge means, see https://stackoverflow.com/q/1299871/3358272, https://stackoverflow.com/q/5706437/3358272.
Data
dataset1 <- structure(list(COD_PROV = c(1L, 6L, 5L, 96L, 4L, 3L, 103L, 2L, 7L, 10L, 8L, 11L, 9L, 15L, 16L, 17L, 13L), "Real wage" = c(1962.18, 1742.85, 1541.81, 1612.2, 1574, 1823.53, 1584.49, 1666.21, 1747.81, 2066.42, 1498.01, 1871.34, 1770.41, 2240.03, 1729.17, 1773.38, 1832.57)), row.names = c(NA, -17L), class = "data.frame")
dataset2 <- structure(list(COD_PROV = c(1L, 6L, 5L, 4L, 3L, 2L, 7L, 10L, 8L, 11L, 16L, 17L, 13L), "Real wage" = c(4962.18, 1542.85, 3541.81, 1564, 1223.53, 1446.21, 1557.81, 2226.42, 1458.01, 1843.34, 1439.17, 1883.38, 1992.57)), row.names = c(NA, -13L), class = "data.frame")
答案2
得分: 0
我们可以使用 {powerjoin}:
powerjoin::power_left_join(dataset1, dataset2, by = "COD_PROV", conflict = `-`)
#> COD_PROV 实际工资
#> 1 1 -3000
#> 2 6 200
#> 3 5 -2000
#> 4 96 <NA>
#> 5 4 10
#> 6 3 600
#> 7 103 <NA>
#> 8 2 220
#> 9 7 190
#> 10 10 -160
#> 11 8 40
#> 12 11 28
#> 13 9 <NA>
#> 14 15 <NA>
#> 15 16 290
#> 16 17 -110
#> 17 13 -160
创建于2023年3月17日,使用 reprex v2.0.2
英文:
We can use {powerjoin}:
powerjoin::power_left_join(dataset1, dataset2, by = "COD_PROV", conflict = `-`)
#> COD_PROV Real wage
#> 1 1 -3000
#> 2 6 200
#> 3 5 -2000
#> 4 96 NA
#> 5 4 10
#> 6 3 600
#> 7 103 NA
#> 8 2 220
#> 9 7 190
#> 10 10 -160
#> 11 8 40
#> 12 11 28
#> 13 9 NA
#> 14 15 NA
#> 15 16 290
#> 16 17 -110
#> 17 13 -160
<sup>Created on 2023-03-17 with reprex v2.0.2</sup>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论