创建一个新变量仅当满足条件时。

huangapple go评论98阅读模式
英文:

How to create a new variable only if a condition if satisfied?

问题

我想创建一个新变量,从两个不同数据集中减去两个变量。但我需要R仅计算相同值的差异... 在我的情况下,"相同的事物" 是第三个变量 COD_PROV,它指示城市。

我需要R计算数据集1的 Real Wage 变量和数据集2的 Real Wage 变量之间的差异,但仅当这些实际工资的 COD_PROV 相同时。

这是数据集1的示例:

  1. COD_PROV Real wage
  2. 1 1962.18
  3. 6 1742.85
  4. 5 1541.81
  5. 96 1612.2
  6. 4 1574
  7. 3 1823.53
  8. 103 1584.49
  9. 2 1666.21
  10. 7 1747.81
  11. 10 2066.42
  12. 8 1498.01
  13. 11 1871.34
  14. 9 1770.41
  15. 15 2240.03
  16. 16 1729.17
  17. 17 1773.38
  18. 13 1832.57

数据集2具有相同的框架,但缺少一些 COD_PROV 值:

  1. COD_PROV Real wage
  2. 1 4962.18
  3. 6 1542.85
  4. 5 3541.81
  5. 4 1564
  6. 3 1223.53
  7. 2 1446.21
  8. 7 1557.81
  9. 10 2226.42
  10. 8 1458.01
  11. 11 1843.34
  12. 16 1439.17
  13. 17 1883.38
  14. 13 1992.57

我尝试了这个:

  1. new <- mutate(dataset1, `Wage Difference` = dataset1$`Real wage` - dataset2$`Real wage`)

但R当然会回复:

  1. Error in mutate():
  2. In argument: Wage difference = ... - dataset1$Real wage.
  3. Caused by error:
  4. ! Wage difference must be size 105 or 1, not 106.
  5. Run rlang::last_error() to see where the error occurred.

我认为原因是数据集2的观测数量比数据集1少(特别是缺少一些 COD_PROV 值)... 我如何只对相同的 COD_PROV 值应用差异?

英文:

I want to create a new variable subtracting two variables from two different datasets. But I need that R make the difference only for the values that are referred to the same thing... The "same thing" in my case is the a third variable COD_PROV which indicates the city.

I need that R make the difference between the Real Wage variable of dataset 1 and the Real Wage variable of dataset 2, but only if the COD_PROV of these real wages is the same.

This is an example of the dataset 1

  1. COD_PROV Real wage
  2. 1 1962,18
  3. 6 1742,85
  4. 5 1541,81
  5. 96 1612,2
  6. 4 1574
  7. 3 1823,53
  8. 103 1584,49
  9. 2 1666,21
  10. 7 1747,81
  11. 10 2066,42
  12. 8 1498,01
  13. 11 1871,34
  14. 9 1770,41
  15. 15 2240,03
  16. 16 1729,17
  17. 17 1773,38
  18. 13 1832,57

Datset 2 has the same framework, but some values of COD_PROV are missing

  1. COD_PROV Real wage
  2. 1 4962,18
  3. 6 1542,85
  4. 5 3541,81
  5. 4 1564
  6. 3 1223,53
  7. 2 1446,21
  8. 7 1557,81
  9. 10 2226,42
  10. 8 1458,01
  11. 11 1843,34
  12. 16 1439,17
  13. 17 1883,38
  14. 13 1992,57

I've tried this

  1. new &lt;- mutate( dataset1, `Wage Difference ` = dataset1$`Real wage` - dataset2$`Real wage` )

but R of course replies

  1. Error in mutate():
  2. In argument: Wage difference = ... - dataset1$Real wage.
  3. Caused by error:
  4. ! Wage difference must be size 105 or 1, not 106.
  5. Run rlang::last_error() to see where the error occurred.

I suppose that the reason is that dataset 2 has less observations than dataset1 ( in particular some values of COD_PROV are missing)... How can I apply the difference only for the same values of COD_PROV ?

答案1

得分: 1

我认为你应该将它们合并在一起,然后计算差异。

  1. library(dplyr)
  2. dataset1 %>%
  3. left_join(dataset2, by = "COD_PROV", suffix = c("", ".y")) %>%
  4. mutate(diff = `Real wage` - `Real wage.y`)
  5. # COD_PROV Real wage Real wage.y diff
  6. # 1 1 1962.18 4962.18 -3000
  7. # 2 6 1742.85 1542.85 200
  8. # 3 5 1541.81 3541.81 -2000
  9. # 4 96 1612.20 NA NA
  10. # 5 4 1574.00 1564.00 10
  11. # 6 3 1823.53 1223.53 600
  12. # 7 103 1584.49 NA NA
  13. # 8 2 1666.21 1446.21 220
  14. # 9 7 1747.81 1557.81 190
  15. # 10 10 2066.42 2226.42 -160
  16. # 11 8 1498.01 1458.01 40
  17. # 12 11 1871.34 1843.34 28
  18. # 13 9 1770.41 NA NA
  19. # 14 15 2240.03 NA NA
  20. # 15 16 1729.17 1439.17 290
  21. # 16 17 1773.38 1883.38 -110
  22. # 17 13 1832.57 1992.57 -160

有关"join/merge"的更多信息,请参见https://stackoverflow.com/q/1299871/3358272,https://stackoverflow.com/q/5706437/3358272。

数据:

  1. dataset1 <- structure(list(COD_PROV = c(1L, 6L, 5L, 96L, 4L, 3L, 103L, 2L, 7L, 10L, 8L, 11L, 9L, 15L, 16L, 17L, 13L), "Real wage" = c(1962.18, 1742.85, 1541.81, 1612.2, 1574, 1823.53, 1584.49, 1666.21, 1747.81, 2066.42, 1498.01, 1871.34, 1770.41, 2240.03, 1729.17, 1773.38, 1832.57)), row.names = c(NA, -17L), class = "data.frame")
  2. dataset2 <- structure(list(COD_PROV = c(1L, 6L, 5L, 4L, 3L, 2L, 7L, 10L, 8L, 11L, 16L, 17L, 13L), "Real wage" = c(4962.18, 1542.85, 3541.81, 1564, 1223.53, 1446.21, 1557.81, 2226.42, 1458.01, 1843.34, 1439.17, 1883.38, 1992.57)), row.names = c(NA, -13L), class = "data.frame")
英文:

I think you should join/merge these together and then calculate the difference.

  1. library(dplyr)
  2. dataset1 %&gt;%
  3. left_join(dataset2, by = &quot;COD_PROV&quot;, suffix = c(&quot;&quot;, &quot;.y&quot;)) %&gt;%
  4. mutate(diff = `Real wage` - `Real wage.y`)
  5. # COD_PROV Real wage Real wage.y diff
  6. # 1 1 1962.18 4962.18 -3000
  7. # 2 6 1742.85 1542.85 200
  8. # 3 5 1541.81 3541.81 -2000
  9. # 4 96 1612.20 NA NA
  10. # 5 4 1574.00 1564.00 10
  11. # 6 3 1823.53 1223.53 600
  12. # 7 103 1584.49 NA NA
  13. # 8 2 1666.21 1446.21 220
  14. # 9 7 1747.81 1557.81 190
  15. # 10 10 2066.42 2226.42 -160
  16. # 11 8 1498.01 1458.01 40
  17. # 12 11 1871.34 1843.34 28
  18. # 13 9 1770.41 NA NA
  19. # 14 15 2240.03 NA NA
  20. # 15 16 1729.17 1439.17 290
  21. # 16 17 1773.38 1883.38 -110
  22. # 17 13 1832.57 1992.57 -160

For more information on what join/merge means, see https://stackoverflow.com/q/1299871/3358272, https://stackoverflow.com/q/5706437/3358272.


Data

  1. dataset1 &lt;- structure(list(COD_PROV = c(1L, 6L, 5L, 96L, 4L, 3L, 103L, 2L, 7L, 10L, 8L, 11L, 9L, 15L, 16L, 17L, 13L), &quot;Real wage&quot; = c(1962.18, 1742.85, 1541.81, 1612.2, 1574, 1823.53, 1584.49, 1666.21, 1747.81, 2066.42, 1498.01, 1871.34, 1770.41, 2240.03, 1729.17, 1773.38, 1832.57)), row.names = c(NA, -17L), class = &quot;data.frame&quot;)
  2. dataset2 &lt;- structure(list(COD_PROV = c(1L, 6L, 5L, 4L, 3L, 2L, 7L, 10L, 8L, 11L, 16L, 17L, 13L), &quot;Real wage&quot; = c(4962.18, 1542.85, 3541.81, 1564, 1223.53, 1446.21, 1557.81, 2226.42, 1458.01, 1843.34, 1439.17, 1883.38, 1992.57)), row.names = c(NA, -13L), class = &quot;data.frame&quot;)

答案2

得分: 0

我们可以使用 {powerjoin}:

  1. powerjoin::power_left_join(dataset1, dataset2, by = "COD_PROV", conflict = `-`)
  2. #> COD_PROV 实际工资
  3. #> 1 1 -3000
  4. #> 2 6 200
  5. #> 3 5 -2000
  6. #> 4 96 <NA>
  7. #> 5 4 10
  8. #> 6 3 600
  9. #> 7 103 <NA>
  10. #> 8 2 220
  11. #> 9 7 190
  12. #> 10 10 -160
  13. #> 11 8 40
  14. #> 12 11 28
  15. #> 13 9 <NA>
  16. #> 14 15 <NA>
  17. #> 15 16 290
  18. #> 16 17 -110
  19. #> 17 13 -160

创建于2023年3月17日,使用 reprex v2.0.2

英文:

We can use {powerjoin}:

  1. powerjoin::power_left_join(dataset1, dataset2, by = &quot;COD_PROV&quot;, conflict = `-`)
  2. #&gt; COD_PROV Real wage
  3. #&gt; 1 1 -3000
  4. #&gt; 2 6 200
  5. #&gt; 3 5 -2000
  6. #&gt; 4 96 NA
  7. #&gt; 5 4 10
  8. #&gt; 6 3 600
  9. #&gt; 7 103 NA
  10. #&gt; 8 2 220
  11. #&gt; 9 7 190
  12. #&gt; 10 10 -160
  13. #&gt; 11 8 40
  14. #&gt; 12 11 28
  15. #&gt; 13 9 NA
  16. #&gt; 14 15 NA
  17. #&gt; 15 16 290
  18. #&gt; 16 17 -110
  19. #&gt; 17 13 -160

<sup>Created on 2023-03-17 with reprex v2.0.2</sup>

huangapple
  • 本文由 发表于 2023年2月27日 01:27:23
  • 转载请务必保留本文链接:https://go.coder-hub.com/75573785.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定