Why am I getting zeros when computing growth rates by country-year-sector in my data using tidyverse?

Question

I want to compute a growth rate for each country-year-sector in the following dataset:

    > sapply(sa1, class)
         country         year       sector sector_share
        "factor"    "numeric"     "factor"    "numeric"
    > print(sa1)
                   country year        sector sector_share
    1   Sub-Saharan Africa 1981   agriculture    15.724457
    2   Sub-Saharan Africa 1982   agriculture    16.165780
    3   Sub-Saharan Africa 1983   agriculture    15.908671
    4   Sub-Saharan Africa 1984   agriculture    17.593971
    5   Sub-Saharan Africa 1985   agriculture    19.428871
    6   Sub-Saharan Africa 1986   agriculture    19.593291
    7   Sub-Saharan Africa 1987   agriculture    19.789807
    8   Sub-Saharan Africa 1988   agriculture    20.597277
    9   Sub-Saharan Africa 1989   agriculture    19.933259
    10  Sub-Saharan Africa 1990   agriculture    19.790467
    42  Sub-Saharan Africa 1981      industry    35.516119
    43  Sub-Saharan Africa 1982      industry    32.407578
    44  Sub-Saharan Africa 1983      industry    32.303477
    45  Sub-Saharan Africa 1984      industry    30.437994
    46  Sub-Saharan Africa 1985      industry    30.544564
    47  Sub-Saharan Africa 1986      industry    29.458658
    48  Sub-Saharan Africa 1987      industry    29.490104
    49  Sub-Saharan Africa 1988      industry    29.009534
    50  Sub-Saharan Africa 1989      industry    29.340000
    51  Sub-Saharan Africa 1990      industry    29.698078
    52  Sub-Saharan Africa 1991      industry    28.727260
    83  Sub-Saharan Africa 1981 manufacturing    18.419694
    84  Sub-Saharan Africa 1982 manufacturing    17.895412
    85  Sub-Saharan Africa 1983 manufacturing    18.037958
    86  Sub-Saharan Africa 1984 manufacturing    16.316419
    87  Sub-Saharan Africa 1985 manufacturing    16.256940
    88  Sub-Saharan Africa 1986 manufacturing    15.728073
    89  Sub-Saharan Africa 1987 manufacturing    15.825253
    90  Sub-Saharan Africa 1988 manufacturing    16.320170
    91  Sub-Saharan Africa 1989 manufacturing    16.062034
    92  Sub-Saharan Africa 1990 manufacturing    16.134401
    93  Sub-Saharan Africa 1991 manufacturing    15.826331
    124 Sub-Saharan Africa 1981      services    44.946512
    125 Sub-Saharan Africa 1982      services    46.323757
    126 Sub-Saharan Africa 1983      services    46.071141
    127 Sub-Saharan Africa 1984      services    45.820815
    128 Sub-Saharan Africa 1985      services    43.226268
    129 Sub-Saharan Africa 1986      services    43.409858
    130 Sub-Saharan Africa 1987      services    44.298582
    131 Sub-Saharan Africa 1988      services    43.191570
    132 Sub-Saharan Africa 1989      services    43.023115
    133 Sub-Saharan Africa 1990      services    44.043939
    134 Sub-Saharan Africa 1991      services    44.995853

I use the following code:

    sa1 <- sa1 %>%
      group_by(country, year, sector) %>%
      arrange(year) %>%
      mutate(growth_rate = ifelse(!is.na(lag(sector_share)), (sector_share / lag(sector_share) - 1) * 100, 0))

But I obtain zeros, which should not happen since there are no NAs in the sector_share column.

    > print(sa1)
    # A tibble: 164 × 5
    # Groups:   country, year, sector [164]
       country             year sector        sector_share growth_rate
       <fct>              <dbl> <fct>                <dbl>       <dbl>
     1 Sub-Saharan Africa  1981 agriculture           15.7           0
     2 Sub-Saharan Africa  1981 industry              35.5           0
     3 Sub-Saharan Africa  1981 manufacturing         18.4           0
     4 Sub-Saharan Africa  1981 services              44.9           0
     5 Sub-Saharan Africa  1982 agriculture           16.2           0
     6 Sub-Saharan Africa  1982 industry              32.4           0
     7 Sub-Saharan Africa  1982 manufacturing         17.9           0
     8 Sub-Saharan Africa  1982 services              46.3           0
     9 Sub-Saharan Africa  1983 agriculture           15.9           0
    10 Sub-Saharan Africa  1983 industry              32.3           0
    # ℹ 154 more rows
    # ℹ Use `print(n = ...)` to see more rows

Every growth rate comes out as zero. This does not make sense to me, since my data has no NAs in the sector_share column and the code even checks for them just in case.

Can someone help me? Thank you!

Answer 1

Score: 0

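Since you’re grouping by year, every group contains exactly one row (your output shows 164 groups for 164 rows). Within a one-row group, lag(sector_share) is always NA, so the ifelse() falls through to 0 on every row. In other words, the computation only “sees” one year at a time, making it impossible to compute growth across multiple years. So don’t group by year: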

    library(dplyr)
    sa1 %>%
      group_by(country, sector) %>%
      arrange(year) %>%
      mutate(growth_rate = ifelse(!is.na(lag(sector_share)), (sector_share / lag(sector_share) - 1) * 100, 0))

    # A tibble: 43 × 5
    # Groups:   country, sector [4]
       country  year sector        sector_share growth_rate
       <chr>   <int> <chr>                <dbl>       <dbl>
     1 Africa   1981 agriculture           15.7       0
     2 Africa   1981 industry              35.5       0
     3 Africa   1981 manufacturing         18.4       0
     4 Africa   1981 services              44.9       0
     5 Africa   1982 agriculture           16.2       2.81
     6 Africa   1982 industry              32.4      -8.75
     7 Africa   1982 manufacturing         17.9      -2.85
     8 Africa   1982 services              46.3       3.06
     9 Africa   1983 agriculture           15.9      -1.59
    10 Africa   1983 industry              32.3      -0.321
    # ℹ 33 more rows
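
For anyone who wants to check the diagnosis on their own data, here is a minimal sketch. The data frame `toy` and the names `rows_per_group`, `buggy` and `growth` are made up for illustration; the numbers are copied from the first rows of the question. It also leaves the first year of each series as NA instead of 0, which is my own preference rather than anything the question asked for, since a 0 there would look like "no growth" rather than "no previous year".

```r
library(dplyr)

# Hypothetical toy data: two sectors, three years, values taken from the question.
toy <- data.frame(
  country = "Sub-Saharan Africa",
  year = rep(1981:1983, times = 2),
  sector = rep(c("agriculture", "industry"), each = 3),
  sector_share = c(15.724457, 16.165780, 15.908671,
                   35.516119, 32.407578, 32.303477)
)

# Diagnosis: with year in the grouping, no group has more than one row.
rows_per_group <- max(count(toy, country, year, sector)$n)

# The original approach: lag() is NA in every one-row group, so ifelse() returns 0.
buggy <- toy %>%
  group_by(country, year, sector) %>%
  mutate(growth_rate = ifelse(!is.na(lag(sector_share)),
                              (sector_share / lag(sector_share) - 1) * 100, 0)) %>%
  ungroup()

# Group by the series only (country, sector) and sort by year inside each series.
# The first year of each series stays NA because it has no predecessor.
growth <- toy %>%
  arrange(country, sector, year) %>%
  group_by(country, sector) %>%
  mutate(growth_rate = (sector_share / lag(sector_share) - 1) * 100) %>%
  ungroup()

print(rows_per_group)
print(growth)
```

Calling ungroup() at the end is optional, but it avoids surprises if later steps are not meant to run per group.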

huangapple
  • Posted on May 28, 2023 at 18:46:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76351083.html