Why am I getting zeros when computing growth rates by country-year-sector in my data using tidyverse?

Question

I want to compute a growth rate for each country-year-sector in the following dataset:

> sapply(sa1, class)
     country         year       sector sector_share 
    "factor"    "numeric"     "factor"    "numeric" 
> print(sa1)
               country year        sector sector_share
1   Sub-Saharan Africa 1981   agriculture    15.724457
2   Sub-Saharan Africa 1982   agriculture    16.165780
3   Sub-Saharan Africa 1983   agriculture    15.908671
4   Sub-Saharan Africa 1984   agriculture    17.593971
5   Sub-Saharan Africa 1985   agriculture    19.428871
6   Sub-Saharan Africa 1986   agriculture    19.593291
7   Sub-Saharan Africa 1987   agriculture    19.789807
8   Sub-Saharan Africa 1988   agriculture    20.597277
9   Sub-Saharan Africa 1989   agriculture    19.933259
10  Sub-Saharan Africa 1990   agriculture    19.790467

42  Sub-Saharan Africa 1981      industry    35.516119
43  Sub-Saharan Africa 1982      industry    32.407578
44  Sub-Saharan Africa 1983      industry    32.303477
45  Sub-Saharan Africa 1984      industry    30.437994
46  Sub-Saharan Africa 1985      industry    30.544564
47  Sub-Saharan Africa 1986      industry    29.458658
48  Sub-Saharan Africa 1987      industry    29.490104
49  Sub-Saharan Africa 1988      industry    29.009534
50  Sub-Saharan Africa 1989      industry    29.340000
51  Sub-Saharan Africa 1990      industry    29.698078
52  Sub-Saharan Africa 1991      industry    28.727260

83  Sub-Saharan Africa 1981 manufacturing    18.419694
84  Sub-Saharan Africa 1982 manufacturing    17.895412
85  Sub-Saharan Africa 1983 manufacturing    18.037958
86  Sub-Saharan Africa 1984 manufacturing    16.316419
87  Sub-Saharan Africa 1985 manufacturing    16.256940
88  Sub-Saharan Africa 1986 manufacturing    15.728073
89  Sub-Saharan Africa 1987 manufacturing    15.825253
90  Sub-Saharan Africa 1988 manufacturing    16.320170
91  Sub-Saharan Africa 1989 manufacturing    16.062034
92  Sub-Saharan Africa 1990 manufacturing    16.134401
93  Sub-Saharan Africa 1991 manufacturing    15.826331

124 Sub-Saharan Africa 1981      services    44.946512
125 Sub-Saharan Africa 1982      services    46.323757
126 Sub-Saharan Africa 1983      services    46.071141
127 Sub-Saharan Africa 1984      services    45.820815
128 Sub-Saharan Africa 1985      services    43.226268
129 Sub-Saharan Africa 1986      services    43.409858
130 Sub-Saharan Africa 1987      services    44.298582
131 Sub-Saharan Africa 1988      services    43.191570
132 Sub-Saharan Africa 1989      services    43.023115
133 Sub-Saharan Africa 1990      services    44.043939
134 Sub-Saharan Africa 1991      services    44.995853


I use the following code:

sa1 <- sa1 %>%
  group_by(country, year, sector) %>%
  arrange(year) %>%
  mutate(growth_rate = ifelse(!is.na(lag(sector_share)), (sector_share / lag(sector_share) - 1) * 100, 0))

But I get zeros, which should not happen since there are no NAs in the sector_share column.

> print(sa1)
# A tibble: 164 × 5
# Groups:   country, year, sector [164]
   country             year sector        sector_share growth_rate
   <fct>              <dbl> <fct>                <dbl>       <dbl>
 1 Sub-Saharan Africa  1981 agriculture           15.7           0
 2 Sub-Saharan Africa  1981 industry              35.5           0
 3 Sub-Saharan Africa  1981 manufacturing         18.4           0
 4 Sub-Saharan Africa  1981 services              44.9           0
 5 Sub-Saharan Africa  1982 agriculture           16.2           0
 6 Sub-Saharan Africa  1982 industry              32.4           0
 7 Sub-Saharan Africa  1982 manufacturing         17.9           0
 8 Sub-Saharan Africa  1982 services              46.3           0
 9 Sub-Saharan Africa  1983 agriculture           15.9           0
10 Sub-Saharan Africa  1983 industry              32.3           0
# ℹ 154 more rows
# ℹ Use `print(n = ...)` to see more rows

I tried to compute the growth rate, but I get zeros. This does not make sense, since my data has no NAs in the sector_share column, and I even check for them in the code just in case.

Can someone help me? Thank you!

Answer 1

Score: 0

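Because you group by year, each group holds exactly one country-year-sector row (the header "# Groups: country, year, sector [164]" reports 164 groups for 164 rows). Inside a one-row group, lag(sector_share) is always NA, so the ifelse() falls back to 0 on every row. A growth rate compares each year with the previous one, so the years of a series must sit in the same group. Group by country and sector only, and sort by year: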


library(dplyr)

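sa1 %>%
  group_by(country, sector) %>%   # one group per country-sector series, spanning all years
  arrange(year) %>%               # rows in year order, so lag() returns the previous year
  mutate(growth_rate = ifelse(!is.na(lag(sector_share)),
                              (sector_share / lag(sector_share) - 1) * 100,  # % change vs previous year
                              0))                                            # first year has no predecessor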
# A tibble: 43 × 5
# Groups:   country, sector [4]
   country  year sector        sector_share growth_rate
   <chr>   <int> <chr>                <dbl>       <dbl>
 1 Africa   1981 agriculture           15.7       0    
 2 Africa   1981 industry              35.5       0    
 3 Africa   1981 manufacturing         18.4       0    
 4 Africa   1981 services              44.9       0    
 5 Africa   1982 agriculture           16.2       2.81 
 6 Africa   1982 industry              32.4      -8.75 
 7 Africa   1982 manufacturing         17.9      -2.85 
 8 Africa   1982 services              46.3       3.06 
 9 Africa   1983 agriculture           15.9      -1.59 
10 Africa   1983 industry              32.3      -0.321
# ℹ 33 more rows
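
A minimal reproduction makes the difference concrete. The sketch below uses a small hypothetical data frame, toy, built from six values printed in the question (agriculture and industry, 1981 to 1983). The names toy, wrong, right and rows_in_group are illustrative only. The original grouping yields one-row groups and all-zero growth. Grouping by country and sector gives the same 1982 values shown above (about 2.81 and -8.75). The corrected version passes order_by = year to dplyr's lag(), so it does not rely on a prior arrange(). It also leaves the first year as NA instead of 0. That is a design choice: a 0 reads as "no growth" rather than "no previous year".

```r
library(dplyr)

# Hypothetical toy data: six rows copied from the printed sa1 in the question
toy <- data.frame(
  country      = "Sub-Saharan Africa",
  year         = rep(1981:1983, times = 2),
  sector       = rep(c("agriculture", "industry"), each = 3),
  sector_share = c(15.724457, 16.165780, 15.908671,
                   35.516119, 32.407578, 32.303477)
)

# Original grouping: every country-year-sector group has exactly one row,
# so lag() returns NA and ifelse() substitutes 0 on every row
wrong <- toy %>%
  group_by(country, year, sector) %>%
  mutate(
    rows_in_group = n(),
    growth_rate   = ifelse(!is.na(lag(sector_share)),
                           (sector_share / lag(sector_share) - 1) * 100, 0)
  ) %>%
  ungroup()

# Corrected grouping: one group per country-sector series across all years.
# order_by = year makes lag() take the previous year regardless of row order;
# the first year of each series is left as NA instead of 0
right <- toy %>%
  group_by(country, sector) %>%
  mutate(growth_rate = (sector_share / lag(sector_share, order_by = year) - 1) * 100) %>%
  ungroup()

print(wrong)
print(right)
```

If you prefer the 0 placeholder from the answer above, wrap the expression in the same ifelse() or use coalesce(growth_rate, 0).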

huangapple
  • Posted on May 28, 2023 18:46:59
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76351083.html