ggplot & boxplot: is it possible to add weights?

Question

I'm trying to plot boxplots of wages by region.

This is a sample of my dataset (provided by a research institute):

    > head(final2, 20)
       nquest nord ireg staciv etalav acontrib       nome_reg tpens   pesofit
    1     173    1   18      3     25       35       Calabria  1800 0.3801668
    2    2886    1   13      1     26       35        Abruzzo  1211 0.2383701
    3    2886    2   13      1     20       42        Abruzzo  2100 0.2383701
    4    5416    1    8      3     16       30 Emilia Romagna   700 0.8819879
    5    7886    1    9      1     22       35        Toscana  2000 1.2452078
    6   20297    1    5      1     14       39         Veneto  1200 1.6694498
    7   20711    2    4      1     15       37       Trentino  2000 3.3746801
    8   22169    1   15      4     40        5       Campania   600 1.6875562
    9   22276    1    8      2     18       37 Emilia Romagna  1200 2.1782894
    10  22286    1    8      1     15       19 Emilia Romagna   850 3.0333999
    11  22286    2    8      1     15       35 Emilia Romagna   650 3.0333999
    12  22657    1   16      1     25       40         Puglie  1400 0.3616937
    13  22657    2   16      1     26       36         Puglie  1500 0.3616937
    14  23490    1    5      2     23       36         Veneto  1400 0.9763965
    15  24147    1    4      1     26       35       Trentino  1730 1.2479984
    16  24147    2    4      1     18       45       Trentino  1600 1.2479984
    17  24853    1   11      1     18       38         Marche  2180 0.3475683
    18  27238    1   12      1     16       31          Lazio  1050 3.6358952
    19  27730    1   20      1     15       37       Sardegna  1470 0.7232677
    20  27734    1   20      1     16       45       Sardegna  1159 0.6959107

The variables:

  1. nquest = the household code
  2. nord = the member's number within the household
  3. nome_reg = the region where they live
  4. tpens = the wage each person earns
  5. pesofit = the weight of each observation

This is the code I'm using:

    library(dplyr)
    library(ggplot2)

    # Regions to plot, in the order they should appear on the x axis
    regions <- c("Piemonte", "Valle D'Aosta", "Lombardia", "Liguria")

    final2 %>%
      filter(nome_reg %in% regions) %>%
      ggplot(aes(x = factor(nome_reg, levels = regions),
                 y = tpens, fill = nome_reg)) +
      geom_boxplot(varwidth = TRUE)

This gives me the following plot:

[Figure: unweighted boxplots of tpens for Piemonte, Valle D'Aosta, Lombardia and Liguria]

Is there a way to plot a weighted boxplot, i.e. a boxplot of the wage tpens of each individual in each region that takes into account the weight (pesofit) of each observation?

I'm already performing a weighted regression, so I would like to visualize the weighted data.
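
For reference, a weighted box is drawn from weighted quantiles instead of ordinary ones. Below is a minimal base-R sketch, assuming a hypothetical helper weighted_quantile() (not from any package) that returns the "lower" weighted quantile: the smallest observed value whose cumulative share of the total weight reaches p. Other definitions interpolate and can give different numbers. It is applied to the four Emilia Romagna rows of the sample above.

```r
# Hypothetical helper (not from any package): the "lower" weighted quantile,
# i.e. the smallest observed value whose normalised cumulative weight is >= p.
weighted_quantile <- function(y, w, probs) {
  ok <- !is.na(y) & !is.na(w)
  y <- y[ok]
  w <- w[ok]
  ord <- order(y)
  y <- y[ord]
  cum_w <- cumsum(w[ord]) / sum(w)   # cumulative share of the total weight
  tol <- sqrt(.Machine$double.eps)   # guards against rounding at p = 1
  vapply(probs, function(p) y[which(cum_w >= p - tol)[1]], numeric(1))
}

# The four Emilia Romagna rows from head(final2, 20)
emilia <- data.frame(
  tpens   = c(700, 1200, 850, 650),
  pesofit = c(0.8819879, 2.1782894, 3.0333999, 3.0333999)
)

# The five statistics a boxplot is built from: min, Q1, median, Q3, max
probs <- c(0, 0.25, 0.5, 0.75, 1)
unweighted <- quantile(emilia$tpens, probs, names = FALSE)
weighted   <- weighted_quantile(emilia$tpens, emilia$pesofit, probs)
rbind(unweighted, weighted)
```

On these rows the weighted median is 850 against an unweighted median of 775, because the wages 650 and 700 together carry only about 43% of the total weight.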

I've tried adding weight = pesofit to aes():

    library(dplyr)
    library(ggplot2)

    regions <- c("Piemonte", "Valle D'Aosta", "Lombardia", "Liguria")

    final2 %>%
      filter(nome_reg %in% regions) %>%
      ggplot(aes(x = factor(nome_reg, levels = regions),
                 y = tpens, fill = nome_reg, weight = pesofit)) +
      geom_boxplot(varwidth = TRUE)

but R returns this warning:

    Warning message:
    The following aesthetics were dropped during statistical transformation: weight
    ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
    ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?

How can I solve this?

Answer 1

Score: 2

Based on a simple example, specifying the weights seems to work as expected despite the warning. The following example shows how the weights affect the plot:

    library(ggplot2)

    set.seed(0)
    tmp <- data.frame(x = rnorm(100))    # some random data to plot
    tmp$y <- ifelse(tmp$x > 0, 1, 0.1)   # weight positive values highly
    ggplot(tmp, aes(x = x)) + geom_boxplot()

[Figure: output without weights]

    ggplot(tmp, aes(x = x, weight = y)) + geom_boxplot()
    # Warning message:
    # The following aesthetics were dropped during statistical transformation: weight
    # ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
    # ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?

[Figure: output with weights]

It seems like the warning may be spurious, possibly related to this bug: https://github.com/tidyverse/ggplot2/issues/5053
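
An alternative that sidesteps the warning entirely is to compute the weighted five-number summary yourself and pass it to geom_boxplot(stat = "identity"), so that no weight aesthetic is involved. This is a sketch under assumptions: weighted_quantile() is a hypothetical helper using the "lower" weighted quantile, whereas recent ggplot2 versions appear to compute weighted box statistics with quantreg::rq(), so the boxes may not match exactly. The whiskers run to the weighted minimum and maximum with no outlier detection, and varwidth is not reproduced. The data are rows copied from the question's sample.

```r
# Hypothetical helper (not part of ggplot2): smallest observed value whose
# normalised cumulative weight reaches p.
weighted_quantile <- function(y, w, probs) {
  ok <- !is.na(y) & !is.na(w)
  y <- y[ok]
  w <- w[ok]
  ord <- order(y)
  y <- y[ord]
  cum_w <- cumsum(w[ord]) / sum(w)
  tol <- sqrt(.Machine$double.eps)   # guards against rounding at p = 1
  vapply(probs, function(p) y[which(cum_w >= p - tol)[1]], numeric(1))
}

# Rows for three regions taken from the head(final2, 20) sample
sample_rows <- data.frame(
  nome_reg = c(rep("Emilia Romagna", 4), rep("Trentino", 3), rep("Veneto", 2)),
  tpens    = c(700, 1200, 850, 650, 2000, 1730, 1600, 1200, 1400),
  pesofit  = c(0.8819879, 2.1782894, 3.0333999, 3.0333999,
               3.3746801, 1.2479984, 1.2479984, 1.6694498, 0.9763965)
)

# One row of weighted box statistics per region
probs <- c(0, 0.25, 0.5, 0.75, 1)
stats <- do.call(rbind, lapply(split(sample_rows, sample_rows$nome_reg), function(d) {
  q <- weighted_quantile(d$tpens, d$pesofit, probs)
  data.frame(nome_reg = d$nome_reg[1], ymin = q[1], lower = q[2],
             middle = q[3], upper = q[4], ymax = q[5])
}))

# stat = "identity" makes geom_boxplot() draw the supplied statistics as-is
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(stats, aes(x = nome_reg, fill = nome_reg)) +
    geom_boxplot(aes(ymin = ymin, lower = lower, middle = middle,
                     upper = upper, ymax = ymax),
                 stat = "identity")
  print(p)
}
```

The same summary could be computed for final2 after the filter() step and plotted in the same way.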

huangapple
  • Posted on 9 March 2023, 17:06:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75682432.html