ggplot & boxplot: is it possible to add weights?

Question

I'm trying to draw boxplots of wages by region.

This is a sample of my dataset (provided by a research institute):

> head(final2, 20)
   nquest nord ireg staciv etalav acontrib       nome_reg tpens   pesofit
1     173    1   18      3     25       35       Calabria  1800 0.3801668
2    2886    1   13      1     26       35        Abruzzo  1211 0.2383701
3    2886    2   13      1     20       42        Abruzzo  2100 0.2383701
4    5416    1    8      3     16       30 Emilia Romagna   700 0.8819879
5    7886    1    9      1     22       35        Toscana  2000 1.2452078
6   20297    1    5      1     14       39         Veneto  1200 1.6694498
7   20711    2    4      1     15       37       Trentino  2000 3.3746801
8   22169    1   15      4     40        5       Campania   600 1.6875562
9   22276    1    8      2     18       37 Emilia Romagna  1200 2.1782894
10  22286    1    8      1     15       19 Emilia Romagna   850 3.0333999
11  22286    2    8      1     15       35 Emilia Romagna   650 3.0333999
12  22657    1   16      1     25       40         Puglie  1400 0.3616937
13  22657    2   16      1     26       36         Puglie  1500 0.3616937
14  23490    1    5      2     23       36         Veneto  1400 0.9763965
15  24147    1    4      1     26       35       Trentino  1730 1.2479984
16  24147    2    4      1     18       45       Trentino  1600 1.2479984
17  24853    1   11      1     18       38         Marche  2180 0.3475683
18  27238    1   12      1     16       31          Lazio  1050 3.6358952
19  27730    1   20      1     15       37       Sardegna  1470 0.7232677
20  27734    1   20      1     16       45       Sardegna  1159 0.6959107

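The variables:

  1. nquest = the household code
  2. nord = the member of the household
  3. nome_reg = the region where they live
  4. tpens = the wage each person earns
  5. pesofit = the weight of each observation

This is the code I'm using: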

final2 %>%
  filter(nome_reg == "Piemonte"|
         nome_reg == "Valle D'Aosta" | 
         nome_reg == "Lombardia" | 
         nome_reg == "Liguria"
        ) %>%
  ggplot(aes( x = factor(nome_reg, 
                      levels=c("Piemonte", "Valle D'Aosta", "Lombardia", "Liguria")), 
             y = tpens , fill = nome_reg ))+
  geom_boxplot(varwidth = TRUE) 

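This gives me the following plot:

[Figure: boxplots of tpens for the four selected regions]

Is there a way to draw a weighted boxplot, that is, one that takes the weight of each observation (pesofit) into account when summarizing each individual's wage (tpens) in each region?

I'm already running a weighted regression, so I would like to visualize the weighted data as well.

I tried adding weight = pesofit inside aes: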

final2 %>%
  filter(nome_reg == "Piemonte"|
         nome_reg == "Valle D'Aosta" | 
         nome_reg == "Lombardia" | 
         nome_reg == "Liguria") %>%
  ggplot(aes( x = factor(nome_reg, levels=c("Piemonte", "Valle D'Aosta", "Lombardia", "Liguria")), 
             y = tpens , fill = nome_reg, weight = pesofit ))+
  geom_boxplot(varwidth = TRUE)

but R gives this warning:

Warning message:
The following aesthetics were dropped during statistical transformation: weight
ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?

How can I solve this?

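Answer 1

Score: 2

Based on a simple example, specifying the weights appears to work as expected despite the warning. The following example shows how the weights affect the plot: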

library(ggplot2)

set.seed(0)
tmp <- data.frame(x = rnorm(100))     # some random data to plot
tmp$y <- ifelse(tmp$x > 0, 1, 0.1)    # weight positive values highly

ggplot(tmp, aes(x = x)) + geom_boxplot()

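[![Output without weights][1]][1]

ggplot(tmp, aes(x = x, weight = y)) + geom_boxplot()
# Warning message:
# The following aesthetics were dropped during statistical transformation: weight
# ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
# ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?

[![Output with weights][2]][2]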
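
One way to check numerically that the weights matter is to compute the quartiles by hand, with and without weights, on the same toy data. The sketch below uses a hypothetical helper, `weighted_quantile`, which is not part of ggplot2 and uses a simple step-function definition (the smallest value whose cumulative weight share reaches the requested probability). ggplot2 may use a different quantile definition internally, so the numbers are only expected to agree approximately with the plotted box.

```r
# Hypothetical helper (not part of ggplot2): step-function weighted quantile.
weighted_quantile <- function(x, w, probs) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)
  x <- x[ord]
  cum_share <- cumsum(w[ord])
  cum_share <- cum_share / cum_share[length(cum_share)]  # last value is exactly 1
  # For each probability, take the smallest x whose cumulative weight share reaches it
  vapply(probs, function(p) x[which(cum_share >= p)[1]], numeric(1))
}

# Same toy data as in the example above
set.seed(0)
tmp <- data.frame(x = rnorm(100))
tmp$y <- ifelse(tmp$x > 0, 1, 0.1)

probs <- c(0.25, 0.5, 0.75)
unweighted <- weighted_quantile(tmp$x, rep(1, nrow(tmp)), probs)
weighted   <- weighted_quantile(tmp$x, tmp$y, probs)
print(rbind(unweighted, weighted))
```

Because positive values carry ten times the weight of negative ones, the weighted median should sit well above the unweighted one, which is the same shift visible in the weighted plot.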

The warning may be spurious, possibly related to [this bug][3].

  [1]: https://i.stack.imgur.com/q08fP.png
  [2]: https://i.stack.imgur.com/lEvRx.png
  [3]: https://github.com/tidyverse/ggplot2/issues/5053
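
If the warning is a concern, an alternative is to compute the weighted box statistics per group yourself and pass them to `geom_boxplot(stat = "identity")`, which draws boxes from precomputed `ymin`, `lower`, `middle`, `upper` and `ymax` values. The following is only a sketch under stated assumptions: `weighted_quantile` is a hypothetical helper with a simple step-function definition, the `toy` data frame is made up for illustration (it only borrows the column names from the question), and the whiskers are drawn to the minimum and maximum rather than using the usual 1.5 × IQR rule.

```r
# Hypothetical helper (not part of ggplot2): step-function weighted quantile.
weighted_quantile <- function(x, w, probs) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord <- order(x)
  x <- x[ord]
  cum_share <- cumsum(w[ord])
  cum_share <- cum_share / cum_share[length(cum_share)]
  vapply(probs, function(p) x[which(cum_share >= p)[1]], numeric(1))
}

# Made-up illustration data using the question's column names
toy <- data.frame(
  nome_reg = rep(c("Piemonte", "Liguria"), each = 5),
  tpens    = c(900, 1200, 1500, 1800, 2500, 700, 1000, 1300, 1600, 2100),
  pesofit  = c(0.4, 1.2, 2.0, 0.8, 0.3, 1.5, 0.6, 0.9, 2.2, 0.5)
)

# One row of weighted box statistics per region
box_stats <- do.call(rbind, lapply(split(toy, toy$nome_reg), function(d) {
  q <- weighted_quantile(d$tpens, d$pesofit, c(0.25, 0.5, 0.75))
  data.frame(
    nome_reg = d$nome_reg[1],
    ymin = min(d$tpens), lower = q[1], middle = q[2],
    upper = q[3], ymax = max(d$tpens)
  )
}))
print(box_stats)

# Draw the boxes from the precomputed statistics (skipped if ggplot2 is missing)
if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(box_stats,
              aes(x = nome_reg, ymin = ymin, lower = lower, middle = middle,
                  upper = upper, ymax = ymax, fill = nome_reg)) +
    geom_boxplot(stat = "identity")
  print(p)
}
```

With the real data, `toy` would be replaced by the filtered `final2`. The trade-off is that outlier points and `varwidth` are no longer handled automatically, since the statistics are supplied by hand.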
huangapple
  • Posted on March 9, 2023, 17:06:18
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75682432.html