ggplot & boxplot: is it possible to add weights?
Question
I'm trying to plot boxplots of wages by area.
This is a sample of my dataset (it is provided by a research institute):
```
> head(final2, 20)
   nquest nord ireg staciv etalav acontrib       nome_reg tpens   pesofit
1     173    1   18      3     25       35       Calabria  1800 0.3801668
2    2886    1   13      1     26       35        Abruzzo  1211 0.2383701
3    2886    2   13      1     20       42        Abruzzo  2100 0.2383701
4    5416    1    8      3     16       30 Emilia Romagna   700 0.8819879
5    7886    1    9      1     22       35        Toscana  2000 1.2452078
6   20297    1    5      1     14       39         Veneto  1200 1.6694498
7   20711    2    4      1     15       37       Trentino  2000 3.3746801
8   22169    1   15      4     40        5       Campania   600 1.6875562
9   22276    1    8      2     18       37 Emilia Romagna  1200 2.1782894
10  22286    1    8      1     15       19 Emilia Romagna   850 3.0333999
11  22286    2    8      1     15       35 Emilia Romagna   650 3.0333999
12  22657    1   16      1     25       40         Puglie  1400 0.3616937
13  22657    2   16      1     26       36         Puglie  1500 0.3616937
14  23490    1    5      2     23       36         Veneto  1400 0.9763965
15  24147    1    4      1     26       35       Trentino  1730 1.2479984
16  24147    2    4      1     18       45       Trentino  1600 1.2479984
17  24853    1   11      1     18       38         Marche  2180 0.3475683
18  27238    1   12      1     16       31          Lazio  1050 3.6358952
19  27730    1   20      1     15       37       Sardegna  1470 0.7232677
20  27734    1   20      1     16       45       Sardegna  1159 0.6959107
```
The variables:
- `nquest` is the code of the family
- `nord` is the member of the family
- `nome_reg` is the area where they live
- `tpens` is the wage each of them earns
- `pesofit` is the weight of each observation
This is the code I'm using:
```r
library(dplyr)
library(ggplot2)

final2 %>%
  filter(nome_reg == "Piemonte" |
           nome_reg == "Valle D'Aosta" |
           nome_reg == "Lombardia" |
           nome_reg == "Liguria") %>%
  ggplot(aes(x = factor(nome_reg,
                        levels = c("Piemonte", "Valle D'Aosta", "Lombardia", "Liguria")),
             y = tpens, fill = nome_reg)) +
  geom_boxplot(varwidth = TRUE)
```
This gives me one boxplot of `tpens` per region.
Is there a way to plot a weighted boxplot? That is, a boxplot that takes into account the weight `pesofit` of each observation (in this case, the wage `tpens` of each individual in each area)?
I'm already performing a weighted regression, so I would like to visualize the weighted data as well.
I've tried adding `weight = pesofit` to `aes()`:
```r
final2 %>%
  filter(nome_reg == "Piemonte" |
           nome_reg == "Valle D'Aosta" |
           nome_reg == "Lombardia" |
           nome_reg == "Liguria") %>%
  ggplot(aes(x = factor(nome_reg,
                        levels = c("Piemonte", "Valle D'Aosta", "Lombardia", "Liguria")),
             y = tpens, fill = nome_reg, weight = pesofit)) +
  geom_boxplot(varwidth = TRUE)
```
but R returns this warning:
```
Warning message:
The following aesthetics were dropped during statistical transformation: weight
ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?
```
How can I solve this?
Answer 1
Score: 2
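Based on a simple example, it seems that specifying the weights does what's expected despite the warning. The following example shows how the weights affect the plot:
```r
library(ggplot2)

set.seed(0)
tmp <- data.frame(x = rnorm(100))   # some random data to plot
tmp$y <- ifelse(tmp$x > 0, 1, 0.1)  # weight positive values highly
ggplot(tmp, aes(x = x)) + geom_boxplot()
```
[![Output without weights][1]][1]
```r
ggplot(tmp, aes(x = x, weight = y)) + geom_boxplot()
# Warning message:
# The following aesthetics were dropped during statistical transformation: weight
# ℹ This can happen when ggplot fails to infer the correct grouping structure in the data.
# ℹ Did you forget to specify a `group` aesthetic or to convert a numerical variable into a factor?
```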
[![Output with weights][2]][2]
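A rough way to check numerically that the weights shift the box, in base R only and not a reproduction of ggplot2's internal computation (as far as I know, ggplot2 computes weighted boxes via `quantreg::rq`, so that package needs to be installed): because the weights in this example are 1 and 0.1, multiplying them by 10 turns them into integer frequency weights. Repeating each value that many times gives a sample whose ordinary quartiles (type 1, the inverse of the empirical CDF) are the weighted quartiles. This expansion trick assumes weights proportional to small integers; it does not apply directly to arbitrary survey weights like `pesofit`.

```r
# Rough check of what the weights do to the quartiles (base R only).
set.seed(0)
tmp <- data.frame(x = rnorm(100))
tmp$y <- ifelse(tmp$x > 0, 1, 0.1)   # same weights as in the example

probs <- c(0.25, 0.5, 0.75)

# Unweighted quartiles, taken as the inverse of the empirical CDF (type 1).
unweighted <- quantile(tmp$x, probs, type = 1, names = FALSE)

# Weights 1 and 0.1 become frequency weights 10 and 1: repeating each value
# that many times gives a sample whose empirical CDF is the weighted one.
expanded <- rep(tmp$x, times = round(10 * tmp$y))
weighted <- quantile(expanded, probs, type = 1, names = FALSE)

print(rbind(unweighted, weighted))
```

Every weighted quartile is at least as large as its unweighted counterpart, which matches the box moving to the right in the weighted plot.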
It seems the warning may be spurious, possibly related to [this bug][3].
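If you would rather avoid the warning altogether, one alternative (not what this answer does) is to compute the weighted box statistics yourself and pass them to `geom_boxplot(stat = "identity")`, which then has no `weight` aesthetic to drop. A minimal sketch, assuming base R plus ggplot2: `weighted_quantile()`, `weighted_box_stats()` and the simulated stand-in for `final2` are hypothetical. The quantile used here is the inverse of the weighted empirical CDF, so the numbers may differ slightly from ggplot2's own weighted boxes; outliers and `varwidth` are not reproduced.

```r
library(ggplot2)

# Hypothetical helper: smallest y whose normalised cumulative weight reaches p
# (inverse of the weighted empirical CDF).
weighted_quantile <- function(y, w, probs) {
  ord <- order(y)
  y <- y[ord]
  cw <- cumsum(w[ord]) / sum(w)
  vapply(probs, function(p) y[which(cw >= p - 1e-12)[1]], numeric(1))
}

# Hypothetical helper: weighted quartiles plus whiskers at the most extreme
# observations within 1.5 * IQR of the box (outliers are not returned).
weighted_box_stats <- function(y, w) {
  q <- weighted_quantile(y, w, c(0.25, 0.5, 0.75))
  iqr <- q[3] - q[1]
  data.frame(
    ymin   = min(y[y >= q[1] - 1.5 * iqr]),
    lower  = q[1],
    middle = q[2],
    upper  = q[3],
    ymax   = max(y[y <= q[3] + 1.5 * iqr])
  )
}

# Hypothetical stand-in for final2: simulated wages and survey weights.
set.seed(1)
regions <- c("Piemonte", "Valle D'Aosta", "Lombardia", "Liguria")
final2 <- data.frame(
  nome_reg = sample(regions, 200, replace = TRUE),
  tpens    = round(rlnorm(200, meanlog = 7.2, sdlog = 0.4)),
  pesofit  = runif(200, 0.2, 3.5)
)

# One row of box statistics per region, kept in the desired order.
box_df <- do.call(rbind, lapply(regions, function(r) {
  d <- final2[final2$nome_reg == r, ]
  data.frame(nome_reg = r, weighted_box_stats(d$tpens, d$pesofit))
}))
box_df$nome_reg <- factor(box_df$nome_reg, levels = regions)

# stat = "identity" draws the boxes from the precomputed columns.
p <- ggplot(box_df, aes(x = nome_reg, fill = nome_reg,
                        ymin = ymin, lower = lower, middle = middle,
                        upper = upper, ymax = ymax)) +
  geom_boxplot(stat = "identity")
print(p)
```

With real data, replace the simulated `final2` by the filtered survey data; the rest of the code only relies on the `nome_reg`, `tpens` and `pesofit` columns.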
[1]: https://i.stack.imgur.com/q08fP.png
[2]: https://i.stack.imgur.com/lEvRx.png
[3]: https://github.com/tidyverse/ggplot2/issues/5053