Find quantile bracket threshold in R
Question

I have a dataset in which people are stored by income group, like this:

test <- tibble(income_group = c(1:3), pop = c(20, 25, 10), max_income_from = c(100, 200, 500))

I would like to know whether there is a function that retrieves the percentile corresponding to each value. Here, 20 of the 55 people earn less than 200, so 200 would be the 36th percentile and 500 the 82nd.

For now I use cumsum(test$pop)/sum(test$pop), but it looks quite ugly, and I wondered whether there is a function for weighted percentile calculations.

Thanks!
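
The arithmetic in the question can be spelled out as a small worked sketch. It assumes (this is an interpretation, not something the question states outright) that max_income_from is the lower bound of each group, which is what makes 200 line up with 20/55 and 500 with 45/55. A plain data.frame stands in for the tibble so the snippet runs without extra packages.

```r
# Same data as in the question, as a base R data.frame
test <- data.frame(
  income_group    = 1:3,
  pop             = c(20, 25, 10),
  max_income_from = c(100, 200, 500)
)

# Share of people in groups up to and including each group
share_up_to <- cumsum(test$pop) / sum(test$pop)

# Share of people strictly below each group's lower bound
# (assumption: max_income_from is the lower bound of the group)
test$share_below <- share_up_to - test$pop / sum(test$pop)

round(100 * test$share_below, 1)
# 0.0 36.4 81.8
```

Under that reading, the first threshold (100) sits at the 0th percentile, and the last value of cumsum(test$pop)/sum(test$pop), which is 1, is the share of people up to the top of the highest group.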

Answer 1

Score: 1

You can use ecdf() for this. Build the ECDF on the sequence 1:55 (one point per person, 55 being the total population), then evaluate it at the cumulative sums of the population. Because the ECDF of 1:N evaluated at k is k/N, this returns the same values as cumsum(test$pop)/sum(test$pop). This might look 'uglier' than your solution, but it is an alternative.

ecdf(1:sum(test$pop))(cumsum(test$pop))
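
A related way to see what ecdf() is doing here, offered as a sketch rather than as part of the original answer: expand the grouped data into one observation per person with rep(), then take the ordinary ECDF of that expanded vector. This only works when the weights are non-negative whole numbers, and the data.frame below is a stand-in for the tibble in the question.

```r
test <- data.frame(
  income_group = 1:3,
  pop          = c(20, 25, 10)
)

# One entry per person: 20 ones, 25 twos, 10 threes
expanded <- rep(test$income_group, times = test$pop)

# Ordinary (unweighted) ECDF of the expanded data,
# evaluated at each income group
via_rep <- ecdf(expanded)(test$income_group)

# The approach from the answer and the one from the question
via_ecdf   <- ecdf(1:sum(test$pop))(cumsum(test$pop))
via_cumsum <- cumsum(test$pop) / sum(test$pop)

all.equal(via_rep, via_cumsum)   # TRUE
all.equal(via_ecdf, via_cumsum)  # TRUE
```

For large populations the expanded vector can get big, so the cumsum() form remains the cheaper one.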

Another option is the ewcdf() function from spatstat.geom. Pass the income group as the observations and the population as the weights, then evaluate the resulting function at the vector of income groups to get the same result.

library(spatstat.geom)
ewcdf(test$income_group, weights = test$pop)(test$income_group)
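
If installing spatstat.geom just for this feels heavy, a weighted ECDF can also be sketched in base R with stepfun(). The helper name weighted_ecdf below is hypothetical (it is not part of any package), and the sketch assumes non-negative weights with a positive total.

```r
# Hypothetical helper: weighted empirical CDF built on base R's stepfun()
weighted_ecdf <- function(x, w) {
  stopifnot(length(x) == length(w), all(w >= 0), sum(w) > 0)
  ord       <- order(x)
  x_sorted  <- x[ord]
  cum_share <- cumsum(w[ord]) / sum(w)
  # For tied x values keep only the last one, whose cumulative
  # share already includes the weight of all the ties
  keep <- !duplicated(x_sorted, fromLast = TRUE)
  # Right-continuous step function: 0 below the smallest x,
  # then the cumulative weight share at and after each x
  stepfun(x_sorted[keep], c(0, cum_share[keep]))
}

test <- data.frame(
  income_group = 1:3,
  pop          = c(20, 25, 10)
)

Fw <- weighted_ecdf(test$income_group, test$pop)
Fw(test$income_group)
# same values as cumsum(test$pop) / sum(test$pop)
```

Because the result is an ordinary function, it can be evaluated at any value, not only at the group labels, for example Fw(2.5) returns the share of people in groups 1 and 2.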

huangapple
  • Posted on June 21, 2023, 23:59:56
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76525167.html