Generating and drawing from a trapezoid-rule integration of a bell curve distribution of known total area in R
Question
I would like to draw values from a bell-curve-shaped distribution. I have a total value for the area under the curve that must hold, and I would like to specify the total number of trapezoids used.
For example, I would like an approximately normal distribution with a total area under the curve equal to 200, built from 28 trapezoids. The 28 trapezoid areas sum to 200, but the areas of trapezoids 1 and 28 (the ends) are much smaller than those at positions 14 and 15.
I am modeling a biological process which takes 28 days and costs some amount of resources. Currently I am just dividing the total value (200) evenly by the number of days (28). But to model the process more realistically, I'd like the share of the total value in the middle days of the process to be greater than at the beginning or end, with a smooth transition in between.
This is a problem that I do not know how to solve. I can generate a normal distribution of known mean and sd, but this problem is somewhat different. I want to draw from a sequence of trapezoids which have an associated area value and sum to a parameterized area while following a bell curve shape.
I am aware defining the specifics of the shape of the bell will be necessary, but for the moment, presume any approach that results in a noticeable bell shape with some control over the peaked-ness of the curve would be acceptable.
Answer 1
Score: 1
Suppose the bell shape is given by the Gaussian function (eq1):

$$f(x) = a \exp\left(-\frac{(x-b)^2}{2c^2}\right)$$

Its area over the whole real line is:

$$\int_{-\infty}^{\infty} f(x)\,dx = a c \sqrt{2\pi}$$

So the area (200) equals a*c*sqrt(2*pi) (eq2). This gives you the relationship between a (the peak height) and c (the width).
Setting b = 10 (eq3) moves the center of the curve to x = 10. You still have three equations and four variables, so the system is underdetermined.
To get actual values for the distribution, you need to choose a value for either a or c.
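As a quick sanity check of eq2, the sketch below picks a = 40 (an arbitrary choice for illustration), derives c from the target area, and integrates the curve numerically with base R's integrate(). The names target_area, c_width and bell are made up for this example and are not part of the answer.

```r
# Numerical check of eq2: area = a * c * sqrt(2 * pi)
target_area <- 200   # total area from the question
a <- 40              # peak height (assumed, any positive value works)
b <- 10              # center of the curve, as in eq3
c_width <- target_area / (a * sqrt(2 * pi))   # width implied by eq2

# Gaussian bell with height a, center b and width c_width
bell <- function(x) a * exp(-(x - b)^2 / (2 * c_width^2))

# Integrate over +/- 10 widths around the center; the tails beyond that
# are negligible, so the result should be very close to target_area
check <- integrate(bell, lower = b - 10 * c_width, upper = b + 10 * c_width)
check$value
```

A larger a forces a smaller c, so a tall curve is necessarily narrow when the area is fixed.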
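library(plotly)

p <- plot_ly()
area <- data.frame()
x <- 1:20
ydf <- data.frame(x = x)
b <- 10  # center of the curve (eq3)

# Try several peak heights a; the width follows from eq2: c = 200 / (a * sqrt(2 * pi))
for (a in c(30, 40, 50)) {
  c_width <- 200 / (a * sqrt(2 * pi))
  y <- a * exp(-0.5 * (x - b)^2 / c_width^2)
  ydf[, as.character(a)] <- y
  p <- p %>% add_trace(x = x, y = y, type = "scatter", mode = "lines", name = paste("a =", a))
  # Trapezoid rule: mean of the two heights times the width of the interval
  for (i in 2:length(x)) {
    area[i - 1, as.character(a)] <- ((y[i] + y[i - 1]) / 2) * (x[i] - x[i - 1])
  }
}
p
sum(area$"30")
sum(area$"40")
sum(area$"50")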
> sum(area$"30")
[1] 199.8985
> sum(area$"40")
[1] 199.999
> sum(area$"50")
[1] 200
As you can see, I had to fix the value of a in order to get a specific distribution.
You now have the ydf object, which holds the y values for each a, and the area object, which holds the areas of the trapezoids.
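The sums above only approach 200 because the tails are cut off and the trapezoid rule is approximate. If the total has to be exactly 200 over 28 days, one option is to compute the trapezoid areas for an unscaled bell and then rescale them. The following is a minimal sketch under that assumption. The helper bell_trapezoids and its parameters are hypothetical names, not part of the answer, and c_width = 5 is an arbitrary default.

```r
# Hypothetical helper: split a bell curve into n_days trapezoids whose
# areas sum to `total`. Smaller c_width gives a sharper peak.
bell_trapezoids <- function(total = 200, n_days = 28, c_width = 5) {
  x <- 0:n_days                            # n_days + 1 edges, n_days trapezoids
  b <- n_days / 2                          # put the peak in the middle of the process
  y <- exp(-(x - b)^2 / (2 * c_width^2))   # unscaled bell heights at the edges
  left  <- y[-length(y)]                   # height at the left edge of each trapezoid
  right <- y[-1]                           # height at the right edge of each trapezoid
  areas <- (left + right) / 2 * diff(x)    # trapezoid rule
  areas * total / sum(areas)               # rescale so the areas sum to total
}

daily <- bell_trapezoids(total = 200, n_days = 28, c_width = 5)
sum(daily)         # equals 200 up to floating-point rounding
round(daily, 2)    # per-day amounts, largest around days 14 and 15
```

Because the rescaling happens after the areas are computed, the peak height is no longer chosen directly. The width c_width is the only shape control in this sketch.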
Answer 2
Score: 0
This is not an R-specific question. You're just asking for some kind of "reverse" algorithm.
The first problem is that you need to define the limits over which you generate trapezoids. +/- 1e-2? +/- 1e-10?
Once you settle that, just use any algorithm to calculate the piece-wise definite integral for each of your 28 slices of the bell curve. Normalize the values thus produced so they add up to whatever total you desire.
I do wonder why you wish to use trapezoids. There are both better (parabolic) and worse (rectangular) approximation methods. What is the accuracy you need?
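The slice-and-normalize idea described in this answer can be sketched in R with pnorm(), which gives the exact integral of the standard normal curve up to a point. The cutoff n_sd = 3 (the 28 days span the mean plus or minus 3 standard deviations) is an assumption made for illustration, and the variable names are made up for this example.

```r
# Slice the standard normal curve into n_days pieces and normalize
total  <- 200
n_days <- 28
n_sd   <- 3   # assumed cutoff, in standard deviations on each side of the mean

edges  <- seq(-n_sd, n_sd, length.out = n_days + 1)  # slice boundaries
slices <- diff(pnorm(edges))             # definite integral of each slice
daily  <- total * slices / sum(slices)   # normalize so the slices sum to total

sum(daily)   # equals 200 up to floating-point rounding
```

A smaller n_sd flattens the profile across the 28 days, while a larger one concentrates more of the total in the middle days.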