Visualizing density function - difference between displot() and plot()
Question
I am visualizing a probability density function (PDF) with two plotting approaches: displot() and plot(). I don't understand why displot() doesn't produce a normally distributed curve, whereas plot() does so perfectly. The two density plots should look alike, but they don't. What's wrong with displot() here?
# Approach 1: seaborn displot() with kind='kde'
from scipy.stats import norm
import seaborn as sns
import numpy as np
data_x = np.arange(-4, 4, 0.001)
norm_pdf = norm.pdf(data_x)
sns.displot(data=norm_pdf, x=data_x, kind='kde')

# Approach 2: matplotlib plot()
from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np
data_x = np.arange(-4, 4, 0.001)
plt.plot(data_x, norm.pdf(data_x))
plt.show()
Answer 1
Score: 2
displot (or the underlying kdeplot) creates an approximation of a probability density function (pdf) that resembles the function that might have generated the given random data. As input, it needs random samples, not the values of a density function. The function models these data as a sum of Gaussian bell shapes (a "kernel density estimate" with a Gaussian kernel).
Here is an example using 8000 random points as input. You'll notice the curve resembles the normal pdf, but is also a bit "bumpier" (that's what randomness looks like).
data_x = norm.rvs(size=8000)
sns.kdeplot(x=data_x)
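To make the "sum of Gaussian bell shapes" concrete, here is a minimal NumPy/SciPy sketch that builds such an estimate by hand and compares it with the true normal pdf. The helper name gaussian_kde_manual, the seed, and the evaluation grid are illustrative choices and not part of seaborn. The bandwidth follows Scott's rule of thumb, which is assumed here to match seaborn's default.

```python
import numpy as np
from scipy.stats import norm


def gaussian_kde_manual(samples, grid, bandwidth):
    """Hypothetical helper: average one Gaussian bell per sample on `grid`."""
    # Shape (len(grid), len(samples)): each column is a bell centred on one sample.
    bells = norm.pdf((grid[:, None] - samples[None, :]) / bandwidth) / bandwidth
    return bells.mean(axis=1)


rng = np.random.default_rng(0)            # fixed seed, only for repeatability
samples = rng.standard_normal(8000)       # random draws, like norm.rvs(size=8000)
grid = np.linspace(-4, 4, 201)            # points at which the estimate is evaluated

# Scott's rule of thumb for 1-D data (assumed to mirror seaborn's default).
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

kde = gaussian_kde_manual(samples, grid, bandwidth)
max_err = np.max(np.abs(kde - norm.pdf(grid)))
print(f"largest gap between the estimate and the true pdf: {max_err:.4f}")
```

The gap is small but not zero, which is the "bumpiness" described above: the estimate follows the sample, and the sample is random.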
When you call kdeplot (or displot(..., kind='kde')) with both data= and x=, and x= isn't a column name in a dataframe, data= gets ignored. So you are effectively computing the kde of 8000 evenly spaced values between -4 and 4. The kde of such data looks like a flat line between -4 and 4. But because the kde assumes the underlying function locally resembles a Gaussian, the start and end are smoothed out.
data_x = np.arange(-4, 4, 0.001)
sns.kdeplot(x=data_x)
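The flat line can be checked numerically, and the pdf values can still be drawn with seaborn by treating them as y-values of a line rather than as data for a kde. The sketch below uses scipy.stats.gaussian_kde as a stand-in for what kdeplot computes (an assumption about seaborn's internals, not a quote from its source). The function name plot_pdf_with_seaborn is hypothetical, and the plotting libraries are imported inside it so the numeric part runs without a display.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

data_x = np.arange(-4, 4, 0.001)    # 8000 evenly spaced values, not random draws
norm_pdf = norm.pdf(data_x)         # the curve the question wanted to see

# KDE of the grid itself, standing in for kdeplot(x=data_x).
kde = gaussian_kde(data_x)          # scipy's default bandwidth is Scott's rule
centre = np.linspace(-2, 2, 9)      # points well away from the smoothed edges
flat_values = kde(centre)
print(flat_values)                  # all close to 1/8, the uniform density on [-4, 4)


def plot_pdf_with_seaborn():
    """Hypothetical helper: draw the precomputed pdf as a line, with no KDE involved."""
    import matplotlib.pyplot as plt
    import seaborn as sns

    sns.lineplot(x=data_x, y=norm_pdf)
    plt.show()
```

Calling plot_pdf_with_seaborn() should give the same bell curve as plt.plot(data_x, norm.pdf(data_x)), since both simply connect the (x, pdf(x)) pairs.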