Visualizing density function – difference between displot() and plot()


Question

I visualize a density function (PDF) using two plotting approaches: displot() and plot(). I don't understand why displot() doesn't produce a normally distributed plot whereas plot() does so perfectly. The density plots should look alike, but they don't. What's wrong with displot() here?

from scipy.stats import norm
import seaborn as sns 
import numpy as np

data_x = np.arange(-4, 4, 0.001)
norm_pdf = norm.pdf(data_x)
sns.displot(data=norm_pdf, x=data_x, kind='kde')


from scipy.stats import norm
import matplotlib.pyplot as plt
import numpy as np

data_x = np.arange(-4, 4, 0.001)
plt.plot(data_x, norm.pdf(data_x))
plt.show()


Answer 1

Score: 2

displot (or the underlying kdeplot) builds an approximation of the probability density function (pdf) that might have generated the given random data. Its input must therefore be random samples, not the values of a pdf. The function models these data as a sum of Gaussian bell shapes, one per data point (a "kernel density estimate" with a Gaussian kernel).

Here is an example using 8000 random points as input. The curve resembles the normal pdf but is a bit "bumpier" (that is what randomness looks like).

data_x = norm.rvs(size=8000)
sns.kdeplot(x=data_x)
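To make the "sum of Gaussian bell shapes" idea concrete, here is a minimal sketch that builds such an estimate by hand with NumPy and SciPy, without seaborn. The fixed seed, the evaluation grid, and the Scott's-rule bandwidth are assumptions of this sketch; seaborn's own defaults may differ in detail.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)        # fixed seed, only for reproducibility
samples = rng.normal(size=8000)       # random data, comparable to norm.rvs(size=8000)

# Bandwidth from Scott's rule of thumb (an assumption of this sketch)
bandwidth = samples.std(ddof=1) * len(samples) ** (-1 / 5)

grid = np.linspace(-5, 5, 201)        # points at which the estimate is evaluated

# One Gaussian bump per sample, centered on that sample;
# the density estimate is the average of all bumps.
bumps = norm.pdf(grid[:, None], loc=samples[None, :], scale=bandwidth)
kde_manual = bumps.mean(axis=1)

# Compare the estimate with the true standard normal pdf
max_gap = np.abs(kde_manual - norm.pdf(grid)).max()
print(f"largest gap to the true normal pdf: {max_gap:.4f}")
```

Plotting `kde_manual` against `grid` with plt.plot() should give a curve close to the one kdeplot draws from the same samples.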


When you call kdeplot (or displot(..., kind='kde')) with both data= and x=, and x= isn't a column name in a dataframe, data= gets ignored. So you are actually passing 8000 evenly spaced values between -4 and 4. The kde of such data looks like a flat line between -4 and 4. But because the kde assumes the underlying function locally resembles a Gaussian, the start and end are smoothed out.

data_x = np.arange(-4, 4, 0.001)
sns.kdeplot(x=data_x)
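The flat result can be checked numerically, and one possible way to get the bell shape from the same arrays is to pass the pdf values as weights. The sketch below uses scipy.stats.gaussian_kde rather than seaborn, and the evaluation grid is an assumption. kdeplot and displot also accept a weights= argument, so something like sns.kdeplot(x=data_x, weights=norm_pdf) may give a similar picture, though that call is not part of the original answer.

```python
import numpy as np
from scipy.stats import gaussian_kde, norm

data_x = np.arange(-4, 4, 0.001)      # 8000 evenly spaced values, not random draws
norm_pdf = norm.pdf(data_x)

grid = np.linspace(-3, 3, 61)         # stay away from the smoothed-out edges

# KDE of the x values alone: close to the uniform density 1/8 on [-4, 4]
flat = gaussian_kde(data_x)(grid)

# Weighting each x value by its pdf height recovers a bell-shaped estimate
weighted = gaussian_kde(data_x, weights=norm_pdf)(grid)

print("unweighted KDE range:", flat.min(), flat.max())
print("largest gap, weighted KDE vs normal pdf:",
      np.abs(weighted - norm.pdf(grid)).max())
```

The weighted estimate is still slightly wider and lower than the exact pdf, because the kernel smoothing adds its own spread. For an exact curve, plt.plot(data_x, norm.pdf(data_x)) remains the direct approach.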


huangapple
  • Posted on February 6, 2023, 05:15:06
  • When reposting, please keep the link to this article: https://go.coder-hub.com/75355560.html