如何使用np.cumsum来复制scipy.stats.expon.cdf的输出?

huangapple go评论97阅读模式
英文:

How to use np.cumsum to replicate the output of scipy.stats.expon.cdf?

问题

  1. import numpy as np
  2. from scipy.stats import expon, kstest
  3. from matplotlib import pyplot as plt
  4. def custom_exponential_cdf(x, lamb):
  5. x = x.copy()
  6. x[x < 0] = 0.0
  7. pdf = lamb * np.exp(-lamb * x)
  8. cdf = np.cumsum(pdf * np.diff(np.concatenate(([0], x))))
  9. return cdf
  10. unique_values = np.array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.85, 0.87, 0.89])
  11. counts = np.array([1597, 1525, 1438, 1471, 1311, 1303, 1202, 1147, 1002, 918, 893, 801, 713, 680, 599, 578, 478, 430, 409, 353, 350, 292, 245, 211, 224, 182, 171, 151, 125, 111, 94, 85, 72, 73, 57, 36, 53, 35, 35, 27, 19, 20, 15, 10, 20, 12, 10, 13, 11, 10, 17, 15, 8, 3, 3, 3, 5, 6, 6, 2, 3, 3, 4, 6, 1, 1, 3, 1, 2, 1, 3, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2])
  12. x = np.repeat(unique_values, counts)
  13. lamb = 9.23
  14. fig, ax = plt.subplots()
  15. ax.plot(x, expon.cdf(x, 0.0, 1.0 / lamb))
  16. ax.plot(x, custom_exponential_cdf(x, lamb))
  17. print(kstest(x, custom_exponential_cdf, (lamb,)))
  18. print(kstest(x, "expon", (0.0, 1.0 / lamb)))
英文:

For context, I'm trying to understand how to use np.cumsum to replicate scipy.stats.expon.cdf because I need to write a function compatible with scipy.stats.kstest which is not one of the distributions already available in scipy.stats.

I am having issues with finding resources online which give good guides to numerically computing the CDF, because the majority of resources just point you to in-built methods. I've gotten as far as the custom_exponential_cdf function defined below, which works well in some cases but breaks down in others (for example, the one below).

Does anyone know what I am doing wrong, and how I can modify custom_exponential_cdf to match the output from scipy.stats.expon.cdf?

  1. import numpy as np
  2. from matplotlib import pyplot as plt
  3. from scipy.stats import kstest, expon
  4. def custom_exponential_cdf(x, lamb):
  5. x = x.copy()
  6. x[x &lt; 0] = 0.0
  7. pdf = lamb * np.exp(-lamb * x)
  8. cdf = np.cumsum(pdf * np.diff(np.concatenate(([0], x))))
  9. return cdf
  10. unique_values = np.array([0.01, 0.02, 0.03, 0.04, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1, 0.11, 0.12, 0.13, 0.14, 0.15, 0.16, 0.17, 0.18, 0.19, 0.20, 0.21, 0.22, 0.23, 0.24, 0.25, 0.26, 0.27, 0.28, 0.29, 0.3, 0.31, 0.32, 0.33, 0.34, 0.35, 0.36, 0.37, 0.38, 0.39, 0.4, 0.41, 0.42, 0.43, 0.44, 0.45, 0.46, 0.47, 0.48, 0.49, 0.5, 0.51, 0.52, 0.53, 0.54, 0.55, 0.56, 0.57, 0.58, 0.59, 0.6, 0.61, 0.62, 0.63, 0.64, 0.65, 0.66, 0.67, 0.68, 0.69, 0.71, 0.72, 0.73, 0.74, 0.75, 0.76, 0.77, 0.78, 0.79, 0.85, 0.87, 0.89])
  11. counts = np.array([1597, 1525, 1438, 1471, 1311, 1303, 1202, 1147, 1002, 918, 893, 801, 713, 680, 599, 578, 478, 430, 409, 353, 350, 292, 245, 211, 224, 182, 171, 151, 125, 111, 94, 85, 72, 73, 57, 36, 53, 35, 35, 27, 19, 20, 15, 10, 20, 12, 10, 13, 11, 10, 17, 15, 8, 3, 3, 3, 5, 6, 6, 2, 3, 3, 4, 6, 1, 1, 3, 1, 2, 1, 3, 1, 1, 2, 2, 2, 2, 2, 1, 1, 2])
  12. x = np.repeat(unique_values, counts)
  13. lamb = 9.23
  14. fig, ax = plt.subplots()
  15. ax.plot(x, expon.cdf(x, 0.0, 1.0 / lamb))
  16. ax.plot(x, custom_exponential_cdf(x, lamb))
  17. print(kstest(x, custom_exponential_cdf, (lamb,)))
  18. print(kstest(x, &quot;expon&quot;, (0.0, 1.0 / lamb)))

The output from the print statements is

  1. KstestResult(statistic=0.08740741955472273, pvalue=6.857709296861777e-145, statistic_location=0.02, statistic_sign=-1)
  2. KstestResult(statistic=0.0988550670723162, pvalue=2.7098163860110364e-185, statistic_location=0.04, statistic_sign=-1)

The output from the plot is:

如何使用np.cumsum来复制scipy.stats.expon.cdf的输出?

答案1

得分: 1

指数分布的概率密度函数在 x < 0 时为 0,自定义函数需要反映这一点。以下是修复的一种方法:

  1. import numpy as np
  2. from matplotlib import pyplot as plt
  3. from scipy.stats import kstest, expon
  4. def custom_exponential_cdf(x, lamb):
  5. x = x.copy()
  6. x[x < 0] = 0
  7. pdf = lamb * np.exp(-lamb * x)
  8. cdf = np.cumsum(pdf * np.diff(np.concatenate(([0], x)))
  9. return cdf
  10. x = np.sort(np.random.normal(loc=0.1, scale=0.09, size=21729))
  11. lamb = 9.23
  12. cdf = custom_exponential_cdf(x, lamb)
  13. fig, ax = plt.subplots()
  14. ax.plot(x, expon.cdf(x, 0.0, 1.0 / lamb), lw=7, alpha=0.7, label="scipy")
  15. ax.plot(x, cdf, 'r', label="custom")
  16. plt.legend()

它生成的结果如下所示:

如何使用np.cumsum来复制scipy.stats.expon.cdf的输出?

英文:

The pdf of the exponential distribution is 0 for x < 0 and the custom function needs to reflect this. Here is one way to fix it:

  1. import numpy as np
  2. from matplotlib import pyplot as plt
  3. from scipy.stats import kstest, expon
  4. def custom_exponential_cdf(x, lamb):
  5. x = x.copy()
  6. x[x &lt; 0] = 0
  7. pdf = lamb * np.exp(-lamb * x)
  8. cdf = np.cumsum(pdf * np.diff(np.concatenate(([0], x))))
  9. return cdf
  10. x = np.sort(np.random.normal(loc=0.1, scale=0.09, size=21729))
  11. lamb = 9.23
  12. cdf = custom_exponential_cdf(x, lamb)
  13. fig, ax = plt.subplots()
  14. ax.plot(x, expon.cdf(x, 0.0, 1.0 / lamb), lw=7, alpha = 0.7, label=&quot;scipy&quot;)
  15. ax.plot(x, cdf, &#39;r&#39;, label=&quot;custom&quot;)
  16. plt.legend()

It gives:

如何使用np.cumsum来复制scipy.stats.expon.cdf的输出?

huangapple
  • 本文由 发表于 2023年3月3日 20:25:16
  • 转载请务必保留本文链接:https://go.coder-hub.com/75627051.html
匿名

发表评论

匿名网友

:?: :razz: :sad: :evil: :!: :smile: :oops: :grin: :eek: :shock: :???: :cool: :lol: :mad: :twisted: :roll: :wink: :idea: :arrow: :neutral: :cry: :mrgreen:

确定