Implementing a particular approach to calculating a log-likelihood using matrix operations

Question

I came across a mathematical expression for the log-likelihood in a CrossValidated.com answer and am unclear how to implement it in R. I'm not sure whether SO can render MathML the way CV does, but this is the first equation in the second (not accepted) answer:

$$
\ell(\mu, \Sigma) = C - \frac{m}{2}\log|\Sigma| - \frac{1}{2} \sum_{i=1}^m \operatorname{tr}\left[(\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu)\right]
$$

I'm focusing on the third term of that equation, and I don't think the trace operation is necessary, according to another answer on that page. I suppose I could look at one of the several implementations in the various packages that exist, but I suspect they use more economical approaches that don't clearly follow that equation's procedure, as @onyambu did in the answer here:
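For what it's worth, the reason the trace can be dropped from the third term: each summand $(\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu)$ is a $1 \times 1$ matrix, and the trace of a $1 \times 1$ matrix is just its single entry:

$$
\operatorname{tr}\left[(\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu)\right] = (\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu).
$$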

I'm reusing code from an earlier SO example:

    library(MASS)
    # Make covariance matrix. See note above re the implications of using a correlation matrix.
    S = matrix(c(1.0, 0.2, 0.1, 0.35, 0.0,
                 0.2, 1.0, 0.0, 0.4, 0.0,
                 0.1, 0.0, 1.0, 0.0, 0.4,
                 0.35, 0.4, 0.0, 1.0, 0.6,
                 0.0, 0.0, 0.4, 0.6, 1.0), ncol = 5)
    colnames(S) = c("Y1", "X1", "X2", "Z1", "Z2")
    rownames(S) = colnames(S)
    # Make mean vector
    mus = c(1, 2, 3, 4, 5); names(mus) = colnames(S)
    # Generate 200 observations
    obs = mvrnorm(n = 200, mu = mus, Sigma = S)

This effort was in response to a question that has since been correctly answered, but not using a summation of a matrix expression. I think I can do it with a for-loop that creates the individual contribution of each data point:

    llmat.term3 <- matrix(NA, 200, 1)
    for (n in 1:200) {
      llmat.term3[n] <- t(obs[n, ] - mus) %*% solve(S) %*% (obs[n, ] - mus)
    }
    sum(llmat.term3)
    #[1] 982.7356

... but I'm wondering if there is a more compact matrix approach? Or, to fill in the gaps in my linear algebra knowledge, an explanation of why sum(u * solve(sig, u)) is the same as sum_{i=1}^{N} ( t(obs[i,] - mu) %*% S^-1 %*% (obs[i,] - mu) ).
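One way to see the equivalence being asked about (a sketch; let $u_i = \mathbf{x}^{(i)}-\mu$ and let $U$ be the $p \times m$ matrix whose $i$-th column is $u_i$, i.e. `u <- t(obs) - mus` in R):

$$
\sum_{i=1}^{m} u_i^T \Sigma^{-1} u_i
= \operatorname{tr}\!\Big(\Sigma^{-1} \sum_{i=1}^{m} u_i u_i^T\Big)
= \operatorname{tr}\left(\Sigma^{-1} U U^T\right)
= \sum_{j,k} U_{jk}\,\big(\Sigma^{-1} U\big)_{jk}.
$$

The first equality applies the cyclic property $\operatorname{tr}(AB) = \operatorname{tr}(BA)$ to each scalar summand and uses linearity of the trace; the last equality writes the trace as the sum of the elementwise product of $U$ and $\Sigma^{-1}U$, which is exactly `sum(u * solve(S, u))`.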


Answer 1

Score: 1

In your code you have:

    S = matrix(c(1.0, 0.2, 0.1, 0.35, 0.0,
                 0.2, 1.0, 0.0, 0.4, 0.0,
                 0.1, 0.0, 1.0, 0.0, 0.4,
                 0.35, 0.4, 0.0, 1.0, 0.6,
                 0.0, 0.0, 0.4, 0.6, 1.0), ncol = 5)
    colnames(S) = c("Y1", "X1", "X2", "Z1", "Z2")
    rownames(S) = colnames(S)
    # Make mean vector
    mus = c(1, 2, 3, 4, 5); names(mus) = colnames(S)
    # Generate 200 observations
    set.seed(123)
    obs = MASS::mvrnorm(n = 200, mu = mus, Sigma = S)
    llmat.term3 <- matrix(NA, 200, 1)
    for (n in 1:200) {
      llmat.term3[n] <- t(obs[n, ] - mus) %*% solve(S) %*% (obs[n, ] - mus)
    }
    sum(llmat.term3)
    #[1] 982.7356

Compare to more compact approaches:

    u <- t(obs) - mus
    sum(diag(solve(S, tcrossprod(u))))
    #> [1] 982.7356
    sum(u * solve(S, u))
    #> [1] 982.7356

Though the two expressions give the same result, the first seems to be quicker than the second. I do not know why, since the first one computes an n * n matrix. The for-loop takes forever to compute.

    Unit: milliseconds
     expr       min        lq       mean    median        uq       max neval
        a 4532.6753 4679.4043 5470.94765 4815.1294 6061.3284 7789.5116    10
        b    2.8991    3.2693    3.73495    3.3675    3.7777    6.9719    10
        c    7.8176    8.5473   12.03060    9.2542   16.4089   20.1742    10

    set.seed(123)
    n <- 200000
    obs = MASS::mvrnorm(n = n, mu = mus, Sigma = S)
    u <- t(obs) - mus
    microbenchmark::microbenchmark(a = {
      llmat.term3 <- matrix(NA, n, 1)
      for (i in seq(n)) {
        llmat.term3[i] <- t(obs[i, ] - mus) %*% solve(S) %*% (obs[i, ] - mus)
      }
      sum(llmat.term3)
    },
    b = sum(diag(solve(S, tcrossprod(u)))),
    c = sum(u * solve(S, u)),
    check = 'equal', times = 10)

NB: it took me a while to find the seed you used. Next time, please include it in your data generation.
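The identity behind approaches b and c can also be checked numerically. Below is a minimal sketch in Python/NumPy (not R, so it can run outside the question's environment; the data here are made up for illustration and are not the question's) showing that the looped quadratic-form sum, the trace form, and the elementwise-product form all agree:

```python
import numpy as np

# Hypothetical small example: p variables, m observations.
rng = np.random.default_rng(0)
p, m = 3, 50
A = rng.normal(size=(p, p))
S = A @ A.T + p * np.eye(p)      # a positive-definite "covariance" matrix
mu = rng.normal(size=p)
obs = rng.normal(size=(m, p)) + mu

U = (obs - mu).T                 # p x m matrix; column i is obs[i] - mu
Sinv_U = np.linalg.solve(S, U)   # Sigma^{-1} U, without forming the inverse

# a: explicit loop over quadratic forms, mirroring the R for-loop
loop = sum((obs[i] - mu) @ np.linalg.solve(S, obs[i] - mu) for i in range(m))
# b: trace form, tr(Sigma^{-1} U U^T)
trace_form = np.trace(np.linalg.solve(S, U @ U.T))
# c: elementwise-product form, sum(U * Sigma^{-1} U)
elementwise = np.sum(U * Sinv_U)

print(np.allclose(loop, trace_form), np.allclose(loop, elementwise))
```

All three agree up to floating-point tolerance, which is what `check = 'equal'` verifies in the benchmark above.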

huangapple
  • Published 2023-02-18 04:07:48
  • Please retain this link when reposting: https://go.coder-hub.com/75488820.html