Implementing a particular approach to calculating a log-likelihood using matrix operations

Question

I came across a mathematical expression for a log-likelihood in a CrossValidated.com answer and am unclear how to implement it in R. I'm not sure whether SO can render MathML the way CV does, but this is the first equation in the second (not accepted) answer:

$$
\begin{eqnarray}
    \ell(\mu, \Sigma) &=& C - \frac{m}{2}\log|\Sigma|-\frac{1}{2} \sum_{i=1}^m \text{tr}\left[(\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu)\right]\\
$$

I'm focusing on the third term in that equation, and I believe another answer on that page shows that the trace operation is not necessary. I suppose I could look at one of the several implementations in the various packages that exist, but I suspect they use more economical approaches that don't clearly follow that equation's procedure, as @onyambu did in the answer here:
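For completeness (my addition, not from the original posts): the trace can be dropped because each summand is a 1 × 1 matrix, and conversely the cyclic property of the trace pools the whole sum into a single matrix product:

```latex
% Each quadratic form is a scalar (1 x 1 matrix), so tr[.] changes nothing:
\text{tr}\left[(\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu)\right]
  = (\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu)
% Conversely, cycling inside the trace pools the sum into one matrix product:
\sum_{i=1}^m (\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu)
  = \text{tr}\left[\Sigma^{-1} \sum_{i=1}^m (\mathbf{x}^{(i)}-\mu)(\mathbf{x}^{(i)}-\mu)^T\right]
```

The pooled-trace form on the right is exactly what `sum(diag(solve(S, tcrossprod(u))))` evaluates in the answer below.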

I'm ripping out code from an earlier SO example:

library(MASS)
# Make covariance matrix. See note above re the implications of using a correlation matrix.
S = matrix(c(1.0, 0.2, 0.1, 0.35, 0.0,
             0.2, 1.0, 0.0, 0.4, 0.0,
             0.1, 0.0, 1.0, 0.0, 0.4,
             0.35, 0.4, 0.0, 1.0, 0.6,
             0.0, 0.0, 0.4, 0.6, 1.0), ncol = 5)
colnames(S) = c("Y1", "X1", "X2", "Z1", "Z2")
rownames(S) = colnames(S)
# Make mean vector
mus = c(1, 2, 3, 4, 5); names(mus) = colnames(S)
# Generate 200 observations
obs = mvrnorm(n = 200, mu = mus, Sigma = S)

This effort was in response to a question that has since been answered correctly, but without using a summation over a matrix expression. I think I can do it with a for-loop that creates the individual contribution of each data point:

llmat.term3 <- matrix(NA, 200, 1)
for (n in 1:200) {
    llmat.term3[n] <- t(obs[n,] - mus) %*% solve(S) %*% (obs[n,] - mus)
}
sum(llmat.term3)
#[1] 982.7356

... but I'm wondering if there is a more compact matrix approach? Or, failing that, something to fill in the gaps in my linear-algebra knowledge and explain why sum(u * solve(sig, u)) is the same as sum{i=1,N} ( t(obs[n,]-mu) %*% S^-1 %*% (obs[n,]-mu) ).
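A small self-contained check of that identity (toy matrices of my own, not the question's 5-variable setup): the columns of `u` are the centred observations x_i - mu; `solve(S, u)` solves S %*% V = u, so column i of V is S^-1 (x_i - mu); elementwise `u * V` then sums, entry by entry, to the loop's total.

```r
set.seed(1)
A <- matrix(rnorm(9), 3, 3)
S <- crossprod(A) + diag(3)        # a symmetric positive-definite 3 x 3 matrix
obs <- matrix(rnorm(15), 5, 3)     # 5 toy observations on 3 variables
mus <- c(0.5, -1, 2)

# Loop form: one quadratic form per observation
loop <- sum(sapply(1:5, function(i)
  t(obs[i, ] - mus) %*% solve(S) %*% (obs[i, ] - mus)))

# Compact form: u is p x n, column i is x_i - mu
u <- t(obs) - mus
compact <- sum(u * solve(S, u))

all.equal(loop, compact)           # TRUE
```

The key point is that `sum()` over the elementwise product adds up sum_j u[j, i] * (S^-1 u)[j, i] across all i, which is precisely the sum of the quadratic forms.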


Answer 1

Score: 1

In your code you have:

S = matrix(c(1.0, 0.2, 0.1, 0.35, 0.0,
             0.2, 1.0, 0.0, 0.4, 0.0,
             0.1, 0.0, 1.0, 0.0, 0.4,
             0.35, 0.4, 0.0, 1.0, 0.6,
             0.0, 0.0, 0.4, 0.6, 1.0), ncol = 5)
colnames(S) = c("Y1", "X1", "X2", "Z1", "Z2")
rownames(S) = colnames(S)
# Make mean vector
mus = c(1, 2, 3, 4, 5); names(mus) = colnames(S)
# Generate 200 observations
set.seed(123)
obs = MASS::mvrnorm(n = 200, mu = mus, Sigma = S)
llmat.term3 <- matrix(NA, 200, 1)
for (n in 1:200) {
  llmat.term3[n] <- t(obs[n,] - mus) %*% solve(S) %*% (obs[n,] - mus)
}
sum(llmat.term3)
#[1] 982.7356

Compare to more compact approaches:

u <- t(obs) - mus

sum(diag(solve(S, tcrossprod(u))))
#>[1] 982.7356    
sum(u * solve(S, u))
#>[1] 982.7356

Though the two expressions give the same result, the first seems to be quicker than the second, even though it forms an intermediate matrix via tcrossprod() (note that tcrossprod(u) is only 5 × 5 here, since u is p × n, which helps explain the speed). The for loop takes forever to compute.

Unit: milliseconds
 expr       min        lq       mean    median        uq       max  neval
    a 4532.6753 4679.4043 5470.94765 4815.1294 6061.3284 7789.5116     10
    b    2.8991    3.2693    3.73495    3.3675    3.7777    6.9719     10
    c    7.8176    8.5473   12.03060    9.2542   16.4089   20.1742     10

---

set.seed(123)
n <- 200000

obs = MASS::mvrnorm(n = n, mu = mus, Sigma = S)
u <- t(obs) - mus

microbenchmark::microbenchmark(a = {
  llmat.term3 <- matrix(NA, n, 1)
  for (i in seq(n)) {
    llmat.term3[i] <- t(obs[i,] - mus) %*% solve(S) %*% (obs[i,] - mus)
  }
  sum(llmat.term3)
},
b = sum(diag(solve(S, tcrossprod(u)))),
c = sum(u * solve(S, u)),
check = 'equal', times = 10)

NB: it took me a while to find the seed you used. Next time, please include it in your data generation.
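As a further cross-check (my addition, not part of the original answer): base R's `stats::mahalanobis(x, center, cov)` computes exactly these per-row squared distances t(x_i - center) %*% solve(cov) %*% (x_i - center), so its sum should match the compact forms above.

```r
# Same covariance matrix and means as in the question
S <- matrix(c(1.0, 0.2, 0.1, 0.35, 0.0,
              0.2, 1.0, 0.0, 0.4, 0.0,
              0.1, 0.0, 1.0, 0.0, 0.4,
              0.35, 0.4, 0.0, 1.0, 0.6,
              0.0, 0.0, 0.4, 0.6, 1.0), ncol = 5)
mus <- c(1, 2, 3, 4, 5)
set.seed(123)
obs <- MASS::mvrnorm(n = 200, mu = mus, Sigma = S)

# mahalanobis() returns one squared distance per row of obs
u <- t(obs) - mus
all.equal(sum(mahalanobis(obs, mus, S)), sum(u * solve(S, u)))  # TRUE
```

This is often the most readable option when the goal is simply the sum of squared Mahalanobis distances rather than an explicit matrix derivation.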




huangapple
  • Published on 2023-02-18 04:07:48
  • When reposting, please retain this link: https://go.coder-hub.com/75488820.html