英文:
Implementing a particular approach to calculating a log-likelihood using matrix operations
问题
我看到一个有关对数似然(log-likelihood)的数学表达式,它出现在CrossValidated.com的一个回答 中,但我不清楚如何在R中实现它。我不确定SO是否可以像CV一样表示MathML,但这是第二个回答中的第一个方程式(未被接受的回答):
$$
\begin{eqnarray}
\ell(\mu, \Sigma) &=& C - \frac{m}{2}\log|\Sigma|-\frac{1}{2} \sum_{i=1}^m \text{tr}\left[(\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu)\right]\\
$$
我关注该方程式中的第三项,我认为另一页上的另一个回答表明不需要迹运算。我可以查看各种包中存在的几种实现之一,但我认为它们使用的方法更经济,不明确遵循该方程的步骤,就像这个回答中的@onyambu一样。
我从早期的SO示例中提取了一些代码:
library(MASS)
# 创建协方差矩阵。请参考上面关于使用相关矩阵的影响的注意事项。
S = matrix(c(1.0, 0.2, 0.1, 0.35, 0.0,
0.2, 1.0, 0.0, 0.4, 0.0,
0.1, 0.0, 1.0, 0.0, 0.4,
0.35, 0.4, 0.0, 1.0, 0.6,
0.0, 0.0, 0.4, 0.6, 1.0), ncol = 5)
colnames(S) = c("Y1", "X1", "X2", "Z1", "Z2")
rownames(S) = colnames(S)
# 创建均值向量
mus = c(1, 2, 3, 4, 5); names(mus) = colnames(S)
# 生成5347个观测值
obs = mvrnorm(n = 200, mu = mus, Sigma = S)
这个工作是回应一个问题,现在已经得到了正确的回答,但没有使用矩阵表达式的求和。我认为可以使用for循环为每个数据点创建单独的贡献:
llmat.term3 <- matrix(NA, 200, 1)
for (n in 1:200) {
llmat.term3[n] <- t(obs[n,] - mus) %*% solve(S) %*% (obs[n,] - mus)
}
sum(llmat.term3)
#[1] 982.7356
... 但我想知道是否有更紧凑的矩阵方法?或者,也许填补我线性代数知识中的空白,解释为什么 sum(u * solve(sig, u)
与 sum{i=1,N} ( t(obs[n,]-mu) %*% S^-1 %*% (obs[n,]-mu) )
相同。
英文:
I cam across a mathematical expression for log-likelihood in a CrossValidated.com answer and am unclear how I should implement in R. I'm not sure if SO can represent MathML the same as CV, but this is the first equation in the second (not accepted) anser:
$$
\begin{eqnarray}
\ell(\mu, \Sigma) &=& C - \frac{m}{2}\log|\Sigma|-\frac{1}{2} \sum_{i=1}^m \text{tr}\left[(\mathbf{x}^{(i)}-\mu)^T \Sigma^{-1} (\mathbf{x}^{(i)}-\mu)\right]\\
$$
I focusing on the 3rd term in that equation and I do not think the trace operation is necessary according to another answer on that page. I suppose I could look at one of the several implementations in the various packages that exist, but I'm thinking they use more economical approaches that don't clearly follow that equation's procedure, as did @onyambu in the answer here:
I'm ripping out code from an earlier SO example:
library(MASS)
# Make covariance matrix. See note above re the implications of using a correlation matrix.
S = matrix(c(1.0, 0.2, 0.1, 0.35, 0.0,
0.2, 1.0, 0.0, 0.4, 0.0,
0.1, 0.0, 1.0, 0.0, 0.4,
0.35, 0.4, 0.0, 1.0, 0.6,
0.0, 0.0, 0.4, 0.6, 1.0), ncol = 5)
colnames(S) = c("Y1", "X1", "X2", "Z1" ,"Z2")
rownames(S) = colnames(S)
# Make mean vector
mus = c(1, 2, 3, 4, 5); names(mus) = colnames(S)
# Generate 5347 observations
obs = mvrnorm(n = 200, mu = mus, Sigma = S)
This effort was in response to a question correctly answered now but not using a summation of a matrix expression. I think I can do it with a for-loop to create individual contributions for each data point:
llmat.term3 <- matrix(NA, 200,1)
for(n in 1:200) {
llmat.term3[n] <- t(obs[n,]-mus) %*% solve(S) %*% (obs[n,]-mus) }
sum(llmat.term3)
#[1] 982.7356
.... but I'm wondering if there is a more compact matrix approach? Or I suppose, filled in the gaps in my linear algebra knowledge that explains why sum(u * solve(sig, u)
is the same as sum{i=1,N} ( t(obs[n,]-mu) %*% S^-1 %*% (obs[n,]-mu) )
.
答案1
得分: 1
在你的代码中,有以下部分需要翻译:
S = matrix(c(1.0, 0.2, 0.1, 0.35, 0.0,
0.2, 1.0, 0.0, 0.4, 0.0,
0.1, 0.0, 1.0, 0.0, 0.4,
0.35, 0.4, 0.0, 1.0, 0.6,
0.0, 0.0, 0.4, 0.6, 1.0), ncol = 5)
colnames(S) = c("Y1", "X1", "X2", "Z1", "Z2")
rownames(S) = colnames(S)
# 创建均值向量
mus = c(1, 2, 3, 4, 5); names(mus) = colnames(S)
# 生成5347个观测值
set.seed(123)
obs = MASS::mvrnorm(n = 200, mu = mus, Sigma = S)
llmat.term3 <- matrix(NA, 200, 1)
for (n in 1:200) {
llmat.term3[n] <- t(obs[n,] - mus) %*% solve(S) %*% (obs[n,] - mus)
}
sum(llmat.term3)
#[1] 982.7356
与更紧凑的方法进行比较:
u <- t(obs) - mus
sum(diag(solve(S, tcrossprod(u))))
#>[1] 982.7356
sum(u * solve(S, u))
#>[1] 982.7356
尽管这两个表达式给出了类似的结果,但第一个似乎比第二个更快。我不知道为什么,因为在第一个表达式中有一个n * n矩阵的计算。for循环需要很长时间来计算。
单位:毫秒
expr min lq mean median uq max neval
a 4532.6753 4679.4043 5470.94765 4815.1294 6061.3284 7789.5116 10
b 2.8991 3.2693 3.73495 3.3675 3.7777 6.9719 10
c 7.8176 8.5473 12.03060 9.2542 16.4089 20.1742 10
---
set.seed(123)
n <- 200000
obs = MASS::mvrnorm(n = n, mu = mus, Sigma = S)
u <- t(obs) - mus
microbenchmark::microbenchmark(a = {
llmat.term3 <- matrix(NA, n, 1)
for (i in seq(n)) {
llmat.term3[i] <- t(obs[i,] - mus) %*% solve(S) %*% (obs[i,] - mus)
}
sum(llmat.term3)
},
b = sum(diag(solve(S, tcrossprod(u)))),
c = sum(u * solve(S, u)),
check = 'equal', times = 10)
注意:花了一些时间才找到你使用的随机种子。下次请将其包含在数据生成中。
<details>
<summary>英文:</summary>
in your code you have
S = matrix(c(1.0, 0.2, 0.1, 0.35, 0.0,
0.2, 1.0, 0.0, 0.4, 0.0,
0.1, 0.0, 1.0, 0.0, 0.4,
0.35, 0.4, 0.0, 1.0, 0.6,
0.0, 0.0, 0.4, 0.6, 1.0), ncol = 5)
colnames(S) = c("Y1", "X1", "X2", "Z1" ,"Z2")
rownames(S) = colnames(S)
# Make mean vector
mus = c(1, 2, 3, 4, 5); names(mus) = colnames(S)
# Generate 5347 observations
set.seed(123)
obs = MASS::mvrnorm(n = 200, mu = mus, Sigma = S)
llmat.term3 <- matrix(NA, 200,1)
for(n in 1:200) {
llmat.term3[n] <- t(obs[n,]-mus) %*% solve(S) %*% (obs[n,]-mus) }
sum(llmat.term3)
#[1] 982.7356
compare to more compact approaches:
u <- t(obs) - mus
sum(diag(solve(S, tcrossprod(u))))
#> [1] 982.7356
sum(u * solve(S, u))
#> [1] 982.7356
Though the two expressions give similar results, The first one seems to be quicker than the second. I do not know why since in the first one there is a computation of n * n matrix. The for loop takes for-ever to compute.
Unit: milliseconds
expr min lq mean median uq max neval
a 4532.6753 4679.4043 5470.94765 4815.1294 6061.3284 7789.5116 10
b 2.8991 3.2693 3.73495 3.3675 3.7777 6.9719 10
c 7.8176 8.5473 12.03060 9.2542 16.4089 20.1742 10
---
set.seed(123)
n <- 200000
obs = MASS::mvrnorm(n = n, mu = mus, Sigma = S)
u <- t(obs) -mus
microbenchmark::microbenchmark(a = {
llmat.term3 <- matrix(NA, n,1)
for(i in seq(n)) {
llmat.term3[i] <- t(obs[i,]-mus) %*% solve(S) %*% (obs[i,]-mus) }
sum(llmat.term3)
},
b = sum(diag(solve(S, tcrossprod(u)))),
c = sum(u * solve(S, u)),
check = 'equal', times = 10)
NB: took me a while to get the seed you used. Next time include it in your data generation
</details>
通过集体智慧和协作来改善编程学习和解决问题的方式。致力于成为全球开发者共同参与的知识库,让每个人都能够通过互相帮助和分享经验来进步。
评论