PCA on limited data in generative model latent space: what does almost 100% of variance captured mean?

Question

I am studying the latent space of a generative model, where the data in my latent space have a shape of (64, 64, 3). I would like to visualize a subset of this data, say n=5, in a 2D plot. To achieve this, I have reshaped the data to have a shape of (5, 12288) and used PCA to reduce it to the first 2 principal components, which I then plot using matplotlib.

However, I am uncertain about the amount of variance captured by the PCA. When I check, it shows that more than 99% of the variance is captured. I think this might be due to the small number of samples I used, so that there can be at most 5 non-zero singular values in this case. Is my understanding correct? Does this mean that the variance captured by the PCA is not meaningful for the full latent space?

Here is the code I used to reshape my data, reduce it with PCA, and check the captured variance:

import numpy as np
from sklearn.decomposition import PCA

def matrix_to_point(A):
    # Convert a matrix to a point by flattening it
    return A.reshape(-1)

n = 5
latent_sample = np.random.rand(n, *(64, 64, 3))
data = np.asarray([matrix_to_point(m) for m in latent_sample])

pca = PCA(n_components=2)
pca = pca.fit(data)

reduced_data = pca.transform(data)

print(f'Variance captured by the PCA: {pca.explained_variance_ratio_}')
# Output with the posted code: Variance captured by the PCA: [0.25629761 0.25391076]
# Output with the complete code: Variance captured by the PCA: [0.96852827 0.03129395]

In this code, I substituted random samples for the actual latent samples to make it executable. Thank you in advance for your assistance.

Answer 1

Score: 1

I would judge the quality of the PCA by inverse-transforming the reduced data and comparing the result with the original data. Here I used RMSE, but you can use another metric if it suits your use case better:

import numpy as np
from sklearn.decomposition import PCA

def matrix_to_point(A):
    # Convert a matrix to a point by flattening it
    return A.reshape(-1)

n = 5
latent_sample = np.random.rand(n, *(64, 64, 3))
data = np.asarray([matrix_to_point(m) for m in latent_sample])

pca = PCA(n_components=2)
pca = pca.fit(data)

reduced_data = pca.transform(data)
print(f'Variance captured by the PCA: {pca.explained_variance_ratio_}')

# Map the 2D points back to the original space and measure what was lost
expanded_data = pca.inverse_transform(reduced_data)
# RMSE: square the errors, average them, then take the square root
rmse = np.sqrt(np.mean((expanded_data - data) ** 2))
print(f'Root mean square error: {rmse}')
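
The reconstruction error above is measured on the same samples the PCA was fitted on, so it tends to be optimistic. A minimal sketch of a held-out check follows, assuming you can draw more latents than the ones you plot. The random arrays stand in for real latent samples, and the helper name `reconstruction_rmse`, the seed, and the count of 50 held-out samples are illustrative choices, not part of the original code.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility

# Stand-ins for real latents: 5 samples to fit on, 50 unseen samples to check
train = rng.random((5, 64 * 64 * 3))
held_out = rng.random((50, 64 * 64 * 3))

pca = PCA(n_components=2).fit(train)

def reconstruction_rmse(model, x):
    # Project to 2D, map back, and compare with the input
    restored = model.inverse_transform(model.transform(x))
    return np.sqrt(np.mean((restored - x) ** 2))

rmse_train = reconstruction_rmse(pca, train)
rmse_held_out = reconstruction_rmse(pca, held_out)
print(f'RMSE on fitted samples: {rmse_train:.4f}')
print(f'RMSE on held-out samples: {rmse_held_out:.4f}')
```

If the held-out error is much larger than the error on the fitted samples, the two components describe those few samples rather than the latent space as a whole.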

If your data actually lies in a two-dimensional subspace of the entire space, the fit will be very good and the RMSE will be very small.
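
On the question about the small sample count: PCA centers the data first, and n centered samples span at most n - 1 directions. With n = 5 there are therefore at most 4 components with non-zero variance, and the explained variance ratio is relative to the variance of those 5 samples only, not to the full latent space. This also fits the roughly 25% per component reported for the random data in the question. The sketch below illustrates this on random stand-in data; the seed and the variable names are illustrative assumptions.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n = 5
data = rng.random((n, 64 * 64 * 3))  # stand-in for real latent samples

# Centering removes one degree of freedom, so the rank is at most n - 1
centered = data - data.mean(axis=0)
rank = np.linalg.matrix_rank(centered)
print(f'Rank of centered data: {rank}')

# Keep every component PCA can compute: min(n_samples, n_features) = 5
pca_all = PCA().fit(data)
ratios = pca_all.explained_variance_ratio_
print(f'Explained variance ratios: {ratios}')
print(f'Sum of the first n - 1 ratios: {ratios[:n - 1].sum()}')
```

The first n - 1 ratios account for all of the variance and the last one is numerically zero, however many features the latent space has. A high ratio from 2 components on 5 samples therefore says little about how the full latent space is distributed; fitting on many more samples gives a more meaningful number.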

huangapple
  • Published on June 12, 2023, 17:05:45
  • When reposting, please keep the link to this article: https://go.coder-hub.com/76455097.html