How exactly does Standard Scaling a Sparse Matrix work?

Question


I am currently reading "Hands-On Machine Learning with Scikit-Learn, Keras and TensorFlow" and came across a tip stating: "If you want to scale a sparse matrix without converting it to a dense matrix first, you can use a StandardScaler with its with_mean hyperparameter set to False: it will only divide the data by the standard deviation, without subtracting the mean (as this would break sparsity)." So I tried it out to understand what it is doing. However, the result does not seem to be scaled at all.

I created a csr_matrix from a NumPy array, constructed a StandardScaler with with_mean=False, and called fit_transform on the matrix. The non-zero results are all identical, so nothing appears to be scaled, and I don't even understand how the results are calculated. I thought the mean would be treated as zero and every non-zero value scaled by the standard deviation of its column, but that method would have given me the scaled value 1.732, which does not match the output.

from sklearn.preprocessing import StandardScaler
from scipy.sparse import csr_matrix
import numpy as np

# Each column holds exactly one non-zero value: 3, 2, and 1.
X = csr_matrix(np.array([[0, 0, 1], [0, 2, 0], [3, 0, 0]]))
scaler = StandardScaler(with_mean=False)  # divide by std only, to preserve sparsity
X_scaled = scaler.fit_transform(X)
print(X_scaled)            # sparse (row, col) -> value representation
print(X_scaled.toarray())  # dense view of the same matrix


This outputs:

  (0, 2)	2.1213203435596424
  (1, 1)	2.1213203435596424
  (2, 0)	2.1213203435596424

[[0.         0.         2.12132034]
 [0.         2.12132034 0.        ]
 [2.12132034 0.         0.        ]]

Am I doing something wrong or am I misunderstanding something?

I'm not sure if this is what I expected.
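For reference, the 1.732 expected above corresponds to treating each column's mean as zero, i.e. dividing each value by the root-mean-square of its column instead of the true standard deviation. A minimal sketch of that calculation (not what StandardScaler actually does):

```python
import numpy as np

# Each column of X is [0, 0, i] for some non-zero i.
for i in range(1, 4):
    rms = np.sqrt(np.mean(np.square([0, 0, i])))  # "std" if the mean were 0
    print(f"{i} / {rms:0.5f} = {i / rms:0.5f}")   # always sqrt(3) ~ 1.73205
```

Dividing i by sqrt(i**2 / 3) always yields sqrt(3) ≈ 1.732, which is where the expected value comes from.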

Answer 1

Score: 0


With with_mean=False, StandardScaler only divides each column by its standard deviation. The standard deviation is still computed around the column's actual mean; the mean is just not subtracted during the transform. As you can see below, for any column [0, 0, i], dividing i by that column's standard deviation returns 2.12132...

for i in range(1,4):
    s = np.std([0,0,i])
    print(f"{i} / {s:0.5f} = {i/s:0.5f}")

>>> 1 / 0.47140 = 2.12132
>>> 2 / 0.94281 = 2.12132
>>> 3 / 1.41421 = 2.12132
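To check this against scikit-learn itself: the fitted scaler exposes the per-column divisor as its scale_ attribute, and with with_mean=False it equals the ordinary per-column standard deviation of the data. A quick verification (assuming scikit-learn and SciPy are installed):

```python
import numpy as np
from scipy.sparse import csr_matrix
from sklearn.preprocessing import StandardScaler

X = csr_matrix(np.array([[0, 0, 1], [0, 2, 0], [3, 0, 0]]))
scaler = StandardScaler(with_mean=False).fit(X)

# scale_ is the per-column divisor; it matches np.std of the dense columns.
print(scaler.scale_)                # [1.41421356 0.94280904 0.47140452]
print(np.std(X.toarray(), axis=0))  # same values
```

Note that np.std uses the biased estimator (ddof=0) by default, which matches how StandardScaler computes the variance.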

huangapple
  • Published on 2023-06-29 17:40:06
  • Please keep this link when reposting: https://go.coder-hub.com/76579867.html