MinMaxScaler doesn't scale small values to 1
Question
I found some weird behavior in sklearn.preprocessing.MinMaxScaler, and the same in sklearn.preprocessing.RobustScaler.
When the data's maximum value is very small (on the order of 10^(-16)), the transformer does not change the maximum: the output keeps the raw data's maximum value. Why? df_small.dtypes is float64, and that type can represent much smaller numbers. How can I fix this without the handcrafted data = data / data.max()?
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df_small = pd.DataFrame(np.arange(5) * 10.0**(-16))
scaler_small = MinMaxScaler()
small_transformed = scaler_small.fit_transform(df_small)
print(small_transformed)
[[0.e+00]
[1.e-16]
[2.e-16]
[3.e-16]
[4.e-16]]
df_not_small = pd.DataFrame(np.arange(5) * 10.0**(-15))
scaler_not_small = MinMaxScaler()
not_small_transformed = scaler_not_small.fit_transform(df_not_small)
print(not_small_transformed)
[[0. ]
[0.25]
[0.5 ]
[0.75]
[1. ]]
Answer 1
Score: 1
When applying the scaling, MinMaxScaler calls the _handle_zeros_in_scale() function, which has the check:
constant_mask = scale < 10 * np.finfo(scale.dtype).eps
For a dtype of np.float64, the value of 10 * np.finfo(scale.dtype).eps is 2.220446049250313e-15, which is larger than your range of 4e-16 in the first case (but smaller than the range of 4e-15 in the second case). If the scale is smaller than this, it sets the scale factor to 1 (see this line):
scale[constant_mask] = 1.0
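You can check that cutoff directly (a quick sketch; the threshold variable name is mine):
import numpy as np

threshold = 10 * np.finfo(np.float64).eps
print(threshold)           # 2.220446049250313e-15
print(4e-16 < threshold)   # True:  the first case is flagged as constant, so its scale is forced to 1
print(4e-15 < threshold)   # False: the second case is scaled normally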
Unfortunately, you'll either have to scale the data manually yourself, or edit scikit-learn to allow samples with a smaller overall range.
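For instance, a manual pre-scaling step along the lines of the data / data.max() idea from the question (a sketch, not a built-in scikit-learn option):
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler

df_small = pd.DataFrame(np.arange(5) * 10.0**(-16))
# Divide by the column's absolute maximum so the data range is no longer
# below 10 * eps; MinMaxScaler then scales to [0, 1] as expected.
rescaled = df_small / df_small.abs().max()
print(MinMaxScaler().fit_transform(rescaled))
# [[0.  ]
#  [0.25]
#  [0.5 ]
#  [0.75]
#  [1.  ]]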