Question

In this snippet I built a transformer that inserts one column, but it seems to modify the original variable. In fact, I cannot repeat the fit operation. Is this expected? It breaks grid_search, for example, because it tries to insert the column on every grid fit.

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class customColumnTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # insert() adds the column to X in place
        X.insert(0, 'newCol', 1)
        return X

df = pd.DataFrame([[1, 2], [3, 4]])
display(customColumnTransformer().fit_transform(df))
display(df)
# Running the next line raises ValueError: cannot insert newCol, already exists
# display(customColumnTransformer().fit_transform(df))

The only solution I found is to pass a copy:

customColumnTransformer().fit_transform(df.copy())

Answer 1

Score: 1

Yes, this is expected behavior. The insert method modifies the DataFrame X in place, so transform changes the very object you passed in. A later call to fit_transform on the same DataFrame then fails because the column 'newCol' already exists.
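The in-place behavior can be reproduced with pandas alone, without any transformer. This is a minimal sketch on a throwaway DataFrame; the variable names are illustrative and not part of the original code:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]])

# insert() mutates df itself and returns None
result = df.insert(0, "newCol", 1)
columns_after_first = list(df.columns)

# A second insert under the same name raises ValueError,
# because duplicate column names are rejected by default
try:
    df.insert(0, "newCol", 1)
    second_insert_failed = False
except ValueError:
    second_insert_failed = True
```

Since insert returns None rather than a new DataFrame, any caller that still holds a reference to df sees the added column.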

To circumvent this issue and avoid modifying the original DataFrame, you can create a copy of it before inserting the new column. This way, each call to fit_transform will operate on a separate copy, leaving the original DataFrame intact.

Here's an updated version of your code that incorporates this change:

from sklearn.base import BaseEstimator, TransformerMixin
import pandas as pd

class CustomColumnTransformer(BaseEstimator, TransformerMixin):
    def fit(self, X, y=None):
        return self
    
    def transform(self, X):
        X_copy = X.copy()  # Create a copy of the DataFrame
        X_copy.insert(0, 'newCol', 1)
        return X_copy

df = pd.DataFrame([[1, 2], [3, 4]])
display(CustomColumnTransformer().fit_transform(df))
display(df)

Because X_copy = X.copy() creates a new DataFrame, the insert no longer touches the original df. You can therefore call fit_transform on the same DataFrame repeatedly without hitting the "already exists" error.
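One way to check this is to call fit_transform twice on the same input and confirm that the input is unchanged. The sketch below assumes the copying approach described above; the class name CopyingColumnInserter is hypothetical and chosen only for this example:

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class CopyingColumnInserter(BaseEstimator, TransformerMixin):
    """Hypothetical transformer that inserts a column into a copy of X."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        X_out = X.copy()              # work on a copy, not on the caller's object
        X_out.insert(0, "newCol", 1)
        return X_out


df = pd.DataFrame([[1, 2], [3, 4]])
transformer = CopyingColumnInserter()

first = transformer.fit_transform(df)
second = transformer.fit_transform(df)  # second call on the same df

original_columns = list(df.columns)     # df still has only its own columns
```

Both calls return a DataFrame with the extra column, while df keeps its original two columns.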
