Using numpy in sklearn FunctionTransformer inside pipeline

Question

I'm training a regression model and inside my pipeline I have something like this:

best_pipeline = Pipeline(
    steps=[
        (
            "features",
            ColumnTransformer(
                transformers=[
                    (
                        "area",
                        make_pipeline(
                            impute.SimpleImputer(),
                            pr.FunctionTransformer(lambda x: np.log1p(x)),
                            StandardScaler(),
                        ),
                        ["area"],
                    )
                ]
            ),
        ),
        (
            "regressor",
            TransformedTargetRegressor(
                regressor=model,
                transformer=PowerTransformer(method='box-cox')
            ),
        ),
    ]
)

There are more features, but including them would make the code too long. When I train the model and predict in the same script, everything works. I then store the model using dill and try to use it in another Python file.

In this other file I load the model and try this:

import numpy as np
df['prediction'] = self.model.predict(df)

Internally, when it tries to run the transform, it raises:

NameError: name 'np' is not defined

Answer 1

Score: 2

You can use third-party library functions by passing the function itself as the func argument:

import numpy

transformer = FunctionTransformer(numpy.log1p)

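There is no need for lambdas or custom wrapper classes. The solution above can also be persisted in the plain pickle format, because pickle stores numpy.log1p by reference to its name rather than by value, so the loading environment only needs NumPy installed.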

When porting objects between different environments, it is probably a good idea to use canonical module names, hence numpy.log1p instead of np.log1p.
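
As a minimal sketch of this point (assuming scikit-learn and NumPy are installed; the sample array is made up for illustration, and the in-memory round trip stands in for saving in one script and loading in another), the transformer built from numpy.log1p should survive plain pickle:

```python
import pickle

import numpy
from sklearn.preprocessing import FunctionTransformer

# Pass the function object itself; no lambda is involved.
transformer = FunctionTransformer(numpy.log1p)

X = numpy.array([[0.0], [1.0], [9.0]])  # made-up sample data
transformer.fit(X)  # stateless here, but mirrors a fitted pipeline step

# Round-trip through plain pickle (no dill needed).
restored = pickle.loads(pickle.dumps(transformer))

result = restored.transform(X)
```

The restored transformer gives the same values as calling numpy.log1p(X) directly, because no reference to a local alias such as np is stored with it.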

Answer 2

Score: 0

I've found a way to fix it, although there might be a better approach.

I created a class encapsulating the numpy function:

class LogTransformer(pr.FunctionTransformer):
    
    def transform(self, X):
        import numpy as np

        return np.log1p(X)

Then when I create the pipeline:

 make_pipeline(
      impute.SimpleImputer(),
      LogTransformer(),
      StandardScaler(),
 ),
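
For reference, a self-contained sketch of this workaround. It assumes that pr and impute in the snippets above refer to sklearn.preprocessing and sklearn.impute (the original does not show its imports), and the sample data is invented:

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler


class LogTransformer(FunctionTransformer):
    """Applies log1p; numpy is imported inside the method on purpose."""

    def transform(self, X):
        # The local import means the method does not depend on a
        # module-level `np` existing wherever the object is loaded.
        import numpy as np

        return np.log1p(X)


# Same three steps as the "area" branch of the pipeline in the question.
area_pipeline = make_pipeline(
    SimpleImputer(),
    LogTransformer(),
    StandardScaler(),
)

X = np.array([[10.0], [np.nan], [30.0]])  # made-up "area" column
scaled = area_pipeline.fit_transform(X)
```

Importing inside transform keeps the name lookup local to the method, which is why the error goes away.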

Any other approaches are welcome.

huangapple
  • Posted on March 9, 2023 at 17:00:10
  • Please keep this link when reposting: https://go.coder-hub.com/75682363.html