Using numpy in sklearn FunctionTransformer inside pipeline
Question
I'm training a regression model and inside my pipeline I have something like this:
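# Imports inferred from the names used below.
import numpy as np
from sklearn import impute, preprocessing as pr
from sklearn.compose import ColumnTransformer, TransformedTargetRegressor
from sklearn.pipeline import Pipeline, make_pipeline
from sklearn.preprocessing import PowerTransformer, StandardScaler

# `model` is the regressor being trained; it is defined elsewhere.
best_pipeline = Pipeline(
    steps=[
        (
            "features",
            ColumnTransformer(
                transformers=[
                    (
                        "area",
                        make_pipeline(
                            impute.SimpleImputer(),
                            pr.FunctionTransformer(lambda x: np.log1p(x)),
                            StandardScaler(),
                        ),
                        ["area"],
                    )
                ]
            ),
        ),
        (
            "regressor",
            TransformedTargetRegressor(
                regressor=model,
                transformer=PowerTransformer(method='box-cox')
            ),
        ),
    ]
)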
There are obviously more features, but the code would be too long. I train the model, and if I predict in the same script everything is fine. I then store the model using dill and try to use it in another Python file.
In this other file I load the model and try this:
import numpy as np
df['prediction'] = self.model.predict(df)
Internally, when it tries to run the transform, it raises:
NameError: name 'np' is not defined
Answer 1
Score: 2
You can use third-party library functions by simply passing the function itself as the func argument:
import numpy
from sklearn.preprocessing import FunctionTransformer

transformer = FunctionTransformer(numpy.log1p)
There is no need for lambdas or custom wrapper classes. The resulting transformer can also be persisted in the plain pickle format, without dill.
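As a rough check of the persistence claim, here is a minimal sketch that round-trips a small preprocessing pipeline through plain pickle. The toy array and the variable names are made up for illustration; only the three pipeline steps come from the question.

```python
import pickle

import numpy
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import FunctionTransformer, StandardScaler

# Toy single-column data standing in for the "area" feature (made-up values).
X = numpy.array([[10.0], [20.0], [numpy.nan], [40.0]])

area_pipeline = make_pipeline(
    SimpleImputer(),
    FunctionTransformer(numpy.log1p),  # the function object itself, no lambda
    StandardScaler(),
)
expected = area_pipeline.fit_transform(X)

# Serialize and restore with the standard library only.
restored = pickle.loads(pickle.dumps(area_pipeline))

# The restored pipeline should reproduce the original output.
roundtrip_ok = bool(numpy.allclose(restored.transform(X), expected))
print(roundtrip_ok)
```

The same pipeline built with a lambda would normally be rejected by plain pickle, which is presumably why dill was used in the question.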
When porting objects between different environments, it's probably a good idea to use canonical module names, hence numpy.log1p instead of np.log1p.
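The difference between the two spellings matters most inside a lambda: the name np is looked up in the function's globals each time the lambda runs, whereas passing numpy.log1p hands over the function object directly. The sketch below only simulates a namespace in which np is not bound; it does not reproduce how dill restores globals, and the namespace dictionary is an illustrative stand-in.

```python
import numpy as np

# A stand-in for the globals of the script that defined the lambda.
training_globals = {"np": np}
log_lambda = eval("lambda x: np.log1p(x)", training_globals)

# While "np" is bound in those globals, the lambda works.
works_before = float(log_lambda(0.0))

# Simulate loading the lambda somewhere "np" is not bound.
del training_globals["np"]
try:
    log_lambda(0.0)
    error_message = None
except NameError as exc:
    error_message = str(exc)

# A direct reference to the function needs no name lookup at call time.
direct = np.log1p
works_direct = float(direct(0.0))

print(error_message)
```

This suggests why an import numpy as np in the loading file may not help: what matters is the globals the restored lambda is attached to, not the file that calls predict.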
Answer 2
Score: 0
I've found a way to fix it, although there might be a better approach.
I created a class that encapsulates the numpy function:
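from sklearn import preprocessing as pr

class LogTransformer(pr.FunctionTransformer):
    def transform(self, X):
        # Import locally so the name is bound wherever the model is loaded.
        import numpy as np
        return np.log1p(X)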
Then when I create the pipeline:
make_pipeline(
    impute.SimpleImputer(),
    LogTransformer(),
    StandardScaler(),
),
Any other approaches are welcome.
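For reference, a self-contained sketch of the wrapper class used on its own, checking that it produces the same values as calling numpy.log1p directly. The toy array is made up, and the pr alias is assumed to be sklearn.preprocessing.

```python
import numpy
from sklearn import preprocessing as pr


class LogTransformer(pr.FunctionTransformer):
    """FunctionTransformer subclass whose transform applies log1p."""

    def transform(self, X):
        # The local import binds "np" at call time, independent of the
        # globals of whichever script loads the model.
        import numpy as np
        return np.log1p(X)


# Toy non-negative data (made-up values).
X = numpy.array([[0.0], [1.0], [9.0]])

output = LogTransformer().fit_transform(X)
matches_log1p = bool(numpy.allclose(output, numpy.log1p(X)))
print(matches_log1p)
```

Whether the class itself can be restored in another process still depends on how it is serialized and on where the class is defined.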