SynapseML LightGBM to PMML

Question

According to the SynapseML documentation, a LightGBM model can be exported to PMML. The documentation links to the package to install, but I am unable to install that package using the Maven path specified; Databricks just shows a red X. So next I tried to install the package shown in the image (not reproduced here), but I am getting an error:

Transformer class com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel is not supported

Is there a better way to go about this? Should I use a LightGBM implementation other than the SynapseML one?

Thanks

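from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark2pmml import PMMLBuilder
from synapse.ml.lightgbm import LightGBMClassifier

# df, categoricalColumns, numericColsFeatures, ts and dateFormat are defined earlier (not shown).
stages = []
# Index each categorical column; unseen categories go into an extra bucket.
for categoricalCol in categoricalColumns:
    indexer = StringIndexer(inputCol=categoricalCol, outputCol=categoricalCol + "_Index").setHandleInvalid("keep")
    stages += [indexer]
# Assemble the indexed categorical columns and the numeric columns into one feature vector.
assemblerInputs = [c + "_Index" for c in categoricalColumns] + numericColsFeatures
assembler = VectorAssembler(inputCols=assemblerInputs, outputCol="features")
stages += [assembler]
lgbm = LightGBMClassifier(objective="binary", featuresCol="features", labelCol="label", learningRate=0.3, numIterations=100, numLeaves=31)
stages += [lgbm]
pipeline = Pipeline(stages=stages)
print("Running model")
pipelineModel = pipeline.fit(df)

# Export the fitted pipeline to a PMML file.
pmmlBuilder = PMMLBuilder(spark.sparkContext, df, pipelineModel)
pmmlBuilder.buildFile("/dbfs/tmp/pmmlModel" + ts.strftime(dateFormat) + "_test.pmml")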

Answer 1

Score: 1

The JPMML-SparkML library has included a dedicated org.jpmml:pmml-sparkml-lightgbm module for quite some time now. Simply add it to your Apache Spark package path using the --packages option:

$ $SPARK_HOME/bin/spark-submit --packages "com.microsoft.azure:synapseml-lightgbm_2.12:0.10.2,org.jpmml:pmml-sparkml-lightgbm:2.4.0" myscript.py

This module does not need any special configuration when being accessed from within PySpark (as opposed to "plain" Apache Spark).
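As a rough sketch of the same idea from Python, the two Maven coordinates in the command above can be passed through Spark's spark.jars.packages property when the session is created. The helper names (PACKAGES, packages_value, build_session) and the app name are made up for illustration, and this assumes the session is not already running: on a cluster where Spark is started for you, such as a Databricks notebook, the property has to be set through the cluster's own library or configuration settings instead.

```python
# Maven coordinates copied from the spark-submit command above.
PACKAGES = [
    "com.microsoft.azure:synapseml-lightgbm_2.12:0.10.2",
    "org.jpmml:pmml-sparkml-lightgbm:2.4.0",
]


def packages_value(coordinates):
    """Join Maven coordinates into the comma-separated form Spark expects.

    Hypothetical helper: it only checks the group:artifact:version shape.
    """
    for coordinate in coordinates:
        if len(coordinate.split(":")) != 3:
            raise ValueError("expected group:artifact:version, got %r" % coordinate)
    return ",".join(coordinates)


def build_session(app_name="lgbm-to-pmml"):
    """Create a SparkSession that resolves the packages from Maven Central.

    Hypothetical helper; PySpark is imported lazily so that packages_value
    can be used without PySpark installed.
    """
    from pyspark.sql import SparkSession

    return (
        SparkSession.builder.appName(app_name)
        .config("spark.jars.packages", packages_value(PACKAGES))
        .getOrCreate()
    )
```

Calling build_session() before any other Spark code should give a session equivalent to the one started by the spark-submit command, provided the machine can reach Maven Central.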

The JPMML-SparkML library is distributed via the Maven Central repository only. It is not pushed to proprietary repositories such as Databricks', which may explain the "red X" that you're seeing.

Posted by huangapple on May 22, 2023, 21:34:34
Source: https://go.coder-hub.com/76306756.html