SynapseML LightGBM to PMML
Question
According to the SynapseML documentation, an LGBM model can be exported to PMML. The link to the package to install is located here. However, I am unable to install that package using the Maven path specified; Databricks just shows a red X. So next I tried to install
but I am getting an error:
Transformer class com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassificationModel is not supported
Is there a better way to go about this? Should I use a LightGBM implementation other than the SynapseML version?
Thanks
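# Imports assumed for the names used below.
from pyspark.ml import Pipeline
from pyspark.ml.feature import StringIndexer, VectorAssembler
from pyspark2pmml import PMMLBuilder
from synapse.ml.lightgbm import LightGBMClassifier

# categoricalColumns, numericColsFeatures, df, ts and dateFormat are defined elsewhere.
stages = []
for categoricalCol in categoricalColumns:
    # Index each categorical column; "keep" retains labels unseen during fitting.
    indexer = StringIndexer(inputCol=categoricalCol, outputCol=categoricalCol + "_Index").setHandleInvalid("keep")
    stages += [indexer]
# Assemble the indexed categorical columns and the numeric columns into one feature vector.
assemblerInputs = [c + "_Index" for c in categoricalColumns] + numericColsFeatures
assembler = VectorAssembler(inputCols=assemblerInputs, outputCol="features")
stages += [assembler]
lgbm = LightGBMClassifier(objective="binary", featuresCol="features", labelCol="label", learningRate=0.3, numIterations=100, numLeaves=31)
stages += [lgbm]
pipeline = Pipeline(stages=stages)
print('Running model')
pipelineModel = pipeline.fit(df)
# Export the fitted pipeline to a PMML file.
pmmlBuilder = PMMLBuilder(spark.sparkContext, df, pipelineModel)
pmmlBuilder.buildFile("/dbfs/tmp/pmmlModel" + ts.strftime(dateFormat) + "_test.pmml")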
Answer 1
Score: 1
The JPMML-SparkML library has included a dedicated org.jpmml:pmml-sparkml-lightgbm module for quite some time now. Simply add it to your Apache Spark package path using the --packages option:
$ $SPARK_HOME/bin/spark-submit --packages "com.microsoft.azure:synapseml-lightgbm_2.12:0.10.2,org.jpmml:pmml-sparkml-lightgbm:2.4.0" myscript.py
This module does not need any special configuration when accessed from within PySpark (as opposed to "plain" Apache Spark).
The JPMML-SparkML library is distributed via the Maven Central repository only. It is not pushed to proprietary repos such as Databricks, which may explain the "red X" that you're seeing.
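As a rough illustration of how the --packages value above is put together, here is a minimal Python sketch. The helper names packages_arg and spark_submit_command are hypothetical and not part of Spark, SynapseML, or JPMML-SparkML; the sketch only assumes that each entry is a Maven coordinate of the form groupId:artifactId:version and that entries are joined with commas, as in the command above.

```python
import re
import shlex

# A Maven coordinate as used above: groupId:artifactId:version.
_COORD = re.compile(r"^[\w.\-]+:[\w.\-]+:[\w.\-]+$")


def packages_arg(coords):
    """Join Maven coordinates into the comma-separated --packages value (hypothetical helper)."""
    for coord in coords:
        if not _COORD.match(coord):
            raise ValueError(f"not a groupId:artifactId:version coordinate: {coord!r}")
    return ",".join(coords)


def spark_submit_command(coords, script):
    """Build a spark-submit argument list for the given script (hypothetical helper)."""
    return ["spark-submit", "--packages", packages_arg(coords), script]


# Coordinates and script name copied from the answer above.
coords = [
    "com.microsoft.azure:synapseml-lightgbm_2.12:0.10.2",
    "org.jpmml:pmml-sparkml-lightgbm:2.4.0",
]
cmd = spark_submit_command(coords, "myscript.py")
print(shlex.join(cmd))
```

The same comma-separated string is the format that Spark's spark.jars.packages configuration property takes, which may be more convenient when the session is not started through spark-submit.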