Pyspark use DocumentAssembler on array<string>

Question

I am trying to use DocumentAssembler on an array of strings. The documentation says: "The DocumentAssembler can read either a String column or an Array[String]." But when I run a simple example:

data = spark.createDataFrame([[["Spark NLP is an open-source text processing library."]]]).toDF("text")
documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
result = documentAssembler.transform(data)

result.select("document").show(truncate=False)

I am getting an error:

AnalysisException: [CANNOT_UP_CAST_DATATYPE] Cannot up cast input from "ARRAY<STRING>" to "STRING".
The type path of the target object is:
- root class: "java.lang.String"
You can either add an explicit cast to the input data or choose a higher precision type of the field in the target object.

Am I misunderstanding something?

Answer 1

Score: 0

I think you just added an extra [] around the input, which turned the column into array&lt;string&gt; instead of string.

This is working:

data = spark.createDataFrame([["Spark NLP is an open-source text processing library."]]).toDF("text")
documentAssembler = DocumentAssembler().setInputCol("text").setOutputCol("document")
result = documentAssembler.transform(data)

result.select("document").show(truncate=False)
+----------------------------------------------------------------------------------------------+
|document                                                                                      |
+----------------------------------------------------------------------------------------------+
|[{document, 0, 51, Spark NLP is an open-source text processing library., {sentence -> 0}, []}]|
+----------------------------------------------------------------------------------------------+

huangapple
  • Posted on 2023-05-22 18:55:48
  • Please keep this link when reposting: https://go.coder-hub.com/76305454.html