Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException
Question
I have the following dataframe:
dataframe1
+-----------------------+
|ID |
+-----------------------+
|[10,80,60,] |
|[20,40,] |
+-----------------------+
And another dataframe:
dataframe2
+------------------+----------------+
|ID_2 | name |
+------------------+----------------+
|40 | XYZZ |
|200 | vbb |
+------------------+----------------+
I want the following output:
+------------------+----------------+
|ID_2 | name |
+------------------+----------------+
|40 | XYZZ |
+------------------+----------------+
I'm using the following code to select the rows from the second dataframe
where ID_2 is contained in ID.
for (java.util.Iterator<Row> iter = dataframe1.toLocalIterator(); iter.hasNext();) {
    String item = (iter.next()).get(0).toString();
    dataframe2.registerTempTable("data2");
    Dataset<Row> res = sparkSession.sql("select * from data2 where ID_2 IN (" + item + ")");
    res.show();
}
But I get the following exception:
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'from' expecting <EOF>(line 1, pos 9)
== SQL ==
select * from data2 where ID_2 IN ([10,80,60,])
---------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at factory.Geofencing_Alert.check(Geofencing_Alert.java:84)
at factory.Geofencing_Alert.main(Geofencing_Alert.java:158)
How can I fix this?
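For reference, the immediate cause of the ParseException is that `get(0).toString()` on an array column yields the literal text `[10,80,60,]`, so the generated SQL contains brackets and a trailing comma, which the parser rejects at the `IN (...)` clause. A minimal plain-Java sketch of cleaning that string before splicing it into the query (the helper name `toInList` is hypothetical, not from the original post):

```java
public class InListSanitizer {
    // Strips the surrounding brackets and any trailing comma from the
    // toString() form of an array row, e.g. "[10,80,60,]" -> "10,80,60",
    // so the result is valid inside a SQL IN (...) clause.
    static String toInList(String raw) {
        String s = raw.trim();
        if (s.startsWith("[") && s.endsWith("]")) {
            s = s.substring(1, s.length() - 1);
        }
        if (s.endsWith(",")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(toInList("[10,80,60,]")); // prints 10,80,60
    }
}
```

Note that string-concatenating values into SQL is fragile in general; the join-based answer below the question avoids building SQL text at all.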
Answer 1 (score: 1)

Simply use the explode function.
import org.apache.spark.sql.functions.explode
import spark.implicits._

df1.withColumn("ID", explode($"ID"))
  .join(df2, $"ID" === $"ID_2", "inner")
  .drop("ID")
  .show
+----+----+
|ID_2|name|
+----+----+
| 40|xyzz|
+----+----+
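The explode-and-join logic above can be illustrated with plain Java collections (a sketch of the semantics only, not the Spark API; the method name `explodeJoin` and the data structures are assumptions for this example):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ExplodeJoinSketch {
    // "Explode" every ID list into one flat set, then do an inner join:
    // keep only the (ID_2, name) rows whose ID_2 appears in that set.
    static Map<Integer, String> explodeJoin(List<List<Integer>> df1,
                                            Map<Integer, String> df2) {
        Set<Integer> ids = df1.stream()
                .flatMap(List::stream)
                .collect(Collectors.toSet());
        Map<Integer, String> out = new LinkedHashMap<>();
        df2.forEach((id2, name) -> {
            if (ids.contains(id2)) {
                out.put(id2, name);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        // Same data as the question: dataframe1 holds lists of IDs,
        // dataframe2 holds (ID_2, name) pairs.
        List<List<Integer>> df1 = List.of(List.of(10, 80, 60), List.of(20, 40));
        Map<Integer, String> df2 = Map.of(40, "XYZZ", 200, "vbb");
        System.out.println(explodeJoin(df1, df2)); // prints {40=XYZZ}
    }
}
```

Unlike the per-row SQL loop in the question, this does one pass over each dataset, which mirrors why the single explode-plus-join is the idiomatic Spark solution.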