Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException


Question

I have the following dataframe:

dataframe1
+-----------------------+
|ID                     |
+-----------------------+
|[10,80,60,]            |
|[20,40,]               |
+-----------------------+

And another dataframe:

dataframe2
+------------------+----------------+
|ID_2              |   name         |
+------------------+----------------+
|40                | XYZZ           |
|200               | vbb            |
+------------------+----------------+

I want the following output:

+------------------+----------------+
|ID_2              |   name         |
+------------------+----------------+
|40                | XYZZ           |
+------------------+----------------+

I'm using the following code to select the rows of the second dataframe whose ID_2 appears in ID.

for (java.util.Iterator<Row> iter = dataframe1.toLocalIterator(); iter.hasNext();) {
    String item = (iter.next()).get(0).toString();
    dataframe2.registerTempTable("data2");
    Dataset<Row> res = sparkSession.sql("select * from data2 where ID_2 IN (" + item + ")");
    res.show();
}

But I get the following exception:

Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException: 
mismatched input 'from' expecting <EOF> (line 1, pos 9)

 == SQL ==
select * from data2 where ID_2 IN ([10,80,60,])
 ---------^^^

at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at factory.Geofencing_Alert.check(Geofencing_Alert.java:84)
at factory.Geofencing_Alert.main(Geofencing_Alert.java:158)

How can I fix this?

Answer 1

Score: 1

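Simply use the explode function. Assuming ID is an array column, it turns each array element into its own row, which can then be inner-joined with the second dataframe on ID_2.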

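import org.apache.spark.sql.functions.explode
import spark.implicits._  // enables $"..."; `spark` is assumed to be your SparkSession

df1.withColumn("ID", explode($"ID"))      // one row per array element
  .join(df2, $"ID" === $"ID_2", "inner")  // keep only IDs present in df2
  .drop("ID")
  .show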

+----+----+
|ID_2|name|
+----+----+
|  40|xyzz|
+----+----+
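
As a side note on the original error: the SQL text shown in the exception contains `IN ([10,80,60,])`, and the square brackets and trailing comma do not form a valid IN list. If you would rather keep the loop-based query from the question, one option is to clean the string before concatenating it. Below is a minimal sketch in plain Java, assuming each ID value arrives as a string like the one in the error message; `InListBuilder` and `toInList` are hypothetical names, not part of Spark.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

public class InListBuilder {

    // Hypothetical helper: converts "[10,80,60,]" into "10,80,60".
    // Assumes the IDs are integers; Long.parseLong rejects anything else,
    // which also keeps arbitrary text out of the generated SQL.
    public static String toInList(String raw) {
        // Drop brackets and whitespace, then split on commas.
        return Arrays.stream(raw.replaceAll("[\\[\\]\\s]", "").split(","))
                .filter(s -> !s.isEmpty())                      // skip empty entries
                .map(s -> Long.toString(Long.parseLong(s)))     // validate and normalize
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String item = "[10,80,60,]";
        String sql = "select * from data2 where ID_2 IN (" + toInList(item) + ")";
        System.out.println(sql); // prints: select * from data2 where ID_2 IN (10,80,60)
    }
}
```

An empty list yields an empty string, and `IN ()` would still fail to parse, so such rows should be skipped before building the query. This approach also runs one query per row of the first dataframe, so the explode-and-join version above is likely the better fit for anything beyond small inputs.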

huangapple
  • Posted on August 15, 2020 at 19:00:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63425273.html