Exception in thread “main” org.apache.spark.sql.catalyst.parser.ParseException

Question

I have the following dataframe:

    dataframe1
    +-----------------------+
    |ID                     |
    +-----------------------+
    |[10,80,60,]            |
    |[20,40,]               |
    +-----------------------+

And another dataframe:

    dataframe2
    +------------------+----------------+
    |ID_2              |name            |
    +------------------+----------------+
    |40                |XYZZ            |
    |200               |vbb             |
    +------------------+----------------+

I want the following output:

    +------------------+----------------+
    |ID_2              |name            |
    +------------------+----------------+
    |40                |XYZZ            |
    +------------------+----------------+

I'm using the following code to select the rows of the second dataframe whose ID_2 matches one of the values in ID:

    for (java.util.Iterator<Row> iter = dataframe1.toLocalIterator(); iter.hasNext();) {
        String item = (iter.next()).get(0).toString();
        dataframe2.registerTempTable("data2");
        Dataset<Row> res = sparkSession.sql("select * from data2 where ID_2 IN (" + item + ")");
        res.show();
    }

But I get the following exception:

    Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
    mismatched input 'from' expecting <EOF>(line 1, pos 9)
    == SQL ==
    select * from data2 where ID_2 IN ([10,80,60,])
    ---------^^^
        at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
        at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
        at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
        at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
        at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
        at factory.Geofencing_Alert.check(Geofencing_Alert.java:84)
        at factory.Geofencing_Alert.main(Geofencing_Alert.java:158)

How can I fix this?
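The statement Spark receives contains the cell's raw string form, IN ([10,80,60,]), which is not a valid list of SQL literals, so the parser rejects it. As a minimal plain-Java sketch (assuming, as the error message suggests, that get(0).toString() returns text like "[10,80,60,]"; InListSketch and toSqlInList are hypothetical names, not Spark APIs), the cell can be normalized into a comma-separated list of integers before it is concatenated:

```java
import java.util.Arrays;
import java.util.stream.Collectors;

// Hypothetical helper (not a Spark API): turns a cell rendered as "[10,80,60,]"
// into the literal list "10,80,60" that a SQL IN (...) clause accepts.
class InListSketch {

    static String toSqlInList(String cell) {
        String inner = cell.trim();
        // Strip the surrounding square brackets, if present.
        if (inner.startsWith("[")) inner = inner.substring(1);
        if (inner.endsWith("]")) inner = inner.substring(0, inner.length() - 1);
        return Arrays.stream(inner.split(","))
                .map(String::trim)
                .filter(s -> !s.isEmpty())                  // skip empty pieces such as ",,"
                .map(s -> Long.toString(Long.parseLong(s))) // throws NumberFormatException on non-integers
                .collect(Collectors.joining(","));
    }

    public static void main(String[] args) {
        String item = "[10,80,60,]";
        // What the question's loop hands to sparkSession.sql(...):
        // prints: select * from data2 where ID_2 IN ([10,80,60,])
        System.out.println("select * from data2 where ID_2 IN (" + item + ")");
        // The same query with the cell normalized first:
        // prints: select * from data2 where ID_2 IN (10,80,60)
        System.out.println("select * from data2 where ID_2 IN (" + toSqlInList(item) + ")");
    }
}
```

An empty array would yield an empty string and therefore an invalid IN (), so such rows would need to be skipped. This still issues one query per row of dataframe1; the explode-and-join approach in the answer avoids building SQL strings altogether.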

Answer 1

Score: 1

Simply use the explode function: it turns each element of the ID array into its own row, which you can then join against ID_2.

    import org.apache.spark.sql.functions.explode
    import spark.implicits._  // enables the $"col" syntax; assumes a SparkSession named spark

    df1.withColumn("ID", explode($"ID"))    // one row per array element
      .join(df2, $"ID" === $"ID_2", "inner")
      .drop("ID")
      .show

    +----+----+
    |ID_2|name|
    +----+----+
    |  40|xyzz|
    +----+----+
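To make the semantics of the explode + inner join concrete, here is a plain-Java sketch on in-memory lists that mirror the question's data. It assumes the ID column holds integer arrays, as the answer does; ExplodeJoinSketch and Row2 are hypothetical names, and nothing here is Spark code.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// In-memory illustration of explode followed by an inner join; does not use Spark.
class ExplodeJoinSketch {

    // Minimal stand-in for a dataframe2 row (hypothetical, for illustration only).
    static final class Row2 {
        final int id2;
        final String name;
        Row2(int id2, String name) { this.id2 = id2; this.name = name; }
        @Override public String toString() { return id2 + "|" + name; }
    }

    static List<Row2> explodeThenJoin(List<List<Integer>> idColumn, List<Row2> df2) {
        // explode: every element of every ID array becomes its own row
        List<Integer> exploded = new ArrayList<>();
        for (List<Integer> ids : idColumn) exploded.addAll(ids);

        // inner join on ID == ID_2: emit a dataframe2 row once per matching exploded ID
        List<Row2> joined = new ArrayList<>();
        for (int id : exploded) {
            for (Row2 row : df2) {
                if (row.id2 == id) joined.add(row);
            }
        }
        return joined;
    }

    public static void main(String[] args) {
        List<List<Integer>> df1 = Arrays.asList(
                Arrays.asList(10, 80, 60),
                Arrays.asList(20, 40));
        List<Row2> df2 = Arrays.asList(new Row2(40, "XYZZ"), new Row2(200, "vbb"));
        System.out.println(explodeThenJoin(df1, df2)); // prints [40|XYZZ]
    }
}
```

Because this is an inner join, an ID that appears in several arrays produces the matching ID_2 row once per occurrence; Spark also makes no promise about the order of the joined rows.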

huangapple
  • Posted on 15 August 2020 at 19:00:02
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63425273.html