Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException
Question
I have the following dataframe:
dataframe1
+-----------------------+
|ID |
+-----------------------+
|[10,80,60,] |
|[20,40,] |
+-----------------------+
And another dataframe:
dataframe2
+------------------+----------------+
|ID_2 | name |
+------------------+----------------+
|40 | XYZZ |
|200 | vbb |
+------------------+----------------+
I want the following output:
+------------------+----------------+
|ID_2 | name |
+------------------+----------------+
|40 | XYZZ |
+------------------+----------------+
I'm using the following code to select the rows from the second dataframe
where ID_2 is contained in ID.
for (java.util.Iterator<Row> iter = dataframe1.toLocalIterator(); iter.hasNext();) {
    String item = (iter.next()).get(0).toString();
    dataframe2.registerTempTable("data2");
    Dataset<Row> res = sparkSession.sql("select * from data2 where ID_2 IN (" + item + ")");
    res.show();
}
But I get the following exception:
Exception in thread "main" org.apache.spark.sql.catalyst.parser.ParseException:
mismatched input 'from' expecting <EOF>(line 1, pos 9)
== SQL ==
select * from data2 where ID_2 IN ([10,80,60,])
---------^^^
at org.apache.spark.sql.catalyst.parser.ParseException.withCommand(ParseDriver.scala:241)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parse(ParseDriver.scala:117)
at org.apache.spark.sql.execution.SparkSqlParser.parse(SparkSqlParser.scala:48)
at org.apache.spark.sql.catalyst.parser.AbstractSqlParser.parsePlan(ParseDriver.scala:69)
at org.apache.spark.sql.SparkSession.sql(SparkSession.scala:642)
at factory.Geofencing_Alert.check(Geofencing_Alert.java:84)
at factory.Geofencing_Alert.main(Geofencing_Alert.java:158)
How can I fix this?
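For reference, the immediate cause of the ParseException is that `get(0).toString()` on an array column yields the literal text `[10,80,60,]`, so the generated SQL contains brackets and a trailing comma, which the parser rejects at the `IN (...)` clause. A minimal plain-Java sketch of cleaning that string before splicing it into the query (the helper name `toInList` is hypothetical, not from the original post):

```java
public class InListSanitizer {
    // Strips the surrounding brackets and any trailing comma from the
    // toString() form of an array row, e.g. "[10,80,60,]" -> "10,80,60",
    // so the result is valid inside a SQL IN (...) clause.
    static String toInList(String raw) {
        String s = raw.trim();
        if (s.startsWith("[") && s.endsWith("]")) {
            s = s.substring(1, s.length() - 1);
        }
        if (s.endsWith(",")) {
            s = s.substring(0, s.length() - 1);
        }
        return s;
    }

    public static void main(String[] args) {
        System.out.println(toInList("[10,80,60,]")); // prints 10,80,60
    }
}
```

Note that string-concatenating values into SQL is fragile in general; the join-based answer below the question avoids building SQL text at all.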
Answer 1 (score: 1)

Simply use the explode function.
import org.apache.spark.sql.functions.explode
import spark.implicits._

df1.withColumn("ID", explode($"ID"))
  .join(df2, $"ID" === $"ID_2", "inner")
  .drop("ID")
  .show
+----+----+
|ID_2|name|
+----+----+
| 40|xyzz|
+----+----+
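The explode-and-join logic above can be illustrated with plain Java collections (a sketch of the semantics only, not the Spark API; the method name `explodeJoin` and the data structures are assumptions for this example):

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

public class ExplodeJoinSketch {
    // "Explode" every ID list into one flat set, then do an inner join:
    // keep only the (ID_2, name) rows whose ID_2 appears in that set.
    static Map<Integer, String> explodeJoin(List<List<Integer>> df1,
                                            Map<Integer, String> df2) {
        Set<Integer> ids = df1.stream()
                .flatMap(List::stream)
                .collect(Collectors.toSet());
        Map<Integer, String> out = new LinkedHashMap<>();
        df2.forEach((id2, name) -> {
            if (ids.contains(id2)) {
                out.put(id2, name);
            }
        });
        return out;
    }

    public static void main(String[] args) {
        // Same data as the question: dataframe1 holds lists of IDs,
        // dataframe2 holds (ID_2, name) pairs.
        List<List<Integer>> df1 = List.of(List.of(10, 80, 60), List.of(20, 40));
        Map<Integer, String> df2 = Map.of(40, "XYZZ", 200, "vbb");
        System.out.println(explodeJoin(df1, df2)); // prints {40=XYZZ}
    }
}
```

Unlike the per-row SQL loop in the question, this does one pass over each dataset, which mirrors why the single explode-plus-join is the idiomatic Spark solution.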