Apache SparkSQL unable to resolve given input columns that are created within the sqlText
Question
I am trying to run a Spark SQL statement that does a simple group by together with an aggregation; however, it complains that it cannot resolve the column month in the given input columns, which I provided in a schema. Yet the tutorial I am following runs the same code successfully.
Code:
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

StructField[] fields = new StructField[]{
        new StructField("level", DataTypes.StringType, false, Metadata.empty()),
        new StructField("datetime", DataTypes.StringType, false, Metadata.empty())
};
StructType schema = new StructType(fields);
Dataset<Row> dateSet = spark.createDataFrame(inMemory, schema);
dateSet.createOrReplaceTempView("logging_level");
Dataset<Row> results = spark.sql("select level, date_format(datetime, 'MMMM') as month, count(1) as total from logging_level group by level, month");
Stack trace:
Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`month`' given input columns: [level, datetime]; line 1 pos 107
at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)....
Answer 1
Score: 1
You cannot reuse an alias defined in the select clause in the group by clause. You need to repeat the expression:
select level, date_format(datetime, 'MMMM') as month, count(*) as total
from logging_level
group by level, date_format(datetime, 'MMMM')
Note that I replaced count(1) with count(*): it is more efficient and gives the same result.
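Applied to the question's Java code, the fix is just the amended SQL string (a minimal sketch; the spark session and the logging_level view are assumed to be set up as in the question):

// Group by the full expression instead of the "month" alias.
Dataset<Row> results = spark.sql(
        "select level, date_format(datetime, 'MMMM') as month, count(*) as total "
        + "from logging_level "
        + "group by level, date_format(datetime, 'MMMM')");
results.show();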
Many databases also let you refer to select items by their ordinal position in the group by clause, and Spark is one of them, so:
select level, date_format(datetime, 'MMMM') as month, count(*) as total
from logging_level
group by 1, 2
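The same thing from Java (a sketch; positional group by in Spark is governed by the spark.sql.groupByOrdinal setting, which is enabled by default in Spark 2.0+):

// "group by 1, 2" refers to the first and second select items:
// level and the date_format(...) expression.
Dataset<Row> results = spark.sql(
        "select level, date_format(datetime, 'MMMM') as month, count(*) as total "
        + "from logging_level group by 1, 2");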
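As an aside, the alias problem goes away entirely if you express the query through the DataFrame API rather than a SQL string. A sketch using the standard org.apache.spark.sql.functions helpers:

import static org.apache.spark.sql.functions.*;

// Group by the expression itself; the alias only names the output column.
Dataset<Row> results = dateSet
        .groupBy(col("level"), date_format(col("datetime"), "MMMM").alias("month"))
        .agg(count(lit(1)).alias("total"));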