Apache SparkSQL unable to resolve given input columns that are created within the sqlText

Question:

I am trying to run a Spark SQL statement that does a simple group by while performing an aggregation; however, it complains that it cannot find the column month in the given input columns, even though I provided a schema. The tutorial I am following runs the same code successfully.

Code:

    // Schema for the in-memory rows: both columns are plain strings.
    StructField[] fields = new StructField[]{
            new StructField("level", DataTypes.StringType, false, Metadata.empty()),
            new StructField("datetime", DataTypes.StringType, false, Metadata.empty())
    };

    StructType schema = new StructType(fields);
    Dataset<Row> dateSet = spark.createDataFrame(inMemory, schema);
    dateSet.createOrReplaceTempView("logging_level");

    // Throws AnalysisException (see stack trace below).
    Dataset<Row> results = spark.sql("select level, date_format(datetime, 'MMMM') as month, count(1) as total from logging_level group by level, month");

Stack trace:

Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve '`month`' given input columns: [level, datetime]; line 1 pos 107
    at org.apache.spark.sql.catalyst.analysis.package$AnalysisErrorAt.failAnalysis(package.scala:42)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:77)
    at org.apache.spark.sql.catalyst.analysis.CheckAnalysis$$anonfun$checkAnalysis$1$$anonfun$apply$2.applyOrElse(CheckAnalysis.scala:74)....

Answer 1

Score: 1


You cannot reuse an alias defined in the select clause in the group by clause. You need to repeat the expression:

select level, date_format(datetime, 'MMMM') as month, count(*) as total
from logging_level
group by level, date_format(datetime, 'MMMM')

Note that I replaced count(1) with count(*): it is more efficient, and gives you the same result.
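
For completeness, here is the corrected statement dropped back into the Java code from the question; a minimal sketch, assuming the same spark session, dateSet, and temp view as above:

    // Re-register the view from the question, then group by the repeated
    // expression instead of the "month" alias, which the analyzer cannot resolve.
    dateSet.createOrReplaceTempView("logging_level");
    Dataset<Row> results = spark.sql(
            "select level, date_format(datetime, 'MMMM') as month, count(*) as total "
                    + "from logging_level "
                    + "group by level, date_format(datetime, 'MMMM')");
    results.show();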

Many databases let you refer to select items by their ordinal position in the group by clause. Spark is one of them (this is governed by spark.sql.groupByOrdinal, which is true by default), so:

select level, date_format(datetime, 'MMMM') as month, count(*) as total
from logging_level
group by 1, 2
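
If you would rather not repeat the expression at all, an equivalent route is the Dataset API: derive month as a real column first, then group on it by name. A sketch under the same assumptions (dateSet as defined in the question):

    import static org.apache.spark.sql.functions.*;

    // withColumn materializes "month" as an actual input column, so the
    // subsequent groupBy can resolve it by name.
    Dataset<Row> byMonth = dateSet
            .withColumn("month", date_format(col("datetime"), "MMMM"))
            .groupBy(col("level"), col("month"))
            .count()
            .withColumnRenamed("count", "total");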
