spark-submit does not find the class (even though the class is contained in the jar)

Question

I am building a very simple HelloWorld Spark job, in Java with Gradle:

package com.example;

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}
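For what it is worth, this class does not actually touch Spark yet. As an added sketch (not part of the original question), the same class once it does use Spark, assuming the spark-sql dependency below is on the classpath, could look like:

package com.example;

import org.apache.spark.sql.SparkSession;

public class HelloWorld {
    public static void main(String[] args) {
        // a local session is enough when submitting with --master=local
        SparkSession spark = SparkSession.builder().appName("HelloWorld").getOrCreate();
        System.out.println("Hello World! Spark version: " + spark.version());
        spark.stop();
    }
}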

My Gradle config is very straightforward:

def sparkVersion = "2.4.6"
def hadoopVersion = "2.7.3"

dependencies {
    compile "org.apache.spark:spark-core_2.11:$sparkVersion"
    compile "org.apache.spark:spark-sql_2.11:$sparkVersion"
    compile 'org.slf4j:slf4j-simple:1.7.9'
    compile "org.apache.hadoop:hadoop-aws:$hadoopVersion"
    compile "org.apache.hadoop:hadoop-common:$hadoopVersion"
    testCompile group: 'junit', name: 'junit', version: '4.12'
}
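Since the job is meant to be launched with spark-submit, the Spark and Hadoop artifacts are normally already on the driver and executor classpath; the Gradle analogue of Maven's provided scope is compileOnly. The following is only a hedged sketch of that alternative setup, not the configuration used in this question:

dependencies {
    // provided by the Spark installation at runtime, so they can stay out of the fat jar
    compileOnly "org.apache.spark:spark-core_2.11:$sparkVersion"
    compileOnly "org.apache.spark:spark-sql_2.11:$sparkVersion"
    compileOnly "org.apache.hadoop:hadoop-common:$hadoopVersion"

    // not shipped with a plain Spark distribution, so still bundled
    compile 'org.slf4j:slf4j-simple:1.7.9'
    compile "org.apache.hadoop:hadoop-aws:$hadoopVersion"

    testCompile group: 'junit', name: 'junit', version: '4.12'
}

Declaring them compileOnly also keeps them out of configurations.runtimeClasspath, so they would no longer be merged into the jar by the task shown below.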

I also made sure to build a fat jar that includes all the dependencies, like SBT assembly does in Scala:

jar {
    zip64 = true
    from {
        configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) }
    }
}
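One caveat about hand-rolled fat jars like this (an added note, not something stated in the original question): copying every dependency jar verbatim also copies META-INF signature files and duplicate resources, which can prevent the JVM from loading classes from the merged jar. A hedged sketch of the same task with those excluded and a Main-Class entry in the manifest:

jar {
    zip64 = true
    manifest {
        attributes 'Main-Class': 'com.example.HelloWorld'
    }
    // drop signature files from signed dependencies and ignore duplicate entries
    exclude 'META-INF/*.SF', 'META-INF/*.DSA', 'META-INF/*.RSA'
    duplicatesStrategy = DuplicatesStrategy.EXCLUDE
    from {
        configurations.runtimeClasspath.collect { it.isDirectory() ? it : zipTree(it) }
    }
}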

The build works well and my class appears in the jar:

jar tvf build/libs/output.jar | grep -i hello
com/example/HelloWorld.class

However, when running the spark-submit job:

 spark-submit --class 'com.example.HelloWorld' --master=local build/libs/output.jar

All I am getting is debug logs:

20/09/21 13:07:46 WARN Utils: Your hostname, example.local resolves to a loopback address: 127.0.0.1; using 192.168.43.208 instead (on interface en0)
20/09/21 13:07:46 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/09/21 13:07:46 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.spark.deploy.SparkSubmit$$anon$2).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

My local Spark correctly reports Scala 2.11 and Spark 2.4.6, built for Hadoop 2.7.3.
I also tested with a more complex Spark job, but the output logs are the same.
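One way to narrow this down (an added debugging sketch, not part of the original post) is to check whether the JVM itself can find and run the class from the fat jar, independently of Spark:

# run the main class straight from the fat jar
java -cp build/libs/output.jar com.example.HelloWorld

# inspect the manifest that ended up in the merged jar
unzip -p build/libs/output.jar META-INF/MANIFEST.MF

If the first command prints "Hello World!", the class itself is loadable and the problem is on the spark-submit side.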

The code does, however, run fine in IntelliJ IDEA (with the option "Include dependencies with 'Provided' scope" ticked).

Am I missing something? Thank you very much.

Answer 1

Score: 0

The problem could have come from zip64 = true or from the fat-jar generation (although the shadowJar plugin did not fix it either).

I decided to go with Maven instead and use maven-assembly-plugin for the fat-jar generation, maven-compiler-plugin to include only the files related to the Spark job I want to build, and finally maven-jar-plugin to avoid building a jar containing all the Spark jobs (one job per jar).
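For reference, a minimal sketch of what the maven-assembly-plugin section of the pom.xml could look like; the plugin version and main class below are assumptions, not values given in the answer:

<plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-assembly-plugin</artifactId>
    <!-- assumed version, adjust as needed -->
    <version>3.3.0</version>
    <configuration>
        <descriptorRefs>
            <!-- produces an additional *-jar-with-dependencies.jar, i.e. the fat jar -->
            <descriptorRef>jar-with-dependencies</descriptorRef>
        </descriptorRefs>
        <archive>
            <manifest>
                <!-- assumed main class, taken from the question -->
                <mainClass>com.example.HelloWorld</mainClass>
            </manifest>
        </archive>
    </configuration>
    <executions>
        <execution>
            <id>make-assembly</id>
            <phase>package</phase>
            <goals>
                <goal>single</goal>
            </goals>
        </execution>
    </executions>
</plugin>

With that in place, mvn package produces a *-jar-with-dependencies.jar that can be passed to spark-submit in the same way as before.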
