java.lang.NoClassDefFoundError: org/apache/spark/sql/Dataset

Trying to run simple code that writes a DataFrame as a CSV file using Spark and Java; it fails with java.lang.NoClassDefFoundError: org/apache/spark/sql/Dataset.

Question

Here is my simple code:

package org.example;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import java.util.Arrays;
import java.util.List;

public class Main {
    public static void writeOutput(Dataset<Row> df, String outputPath) {
        df.write()
                .option("header", "true")
                .option("delimiter", "\t")
                .csv(outputPath);
    }
    public static void main(String[] args) {

        // Create a SparkSession
        SparkSession spark = SparkSession.builder()
                .appName("DataFrameWriter")
                .getOrCreate();

        // Create a DataFrame (assuming df is already defined)
        List<Row> data = Arrays.asList(
                RowFactory.create("John", 25, "New York"),
                RowFactory.create("Alice", 30, "San Francisco"),
                RowFactory.create("Bob", 35, "Chicago")
        );

        StructType schema = DataTypes.createStructType(new StructField[] {
                DataTypes.createStructField("name", DataTypes.StringType, true),
                DataTypes.createStructField("age", DataTypes.IntegerType, true),
                DataTypes.createStructField("city", DataTypes.StringType, true)
        });

        Dataset<Row> df = spark.createDataFrame(data, schema);

        // Specify the output path
        String outputPath = "src/main/java/output";

        // Call the writeOutput method
        writeOutput(df, outputPath);

        // Stop the SparkSession
        spark.stop();
    }
}

Here is my build.gradle file:

plugins {
    id 'java'
}

group = 'org.example'
version = '1.0-SNAPSHOT'

repositories {
    mavenCentral()
}

dependencies {
    compileOnly 'org.apache.spark:spark-sql_2.12:3.2.0'
    implementation 'org.apache.spark:spark-core_2.12:3.2.0'

    testImplementation platform('org.junit:junit-bom:5.9.1')
    testImplementation 'org.junit.jupiter:junit-jupiter'
}

test {
    useJUnitPlatform()
}

And the errors:

Task :Main.main() FAILED
Error: Unable to initialize main class org.example.Main
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/Dataset

FAILURE: Build failed with an exception.

* What went wrong:
Execution failed for task ':Main.main()'.
> Process 'command '/Library/Java/JavaVirtualMachines/jdk-11.0.11.jdk/Contents/Home/bin/java'' finished with non-zero exit value 1

java -version:

java version "11.0.19" 2023-04-18 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.19+9-LTS-224)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.19+9-LTS-224, mixed mode)

scala -version:

Scala code runner version 3.3.0 -- Copyright 2002-2023, LAMP/EPFL

Spark: version 3.4.0
Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 17.0.7)

Could you tell me what could be wrong? The code is pretty simple; I just can't figure out what to check. I've already tried reinstalling everything.


Answer 1

Score: 1

Avoid using the compileOnly directive for dependencies whose implementation is needed at runtime, as stated in Gradle's Java Library plugin user guide (https://docs.gradle.org/current/userguide/java_library_plugin.html) and blog (https://blog.gradle.org/introducing-compile-only-dependencies).
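In the posted build.gradle, spark-sql is declared compileOnly, so the Dataset class is on the compile classpath but absent at runtime, which is exactly what the NoClassDefFoundError reports. A minimal sketch of the fix is to switch it to implementation; the 3.4.0 version below is an assumption based on the installed Spark version mentioned in the question:

```groovy
dependencies {
    // spark-sql is needed at runtime, so use implementation, not compileOnly
    implementation 'org.apache.spark:spark-sql_2.12:3.4.0'
    implementation 'org.apache.spark:spark-core_2.12:3.4.0'

    testImplementation platform('org.junit:junit-bom:5.9.1')
    testImplementation 'org.junit.jupiter:junit-jupiter'
}
```

Note that spark-sql already pulls in spark-core transitively, so the explicit spark-core line is optional. compileOnly is intended for dependencies the runtime environment provides itself, e.g. when submitting to a cluster via spark-submit, which ships its own Spark jars; when Gradle runs the main class directly, nothing else supplies them. You can see the difference with `./gradlew dependencies --configuration runtimeClasspath` versus `--configuration compileClasspath`.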


huangapple
  • Published 2023-06-02 01:51:24
  • Please keep this link when reposting: https://go.coder-hub.com/76384487.html