Trying to run simple code that writes a DataFrame as a CSV file using Spark and Java. java.lang.NoClassDefFoundError: org/apache/spark/sql/Dataset
Question
Here is my simple code:
package org.example;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

import java.util.Arrays;
import java.util.List;

public class Main {
    public static void writeOutput(Dataset<Row> df, String outputPath) {
        df.write()
          .option("header", "true")
          .option("delimiter", "\t")
          .csv(outputPath);
    }

    public static void main(String[] args) {
        // Create a SparkSession
        SparkSession spark = SparkSession.builder()
                .appName("DataFrameWriter")
                .getOrCreate();

        // Create a DataFrame (assuming df is already defined)
        List<Row> data = Arrays.asList(
                RowFactory.create("John", 25, "New York"),
                RowFactory.create("Alice", 30, "San Francisco"),
                RowFactory.create("Bob", 35, "Chicago")
        );

        StructType schema = DataTypes.createStructType(new StructField[] {
                DataTypes.createStructField("name", DataTypes.StringType, true),
                DataTypes.createStructField("age", DataTypes.IntegerType, true),
                DataTypes.createStructField("city", DataTypes.StringType, true)
        });

        Dataset<Row> df = spark.createDataFrame(data, schema);

        // Specify the output path
        String outputPath = "src/main/java/output";

        // Call the writeOutput method
        writeOutput(df, outputPath);

        // Stop the SparkSession
        spark.stop();
    }
}
Here is my build.gradle file:
plugins {
    id 'java'
}

group = 'org.example'
version = '1.0-SNAPSHOT'

repositories {
    mavenCentral()
}

dependencies {
    compileOnly 'org.apache.spark:spark-sql_2.12:3.2.0'
    implementation 'org.apache.spark:spark-core_2.12:3.2.0'
    testImplementation platform('org.junit:junit-bom:5.9.1')
    testImplementation 'org.junit.jupiter:junit-jupiter'
}

test {
    useJUnitPlatform()
}
And the error output:
Task :Main.main() FAILED
Error: Unable to initialize main class org.example.Main
Caused by: java.lang.NoClassDefFoundError: org/apache/spark/sql/Dataset
FAILURE: Build failed with an exception.
* What went wrong:
Execution failed for task ':Main.main()'.
> Process 'command '/Library/Java/JavaVirtualMachines/jdk-11.0.11.jdk/Contents/Home/bin/java'' finished with non-zero exit value 1
java -version:
java version "11.0.19" 2023-04-18 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.19+9-LTS-224)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.19+9-LTS-224, mixed mode)
scala -version:
Scala code runner version 3.3.0 -- Copyright 2002-2023, LAMP/EPFL
Spark: version 3.4.0
Using Scala version 2.12.17 (OpenJDK 64-Bit Server VM, Java 17.0.7)
Could you tell me what could be wrong? The code is pretty simple; I just can't figure out what to check. I've already tried reinstalling everything.
Answer 1
Score: 1
Avoid using the compileOnly directive for dependencies whose implementation is needed at runtime, as explained in Gradle's Java Library Plugin user guide (https://docs.gradle.org/current/userguide/java_library_plugin.html) and on the Gradle blog (https://blog.gradle.org/introducing-compile-only-dependencies).
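In the build file above, spark-sql (the artifact that provides org.apache.spark.sql.Dataset and SparkSession) is declared compileOnly, so it is on the compile classpath but not on the runtime classpath; that is exactly why the code compiles fine and then fails with NoClassDefFoundError when Gradle launches Main.main(). A minimal sketch of the corrected dependencies block, assuming the program is launched directly from Gradle or the IDE rather than submitted to a cluster that already provides the Spark classes:

dependencies {
    // implementation puts spark-sql on both the compile and runtime
    // classpaths, so org/apache/spark/sql/Dataset can be loaded at startup
    implementation 'org.apache.spark:spark-sql_2.12:3.2.0'

    // spark-core is a transitive dependency of spark-sql, so this explicit
    // line is optional; it is kept here only to mirror the original build file
    implementation 'org.apache.spark:spark-core_2.12:3.2.0'

    testImplementation platform('org.junit:junit-bom:5.9.1')
    testImplementation 'org.junit.jupiter:junit-jupiter'
}

compileOnly is the right choice only when something else supplies the classes at runtime, for example when the jar is deployed with spark-submit to a cluster whose Spark distribution is already on the classpath; when running locally through Gradle, nothing supplies them, hence the error.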