Running Spark programs from Eclipse IDE – pom.xml is all good but getting run time errors.


Question

I am running Spark programs from the Eclipse IDE on Ubuntu Linux 22.04 LTS, using Oracle JDK 1.8.0_361.

I referred to this thread.

The Spark programs are executed directly from the Eclipse IDE. Regular Spark programs all work fine, but Spark SQL programs fail.

This is the program:

package org.example;
import org.apache.log4j.Level;
import org.apache.log4j.Logger;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.avg;
import static org.apache.spark.sql.functions.col;
import static org.apache.spark.sql.functions.max;

public class HousePriceSolution {

    private static final String PRICE = "Price";
    private static final String PRICE_SQ_FT = "Price SQ Ft";

    public static void main(String[] args) throws Exception {

        Logger.getLogger("org").setLevel(Level.ERROR);
        SparkSession session = SparkSession.builder().appName("HousePriceSolution").master("local[1]").getOrCreate();

        Dataset<Row> realEstate = session.read().option("header", "true").csv("src/main/resources/RealEstate.csv");

        Dataset<Row> castedRealEstate = realEstate.withColumn(PRICE, col(PRICE).cast("long"))
                                                  .withColumn(PRICE_SQ_FT, col(PRICE_SQ_FT).cast("long"));

        castedRealEstate.groupBy("Location")
                        .agg(avg(PRICE_SQ_FT), max(PRICE))
                        .orderBy(col("avg(" + PRICE_SQ_FT + ")").desc())
                        .show();
    }
}

Here is the corresponding pom.xml:

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
  <modelVersion>4.0.0</modelVersion>
  <groupId>sparkwordcount</groupId>
  <artifactId>sparkwordcount</artifactId>
  <version>0.0.1-SNAPSHOT</version>
  <packaging>jar</packaging>
  <name>"Spark Word Count"</name>
  
  <repositories>
    <repository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </repository>
  </repositories>

  <pluginRepositories>
    <pluginRepository>
      <id>scala-tools.org</id>
      <name>Scala-tools Maven2 Repository</name>
      <url>http://scala-tools.org/repo-releases</url>
    </pluginRepository>
  </pluginRepositories>

  <properties>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
  </properties>

  <build>
    <plugins>
<!-- this plugin is for scala code. It uses the version of the Scala library dependency to pick the Scala version -->
<!--
      <plugin>
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>4.4.0</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
      -->

<!-- this plugin is for java code. the source and target versions are Java versions -->
      <plugin>
        <artifactId>maven-compiler-plugin</artifactId>
        <version>3.8.1</version>
        <configuration>
          <source>1.8</source>
          <target>1.8</target>
        </configuration>
      </plugin>
    </plugins>  
  </build>

  <dependencies>
     
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.12.17</version>
    </dependency>
   
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.13</artifactId>
      <version>3.4.0</version>
      <scope>provided</scope>
    </dependency>
    <!-- the following aren't needed for the word count demo, but
     will be for more complex things.
    -->
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-text</artifactId>
      <version>1.6</version>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.13</artifactId>
      <version>3.4.0</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.13</artifactId>
      <version>3.4.0</version>
      <scope>provided</scope>
    </dependency>
    
    <dependency>
      <groupId>org.apache.commons</groupId>
      <artifactId>commons-lang3</artifactId>
      <version>3.12.0</version>
    </dependency>
    <!-- the following artifacts are also available, depending upon
         what you're doing. They would use the same groupId and version as the ones above:
     spark-mllib
     spark-hive
     spark-catalyst
     spark-streaming-kafka
     spark-repl
     spark-graphx
    -->
  </dependencies>
</project>

The build succeeds, but I get the following error at runtime:

Exception in thread "main" java.lang.NoClassDefFoundError: scala/$less$colon$less
	at org.example.HousePriceSolution.main(HousePriceSolution.java:22)
Caused by: java.lang.ClassNotFoundException: scala.$less$colon$less
	at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
	at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
	... 1 more

I also tried this, but without success.

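Answer 1

Score: 1

It sounds like you have a version mismatch between Scala and Spark. Your pom.xml declares scala-library 2.12.17, but the Spark artifacts are the ones built for Scala 2.13: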

    <!-- scala 2.12.x -->
    <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.12.17</version>
    </dependency>

    <!-- spark using scala 2.13.x -->
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.13</artifactId>
      <version>3.4.0</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.13</artifactId>
      <version>3.4.0</version>
      <scope>provided</scope>
    </dependency>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.13</artifactId>
      <version>3.4.0</version>
      <scope>provided</scope>
    </dependency>

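The missing class scala.$less$colon$less is the JVM name of scala.<:<, which exists as a top-level class only from Scala 2.13 onward, so the Spark jars built for 2.13 cannot find it in a 2.12 scala-library. I guess making the versions agree will solve the problem: either change scala-library to a 2.13.x release (the Scala 2.13 build of Spark 3.4.0 uses 2.13.8), or switch the Spark artifacts to their _2.12 variants.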
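To confirm which Scala library Eclipse actually puts on the runtime classpath, a small check like the following may help. It is only a sketch: the class name ScalaClasspathCheck is made up for illustration, and it assumes you run it with the same Eclipse run configuration as the failing program. It uses reflection so that it compiles even when no Scala jar is present.

```java
import java.lang.reflect.Method;

/** Hypothetical helper: reports which Scala library, if any, is on the runtime classpath. */
final class ScalaClasspathCheck {

    /** Returns true if the named class can be found on the current classpath. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ScalaClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    /** Returns the scala-library version string, or "not found" if Scala is absent. */
    static String scalaVersion() {
        try {
            // scala.util.Properties.versionNumberString() is callable from Java as a static method
            Class<?> props = Class.forName("scala.util.Properties");
            Method m = props.getMethod("versionNumberString");
            return (String) m.invoke(null);
        } catch (ReflectiveOperationException | LinkageError e) {
            return "not found";
        }
    }

    public static void main(String[] args) {
        System.out.println("scala-library version: " + scalaVersion());
        // scala.<:< is a top-level class only in Scala 2.13 and later
        System.out.println("scala.<:< loadable:    " + isLoadable("scala.$less$colon$less"));
    }
}
```

If the reported version starts with 2.12 while the Spark artifacts end in _2.13, the mismatch described above is the likely cause of the NoClassDefFoundError.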

huangapple
  • Posted on 2023-06-02 07:04:37
  • Please keep this link when reposting: https://go.coder-hub.com/76386219.html