Running Spark programs from Eclipse IDE – pom.xml is all good but getting run time errors

Question

I am executing Spark programs from the Eclipse IDE on Ubuntu Linux 22.04 LTS, using Oracle Sun JDK 1.8.0_361.

I referred to this thread: Running Spark programs from Eclipse IDE – pom.xml is all good but getting run time errors.

It executes the Spark programs directly from the Eclipse IDE. Only the Spark-SQL programs hit issues; regular Spark programs all work fine.

This program:

  package org.example;

  import org.apache.log4j.Level;
  import org.apache.log4j.Logger;
  import org.apache.spark.sql.Dataset;
  import org.apache.spark.sql.Row;
  import org.apache.spark.sql.SparkSession;

  import static org.apache.spark.sql.functions.avg;
  import static org.apache.spark.sql.functions.col;
  import static org.apache.spark.sql.functions.max;

  public class HousePriceSolution {

      private static final String PRICE = "Price";
      private static final String PRICE_SQ_FT = "Price SQ Ft";

      public static void main(String[] args) throws Exception {
          Logger.getLogger("org").setLevel(Level.ERROR);

          SparkSession session = SparkSession.builder()
                  .appName("HousePriceSolution")
                  .master("local[1]")
                  .getOrCreate();

          Dataset<Row> realEstate = session.read()
                  .option("header", "true")
                  .csv("src/main/resources/RealEstate.csv");

          Dataset<Row> castedRealEstate = realEstate
                  .withColumn(PRICE, col(PRICE).cast("long"))
                  .withColumn(PRICE_SQ_FT, col(PRICE_SQ_FT).cast("long"));

          castedRealEstate.groupBy("Location")
                  .agg(avg(PRICE_SQ_FT), max(PRICE))
                  .orderBy(col("avg(" + PRICE_SQ_FT + ")").desc())
                  .show();
      }
  }

And here is the corresponding pom.xml:

  <project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
      <modelVersion>4.0.0</modelVersion>
      <groupId>sparkwordcount</groupId>
      <artifactId>sparkwordcount</artifactId>
      <version>0.0.1-SNAPSHOT</version>
      <packaging>jar</packaging>
      <name>"Spark Word Count"</name>

      <repositories>
          <repository>
              <id>scala-tools.org</id>
              <name>Scala-tools Maven2 Repository</name>
              <url>http://scala-tools.org/repo-releases</url>
          </repository>
      </repositories>

      <pluginRepositories>
          <pluginRepository>
              <id>scala-tools.org</id>
              <name>Scala-tools Maven2 Repository</name>
              <url>http://scala-tools.org/repo-releases</url>
          </pluginRepository>
      </pluginRepositories>

      <properties>
          <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
          <project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
      </properties>

      <build>
          <plugins>
              <!-- this plugin is for Scala code. It uses the version of the Scala library dependency to pick the Scala version -->
              <!--
              <plugin>
                  <groupId>net.alchim31.maven</groupId>
                  <artifactId>scala-maven-plugin</artifactId>
                  <version>4.4.0</version>
                  <executions>
                      <execution>
                          <goals>
                              <goal>compile</goal>
                          </goals>
                      </execution>
                  </executions>
              </plugin>
              -->
              <!-- this plugin is for Java code. The source and target versions are Java versions -->
              <plugin>
                  <artifactId>maven-compiler-plugin</artifactId>
                  <version>3.8.1</version>
                  <configuration>
                      <source>1.8</source>
                      <target>1.8</target>
                  </configuration>
              </plugin>
          </plugins>
      </build>

      <dependencies>
          <dependency>
              <groupId>org.scala-lang</groupId>
              <artifactId>scala-library</artifactId>
              <version>2.12.17</version>
          </dependency>
          <dependency>
              <groupId>org.apache.spark</groupId>
              <artifactId>spark-core_2.13</artifactId>
              <version>3.4.0</version>
              <scope>provided</scope>
          </dependency>
          <!-- the following aren't needed for the word count demo, but
               will be for more complex things. -->
          <dependency>
              <groupId>org.apache.commons</groupId>
              <artifactId>commons-text</artifactId>
              <version>1.6</version>
          </dependency>
          <dependency>
              <groupId>org.apache.spark</groupId>
              <artifactId>spark-sql_2.13</artifactId>
              <version>3.4.0</version>
              <scope>provided</scope>
          </dependency>
          <dependency>
              <groupId>org.apache.spark</groupId>
              <artifactId>spark-streaming_2.13</artifactId>
              <version>3.4.0</version>
              <scope>provided</scope>
          </dependency>
          <dependency>
              <groupId>org.apache.commons</groupId>
              <artifactId>commons-lang3</artifactId>
              <version>3.12.0</version>
          </dependency>
          <!-- the following artifacts are also available, depending upon
               what you're doing. They would use the same groupId and version as the ones above:
               spark-mllib
               spark-hive
               spark-catalyst
               spark-streaming-kafka
               spark-repl
               spark-graphx -->
      </dependencies>
  </project>

But I am getting these errors at runtime; the build is all good:

  Exception in thread "main" java.lang.NoClassDefFoundError: scala/$less$colon$less
      at org.example.HousePriceSolution.main(HousePriceSolution.java:22)
  Caused by: java.lang.ClassNotFoundException: scala.$less$colon$less
      at java.net.URLClassLoader.findClass(URLClassLoader.java:387)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
      at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:355)
      at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
      ... 1 more
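The missing class `scala/$less$colon$less` is the JVM name of Scala's `<:<` type, which exists at the top level of the `scala` package only from Scala 2.13 onward (in 2.12 it is nested inside `Predef`). So Spark artifacts built for 2.13 fail exactly this way when a 2.12 scala-library is on the classpath. As a quick diagnostic sketch (`ScalaVersionCheck` is a made-up helper, not part of the project), you can probe the classpath for that class:

```java
// Hypothetical diagnostic: check whether the class named in the stack trace
// (scala.<:<, JVM name scala.$less$colon$less) can be loaded at all.
// It is top-level in the scala package only in Scala 2.13; a 2.12
// scala-library cannot satisfy Spark artifacts compiled against 2.13.
public class ScalaVersionCheck {

    /** Returns true if a Scala 2.13 scala-library is on the classpath. */
    static boolean scala213Present() {
        try {
            Class.forName("scala.$less$colon$less");
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(scala213Present()
                ? "Scala 2.13 library found on the classpath"
                : "scala.$less$colon$less missing - classpath likely has a 2.12 (or no) scala-library");
    }
}
```

Run it with the same Eclipse run configuration as `HousePriceSolution`; if it reports the class as missing, the scala-library version in pom.xml is the place to look.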

I also tried this, but with no success: Running Spark programs from Eclipse IDE – pom.xml is all good but getting run time errors.


Answer 1

Score: 1

It sounds like you have a version mismatch between Scala and Spark. You have the following in your pom.xml:

  <!-- scala 2.12.x -->
  <dependency>
      <groupId>org.scala-lang</groupId>
      <artifactId>scala-library</artifactId>
      <version>2.12.17</version>
  </dependency>

  <!-- spark built against scala 2.13.x -->
  <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.13</artifactId>
      <version>3.4.0</version>
      <scope>provided</scope>
  </dependency>
  <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-sql_2.13</artifactId>
      <version>3.4.0</version>
      <scope>provided</scope>
  </dependency>
  <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-streaming_2.13</artifactId>
      <version>3.4.0</version>
      <scope>provided</scope>
  </dependency>

I would guess that fixing that mismatch will solve the problem.
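The rule is that the `_2.x` suffix of every Spark artifact must match the major.minor version of the single scala-library on the classpath. A sketch of one way to align them (the exact 2.13.x patch version below is an assumption; pick whichever 2.13 release matches your Spark build):

```xml
<!-- Option A: keep the _2.13 Spark artifacts and move scala-library to the 2.13 line -->
<dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.13.8</version>
</dependency>

<!-- Option B (alternative): keep scala-library 2.12.17 and switch every Spark
     artifactId suffix from _2.13 to _2.12, e.g.: -->
<!--
<dependency>
    <groupId>org.apache.spark</groupId>
    <artifactId>spark-core_2.12</artifactId>
    <version>3.4.0</version>
    <scope>provided</scope>
</dependency>
-->
```

Either direction works, as long as it is applied consistently to spark-core, spark-sql, and spark-streaming.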


huangapple
  • Published on 2023-06-02 07:04:37
  • Please keep this link when reposting: https://go.coder-hub.com/76386219.html