String Index Out Of Bounds Exception When Initializing Spark Context

Question

I've been working with Spark for more than 5 years. Recently, I encountered a basic error I had never seen before, and it has stopped development cold. When I make a routine call to create a Spark context, I get an ExceptionInInitializerError caused by a StringIndexOutOfBoundsException. Here is a simple sample of my code:

public class SparkTest {
    public static final SparkConf SPARK_CONFIGURATION = new SparkConf().setAppName("MOSDEX").setMaster("local[*]");
    public static final JavaSparkContext SPARK_CONTEXT= new JavaSparkContext(SPARK_CONFIGURATION);
    public static final SparkSession SPARK_SESSION= SparkSession.builder()
        .config(SPARK_CONFIGURATION)
        .getOrCreate();

    public static void main(String[] args) {
        setupTest();        
    }
    
    public static void setupTest() {
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> distData = SPARK_CONTEXT.parallelize(data);
        int sum= distData.reduce((a, b) -> a + b);
        System.out.println("Sum of " + data.toString() + " = " + sum);
        System.out.println();
    }//SetupTest
    
    public SparkTest() {
        super();
    }

}//class SparkTest

Here is the error message chain:

WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.spark.unsafe.Platform (file:/C:/Users/Owner/.m2/repository/org/apache/spark/spark-unsafe_2.11/2.4.5/spark-unsafe_2.11-2.4.5.jar) to method java.nio.Bits.unaligned()
WARNING: Please consider reporting this to the maintainers of org.apache.spark.unsafe.Platform
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
20/04/05 13:55:21 INFO SparkContext: Running Spark version 2.4.5
20/04/05 13:55:22 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Exception in thread "main" java.lang.ExceptionInInitializerError
    at org.apache.hadoop.util.StringUtils.<clinit>(StringUtils.java:79)
    at org.apache.hadoop.security.Groups.parseStaticMapping(Groups.java:116)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:93)
    at org.apache.hadoop.security.Groups.<init>(Groups.java:73)
    at org.apache.hadoop.security.Groups.getUserToGroupsMappingService(Groups.java:293)
    at org.apache.hadoop.security.UserGroupInformation.initialize(UserGroupInformation.java:283)
    at org.apache.hadoop.security.UserGroupInformation.ensureInitialized(UserGroupInformation.java:260)
    at org.apache.hadoop.security.UserGroupInformation.loginUserFromSubject(UserGroupInformation.java:789)
    at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:774)
    at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:647)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
    at org.apache.spark.util.Utils$$anonfun$getCurrentUserName$1.apply(Utils.scala:2422)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.util.Utils$.getCurrentUserName(Utils.scala:2422)
    at org.apache.spark.SparkContext.<init>(SparkContext.scala:293)
    at io.github.JeremyBloom.mosdex.SparkTest.<clinit>(SparkTest.java:28)
Caused by: java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2
    at java.base/java.lang.String.checkBoundsBeginEnd(String.java:3720)
    at java.base/java.lang.String.substring(String.java:1909)
    at org.apache.hadoop.util.Shell.<clinit>(Shell.java:50)
    ... 16 more

I also get the same error when I use SparkContext instead of JavaSparkContext. I've searched extensively for this error and have not seen anyone else report it, so I don't think it's a bug in Spark. I've used this code in other applications previously (with earlier versions of Spark) without a problem.

I'm using the latest version of Spark (2.4.5). Why isn't this working?


Answer 1

Score: 2


I am using Spark 2.4.5 with jdk1.8.0_181, and it works fine for me:

package examples;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.SparkSession;

import java.util.Arrays;
import java.util.List;

public class SparkTest {
  public static final SparkConf SPARK_CONFIGURATION = new SparkConf().setAppName("MOSDEX").setMaster("local[*]");
  public static final JavaSparkContext SPARK_CONTEXT= new JavaSparkContext(SPARK_CONFIGURATION);
  public static final SparkSession SPARK_SESSION= SparkSession.builder()
    .config(SPARK_CONFIGURATION)
    .getOrCreate();

  public static void main(String[] args) {
    setupTest();
  }

  public static void setupTest() {
    List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
    JavaRDD<Integer> distData = SPARK_CONTEXT.parallelize(data);
    int sum= distData.reduce((a, b) -> a + b);
    System.out.println("Sum of " + data.toString() + " = " + sum);
    System.out.println();
  }//SetupTest

  public SparkTest() {
    super();
  }

}//class SparkTest

Result:

[2020-04-05 18:14:42,184] INFO Running Spark version 2.4.5 (org.apache.spark.SparkContext:54)

...

[2020-04-05 18:14:44,060] WARN Using an existing SparkContext; some configuration may not take effect. (org.apache.spark.SparkContext:66)
Sum of [1, 2, 3, 4, 5] = 15

As far as I know, you are facing an issue with your Java version, as described in HADOOP-14586: StringIndexOutOfBoundsException breaks org.apache.hadoop.util.Shell on 2.7.x with Java 9.

Change to a Java version that is compatible with your Hadoop version.

See here:

Latest Release (Spark 2.4.5) - Apache Spark docs

Spark runs on Java 8, Python 2.7+/3.4+ and R 3.1+. For the Scala API, Spark 2.4.5 uses Scala 2.12. You will need to use a compatible Scala version (2.12.x).

NOTE: As noted in the comments, Java 13 is not supported by Spark. You need to downgrade to Java 8.
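
To see why the Java version matters here, below is a minimal, self-contained sketch of the kind of version-string parse described in HADOOP-14586 (the class is illustrative, not Hadoop's actual source; older Shell.java releases perform a similar substring on java.version):

public class JavaVersionCheckSketch {
  public static void main(String[] args) {
    // On Java 8, java.version is e.g. "1.8.0_181", so substring(0, 3) yields "1.8".
    // On Java 13 it is just "13" (length 2), so substring(0, 3) throws
    // java.lang.StringIndexOutOfBoundsException: begin 0, end 3, length 2 --
    // the same message shown in the stack trace above.
    String javaVersion = System.getProperty("java.version");
    boolean isJava7OrAbove = javaVersion.substring(0, 3).compareTo("1.7") >= 0;
    System.out.println("java.version=" + javaVersion + ", 1.7+=" + isJava7OrAbove);
  }
}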


Answer 2

Score: 0

It turns out that if you use Hadoop > 2.8, you can use Java 13 (I'm now using Hadoop 2.8.5). This is tricky if you are using Spark 2.4.5 because, on Maven, it comes prebuilt with Hadoop 2.6. You have to declare a separate dependency on Hadoop 2.8.5 that overrides the prebuilt components, and it took quite a bit of experimenting to make that work. Plus, I'm working on Windows, so I also needed to link Hadoop with winutils, which is another complication. None of this is well documented, so I had to read a lot of Stack Overflow posts to get it working.
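
For the winutils complication on Windows, here is a minimal sketch, assuming a hypothetical Hadoop install at C:\hadoop-2.8.5 with winutils.exe under its bin directory; it sets the hadoop.home.dir system property (the programmatic equivalent of the HADOOP_HOME environment variable) before the first Spark context is created:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class WindowsSparkSetup {
  public static void main(String[] args) {
    // Hypothetical install path; Hadoop's Shell class looks for winutils.exe under
    // %HADOOP_HOME%\bin. This must be set before the first SparkContext is initialized.
    System.setProperty("hadoop.home.dir", "C:\\hadoop-2.8.5");

    JavaSparkContext context = new JavaSparkContext(
        new SparkConf().setAppName("MOSDEX").setMaster("local[*]"));
    System.out.println("Spark started with hadoop.home.dir=" + System.getProperty("hadoop.home.dir"));
    context.close();
  }
}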


Answer 3

Score: 0

Had the same issue with the following versions:

1. Spark 2.12-2.4.4
2. Hadoop 2.6.5
3. JDK 16, in Spring STS

Solution: In Spring STS, I corrected the JDK version to JDK 1.8 and the issue was resolved.

