NullPointerException caused by static field in Mapper class


Question

I have an HBase MapReduce bulk-load application that includes a custom MyMapper class. It has a static field parser that is used while the application runs; when I configure the job, I call the static config method to initialize parser.

However, when the job runs, the line marked in the comment below throws a NullPointerException. It seems that after the job is submitted to YARN, the static field parser becomes null.

This is the Mapper code; the Hadoop version is 2.7.7.

public class MyMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  private static StringParser parser;

  public static void config(StringParser parser) {
    MyMapper.parser = parser;
  }

  @Override
  protected void map(LongWritable key, Text value, Context context)
      throws IOException, InterruptedException {
    String lineValue = value.toString();
    String output;
    try {
      // NullPointerException is thrown on this line.
      output = parser.parse(lineValue);
      context.write(new ImmutableBytesWritable(...), ...);
    } catch (ParseException e) {
      e.printStackTrace();
    }
  }
}

Here is the job-submission code:

Job job = Job.getInstance(conf, "批量导入HBase表:" + tableName);
job.setJarByClass(TextBulkLoadDriver.class);
FileInputFormat.setInputPaths(job, inPath);

// Configure the Mapper; this is where I set the static field on the MyMapper class.
MyMapper.config(parser); 
Class<MyMapper> cls = MyMapper.class;
job.setMapperClass(cls);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);

job.setNumReduceTasks(1);
job.setReducerClass(PutSortReducer.class);

RegionLocator locator = instance.getConnection().getRegionLocator(TableName.valueOf(tableName));
try (Admin admin = instance.getAdmin(); Table table = instance.getTable(tableName)) {
  HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
  HFileOutputFormat2.setOutputPath(job, outPath);
  // Run the job
  job.waitForCompletion(true);
  logger.info("HFileOutputFormat2文件已准备就绪:{}", outPath);
  LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
  loader.doBulkLoad(outPath, admin, table, locator);
} catch (Exception e) {
  throw new RuntimeException(e);
}

Thanks in advance for all suggestions!


Answer 1

Score: 1

Static variables are not sent along to the distributed data processing in MapReduce. They are stored in memory only in the JVM that submits the job, not on the executing nodes.

YARN distributes tasks by serializing each task and sending it to the processing nodes. The static config method is never evaluated on those nodes, so the parser object remains null there.

If you want to initialize such state, you need to serialize the object (or the information needed to rebuild it) and send it to each mapper.

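A common way to do this, when the state can be represented as a string, is to pass it through the Hadoop Configuration, which is serialized and shipped to every task, and rebuild the object in the mapper's setup method. Below is a minimal sketch of that pattern; the key myapp.parser.pattern, the parserPattern variable, and the StringParser(String) constructor are assumptions for illustration, not part of the original code.

// Driver side: instead of MyMapper.config(parser), put the information needed
// to rebuild the parser into the Configuration before creating the Job.
conf.set("myapp.parser.pattern", parserPattern); // hypothetical key and value
Job job = Job.getInstance(conf, "Batch Import HBase Table: " + tableName);

// Mapper side: rebuild the parser in setup(), which runs once per task attempt
// in the node's JVM, before any call to map().
public class MyMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

  private StringParser parser; // instance field; no static state

  @Override
  protected void setup(Context context) {
    String pattern = context.getConfiguration().get("myapp.parser.pattern");
    parser = new StringParser(pattern); // assumed constructor
  }

  // map() stays as in the question, now reading the instance field above.
}

For objects that cannot easily be encoded as a string, common alternatives are serializing them (e.g., Base64-encoded Java serialization) into a Configuration property, or shipping a file via the distributed cache and reading it in setup().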
