NullPointerException caused by static field in Mapper class
Question
I have an HBase MapReduce bulk-load application that includes a customized MyMapper class, which has a static field parser used while the application runs. When I configure the job, I call the static config method to initialize parser.
But when the job runs, the commented line throws a NullPointerException; it seems that after the job is submitted to YARN, the static field parser becomes null.
This is the Mapper code; the Hadoop version is 2.7.7.
public class MyMapper extends Mapper<LongWritable, Text, ImmutableBytesWritable, Put> {

    private static StringParser parser;

    public static void config(StringParser parser) {
        MyMapper.parser = parser;
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        String lineValue = value.toString();
        String output;
        try {
            // NullPointerException is thrown on this line.
            output = parser.parse(lineValue);
            context.write(new ImmutableBytesWritable(...), ...);
        } catch (ParseException e) {
            e.printStackTrace();
        }
    }
}
Here is the job submission code:
Job job = Job.getInstance(conf, "Batch Import HBase Table:" + tableName);
job.setJarByClass(TextBulkLoadDriver.class);
FileInputFormat.setInputPaths(job, inPath);
// Configure the Mapper; here I set the static field in the MyMapper class.
MyMapper.config(parser);
Class<MyMapper> cls = MyMapper.class;
job.setMapperClass(cls);
job.setMapOutputKeyClass(ImmutableBytesWritable.class);
job.setMapOutputValueClass(Put.class);
job.setNumReduceTasks(1);
job.setReducerClass(PutSortReducer.class);
RegionLocator locator = instance.getConnection().getRegionLocator(TableName.valueOf(tableName));
try (Admin admin = instance.getAdmin(); Table table = instance.getTable(tableName)) {
    HFileOutputFormat2.configureIncrementalLoad(job, table, locator);
    HFileOutputFormat2.setOutputPath(job, outPath);
    // Run the job.
    job.waitForCompletion(true);
    logger.info("HFileOutputFormat2 file ready on {}", outPath);
    LoadIncrementalHFiles loader = new LoadIncrementalHFiles(conf);
    loader.doBulkLoad(outPath, admin, table, locator);
} catch (Exception e) {
    throw new RuntimeException(e);
}
TIA for all suggestions!
Answer 1
Score: 1
Static variables are not sent along to the distributed data processing in MapReduce. These variables are stored in memory only where the jobTracker is running, not on the executing nodes.
YARN distributes the tasks to the nodes by serializing each task and sending it to the processing nodes. The static method config is never evaluated on those nodes, which leaves the parser object null.
If you want to initialize the static variable, you may need to serialize the object and send it to each mapper.
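As a rough sketch of that last suggestion: the job Configuration is copied to every task, so the driver can serialize the parser into a configuration property and each mapper can rebuild it in setup(). The helper class ParserConfigUtil and the property name "myapp.parser" below are made up for illustration, and it is assumed that StringParser implements java.io.Serializable.

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.Base64;

import org.apache.hadoop.conf.Configuration;

public final class ParserConfigUtil {

    private static final String KEY = "myapp.parser"; // hypothetical property name

    // Driver side: serialize the parser and put it into the job Configuration,
    // which Hadoop ships to every task.
    public static void storeParser(Configuration conf, StringParser parser) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
            out.writeObject(parser); // assumes StringParser implements Serializable
        }
        conf.set(KEY, Base64.getEncoder().encodeToString(bytes.toByteArray()));
    }

    // Task side: rebuild the parser from the Configuration in each mapper JVM.
    public static StringParser loadParser(Configuration conf) throws IOException {
        byte[] raw = Base64.getDecoder().decode(conf.get(KEY));
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(raw))) {
            return (StringParser) in.readObject();
        } catch (ClassNotFoundException e) {
            throw new IOException(e);
        }
    }
}

In MyMapper, overriding setup() would then replace the static config call:

@Override
protected void setup(Context context) throws IOException {
    // Runs once per task on the worker node, before any map() call.
    parser = ParserConfigUtil.loadParser(context.getConfiguration());
}

On the driver side, ParserConfigUtil.storeParser(conf, parser) would be called instead of MyMapper.config(parser); note it must be called on conf before Job.getInstance(conf, ...) (or on job.getConfiguration() afterwards), since Job copies the Configuration it is given.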