Failed to ".add(StringTokenizer.nextToken())" in an ArrayList<String> inside Hadoop's MapReduce code

Question


I am trying to add StringTokenizer.nextToken() to an `ArrayList<String>` within my Hadoop MapReduce code. The code worked fine and produced an output file when run, but as soon as I added a StringTokenizer line, it suddenly broke.

Here's my code:
```java
public void map(Object key, Text value, Context context
        ) throws IOException, InterruptedException {

    List<String> texts = new ArrayList<String>();

    StringTokenizer itr = new StringTokenizer(value.toString(), "P");

    while (itr.hasMoreTokens()) {
        System.out.println(itr.nextToken());
        texts.add(itr.nextToken());  // The code broke here
    }
}
```

Note: I haven't added Hadoop's Text class for writing output in this code yet, but writing worked with my previous code.

Here's my Reducer:

```java
public static class IntSumReducer
        extends Reducer<Text, IntWritable, Text, IntWritable> {
    private IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values,
                       Context context
    ) throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable val : values) {
            sum += val.get();
        }
        result.set(sum);
        context.write(key, result);
    }
}
```

Here's the main method:

```java
public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    Job job = Job.getInstance(conf, "word count");
    job.setJarByClass(JobCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
}
```

Note: I've also tried using a plain array, and it still broke.

The project runs on the Java 8 JDK on macOS and imports Hadoop Common 3.3.0 and Hadoop Core 1.2.0 from Maven.

Here's my error log:

```
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.apache.hadoop.security.authentication.util.KerberosUtil (file:/Users/domesama/.m2/repository/org/apache/hadoop/hadoop-core/1.2.1/hadoop-core-1.2.1.jar) to method sun.security.krb5.Config.getInstance()
WARNING: Please consider reporting this to the maintainers of org.apache.hadoop.security.authentication.util.KerberosUtil
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
20/09/15 14:18:07 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
20/09/15 14:18:07 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
20/09/15 14:18:07 WARN mapred.JobClient: No job jar file set.  User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
20/09/15 14:18:07 INFO input.FileInputFormat: Total input paths to process : 1
20/09/15 14:18:07 WARN snappy.LoadSnappy: Snappy native library not loaded
20/09/15 14:18:07 INFO mapred.JobClient: Running job: job_local1465674096_0001
20/09/15 14:18:07 INFO mapred.LocalJobRunner: Waiting for map tasks
20/09/15 14:18:07 INFO mapred.LocalJobRunner: Starting task: attempt_local1465674096_0001_m_000000_0
20/09/15 14:18:07 INFO mapred.Task:  Using ResourceCalculatorPlugin : null
20/09/15 14:18:07 INFO mapred.MapTask: Processing split: file:/Users/domesama/Desktop/Github Respositories/HadoopMapReduce/input/SampleFile.txt:0+1891
20/09/15 14:18:07 INFO mapred.MapTask: io.sort.mb = 100
20/09/15 14:18:07 INFO mapred.MapTask: data buffer = 79691776/99614720
20/09/15 14:18:07 INFO mapred.MapTask: record buffer = 262144/327680
20/09/15 14:18:07 INFO mapred.MapTask: Starting flush of map output
20/09/15 14:18:07 INFO mapred.LocalJobRunner: Map task executor complete.
20/09/15 14:18:07 WARN mapred.LocalJobRunner: job_local1465674096_0001
java.lang.Exception: java.util.NoSuchElementException
	at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.util.NoSuchElementException
	at java.base/java.util.StringTokenizer.nextToken(StringTokenizer.java:349)
	at JobCount$TokenizerMapper.map(JobCount.java:50)
	at JobCount$TokenizerMapper.map(JobCount.java:20)
	at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:145)
	at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
	at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
	at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:830)
,84,01,02600,01,1007549,00065,19,1,,,2,2,2,2,2,,,2,,2,,,,1,2,2,2,2,2,2,0000000,,,,2,5,,,,,,1,4,,,,,,,,,,2,5,2,2,3,000000,00000,17,000000,2,15,19,0000000,2,00000,00000,0000000,,3,,2,,4,999,999,,2,,,6,,,1,01,,,,,,,6,,1,,0,,,000000000,000000000,028,,,,1,2,1,1,01,001,0,0,0,0,1,0,0,1,0,,,,,,,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,,0,,0,,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,00005,00127,00065,00066,00069,00005,00120,00066,00063,00005,00067,00006,00005,00137,00124,00065,00066,00064,00063,00006,00131,00006,00062,00063,00060,00126,00006,00066,00068,00120,00066,00126,00115,00005,00005,00063,00066,00066,00062,00005,00118,00006,00064,00066,00062,00124,00006,00063,00068,00132,00062,00119,00126,00006,00005,00068,00072,00065,00066,00125,00005,00123,00062,00064,00065,00006,00123,00065,00067,00006,00068,00006,00005,00127,00119,00063,00068,00067,00064,00122
20/09/15 14:18:08 INFO mapred.JobClient:  map 0% reduce 0%
20/09/15 14:18:08 INFO mapred.JobClient: Job complete: job_local1465674096_0001
20/09/15 14:18:08 INFO mapred.JobClient: Counters: 0
```

The System.out.println(itr.nextToken()); line did print, but the code seems to break as soon as it executes texts.add(itr.nextToken());.

Perhaps I need something like async/await (as in JavaScript) in my code?

Answer 1

Score: 2

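If you use StringTokenizer, you always need to call hasMoreTokens() to check whether a token is left before each call to nextToken(). Your loop calls nextToken() twice per hasMoreTokens() check, so whenever a line contains an odd number of tokens, the second call runs past the last token and throws the NoSuchElementException shown in your log.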
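To make this concrete, here is a small stand-alone sketch in plain Java (no Hadoop; the class name DoubleNextTokenDemo and the method buggyCollect are made up for illustration) that reuses the question's loop with the "P" delimiter. With an even number of tokens it silently keeps only every second token; with an odd number, the second nextToken() call throws java.util.NoSuchElementException. The exception is raised while evaluating the argument of texts.add(...), before add runs, so it is not a timing or async issue, and swapping the List for an array would not change it.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.StringTokenizer;

public class DoubleNextTokenDemo {

    // Mirrors the question's loop: one hasMoreTokens() check,
    // but two nextToken() calls per iteration.
    static List<String> buggyCollect(String line) {
        List<String> texts = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line, "P");
        while (itr.hasMoreTokens()) {
            System.out.println(itr.nextToken()); // consumes tokens 1, 3, 5, ...
            texts.add(itr.nextToken());          // consumes tokens 2, 4, 6, ...
        }
        return texts;
    }

    public static void main(String[] args) {
        // Even number of tokens: no exception, but "a" and "c" are only printed, never stored.
        System.out.println(buggyCollect("aPbPcPd")); // returns [b, d]

        // Odd number of tokens: after "c" is consumed, the second nextToken() has nothing left.
        try {
            buggyCollect("aPbPc");
        } catch (NoSuchElementException e) {
            System.out.println("NoSuchElementException, as in the Hadoop log");
        }
    }
}
```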

The fix is to call nextToken() only once per loop iteration and reuse the result:

```java
while (itr.hasMoreTokens()) {
    String token = itr.nextToken();  // one call for each hasMoreTokens
    System.out.println(token);
    texts.add(token);
}
```
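For a quick check outside Hadoop, the corrected loop can be wrapped in a small helper (TokenizeFixed and tokenize are hypothetical names, and a plain String stands in for value.toString()). The sketch also contrasts StringTokenizer with String.split, which the StringTokenizer Javadoc recommends for new code: the tokenizer skips empty tokens between adjacent delimiters, while split (whose argument is a regular expression) keeps them, apart from trailing empty strings.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.StringTokenizer;

public class TokenizeFixed {

    // One nextToken() call per hasMoreTokens() check, so every token is kept
    // and the tokenizer is never read past its end.
    static List<String> tokenize(String line, String delims) {
        List<String> texts = new ArrayList<>();
        StringTokenizer itr = new StringTokenizer(line, delims);
        while (itr.hasMoreTokens()) {
            String token = itr.nextToken();
            texts.add(token);
        }
        return texts;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("aPbPc", "P"));           // [a, b, c]
        // StringTokenizer treats "PP" as one separator, so no empty token appears...
        System.out.println(tokenize("aPPb", "P"));            // [a, b]
        // ...whereas String.split keeps the empty string between the two P's.
        System.out.println(Arrays.asList("aPPb".split("P"))); // [a, , b]
    }
}
```

Inside the mapper the same loop would run over value.toString(); whether to stay with StringTokenizer or switch to split mainly depends on whether empty fields between consecutive P delimiters matter for this input.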

huangapple
  • Posted on September 15, 2020 at 15:32:55
  • When reposting, please keep the link to this article: https://go.coder-hub.com/63897117.html