I am a new Hadoop user. I ran this sample code from Hadoop Beginner's Guide (Garry Turkington), but the job fails and no output file (part file) appears in my output folder. I have tried many changes to mapred-site.xml, but the job still fails. What should I do?

import java.io.IOException;
import org.apache.hadoop.conf.* ;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.* ;
import org.apache.hadoop.mapred.* ;
import org.apache.hadoop.mapred.lib.* ;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipData
{
    public static class MapClass extends MapReduceBase implements Mapper<LongWritable, Text, Text, LongWritable>
    {
        private final static LongWritable one = new LongWritable(1);
        private Text word = new Text("totalcount");

        public void map(LongWritable key, Text value, OutputCollector<Text, LongWritable> output, Reporter reporter) throws IOException
        {
            String line = value.toString();
            if (line.equals("skiptext"))
                throw new RuntimeException("Found skiptext") ;
            output.collect(word, one);
        }
    }

    public static void main(String[] args) throws Exception
    {
        System.setProperty("hadoop.home.dir", "/home/saung/hadoop-2.7.4/");
        Configuration config = new Configuration() ;
        JobConf conf = new JobConf(config, SkipData.class);
        conf.setJobName("SkipData");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(LongWritable.class);
        conf.setMapperClass(MapClass.class);
        conf.setCombinerClass(LongSumReducer.class);
        conf.setReducerClass(LongSumReducer.class);
        FileInputFormat.setInputPaths(conf,args[0]) ;
        FileOutputFormat.setOutputPath(conf, new Path(args[1])) ;
        JobClient.runJob(conf);
    }
}  

mapred-site.xml

<configuration>

<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>

<property>
<name>mapred.skip.map.max.skip.records</name>
<value>5</value>
</property>

<property>
<name>mapred.map.max.attempts</name>
<value>4</value>
</property>

<property>
<name>mapred.reduce.max.attempts</name>
<value>4</value>
</property>

<property>
<name>mapreduce.max.map.failures.percent</name>
<value>20</value>
</property>

</configuration>

The input file was generated by this Ruby script:

File.open('skipdata.txt','w') do |file|
3.times do
500000.times{file.write("A valid record\n")}
5.times{file.write("skiptext\n")}
end
500000.times{file.write("A valid record\n")}
end

I used 8 map tasks, and the errors look like this:

18/02/27 14:21:50 INFO mapred.Task: Task 'attempt_local1745706455_0001_m_000007_0' done.
18/02/27 14:21:50 INFO mapred.LocalJobRunner: Finishing task: attempt_local1745706455_0001_m_000007_0
18/02/27 14:21:50 INFO mapred.LocalJobRunner: map task executor complete.
18/02/27 14:21:50 DEBUG ipc.Client: IPC Client (1882349076) connection to localhost/127.0.0.1:9000 from saung sending #18
18/02/27 14:21:50 DEBUG security.UserGroupInformation: PrivilegedAction as:saung (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:670)
18/02/27 14:21:50 DEBUG security.UserGroupInformation: PrivilegedAction as:saung (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
18/02/27 14:21:50 DEBUG security.UserGroupInformation: PrivilegedAction as:saung (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
18/02/27 14:21:50 DEBUG ipc.Client: IPC Client (1882349076) connection to localhost/127.0.0.1:9000 from naychi got value #18
18/02/27 14:21:50 DEBUG ipc.ProtobufRpcEngine: Call: delete took 69ms
18/02/27 14:21:50 WARN mapred.LocalJobRunner: job_local1745706455_0001
java.lang.Exception: java.lang.RuntimeException: Found skiptext
    at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462)
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522)
Caused by: java.lang.RuntimeException: Found skiptext
    at mapredpack.SkipData$MapClass.map(SkipData.java:81)
    at mapredpack.SkipData$MapClass.map(SkipData.java:1)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
    at java.lang.Thread.run(Thread.java:745)
18/02/27 14:21:50 DEBUG security.UserGroupInformation: PrivilegedAction as:saung (auth:SIMPLE) from:org.apache.hadoop.fs.FileContext.getAbstractFileSystem(FileContext.java:331)
18/02/27 14:21:51 INFO mapred.LocalJobRunner: hdfs://localhost:9000/taskfailure/skipdata.txt:4194304+4194304 > map
18/02/27 14:21:51 INFO mapreduce.Job:  map 69% reduce 0%
18/02/27 14:21:51 DEBUG security.UserGroupInformation: PrivilegedAction as:saung (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:670)
18/02/27 14:21:51 DEBUG security.UserGroupInformation: PrivilegedAction as:saung (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
18/02/27 14:21:51 DEBUG security.UserGroupInformation: PrivilegedAction as:saung (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
18/02/27 14:21:51 DEBUG security.UserGroupInformation: PrivilegedAction as:saung (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:670)
18/02/27 14:21:51 DEBUG security.UserGroupInformation: PrivilegedAction as:saung (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
18/02/27 14:21:51 DEBUG security.UserGroupInformation: PrivilegedAction as:saung (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.updateStatus(Job.java:320)
18/02/27 14:21:51 INFO mapreduce.Job: Job job_local1745706455_0001 failed with state FAILED due to: NA
18/02/27 14:21:51 DEBUG security.UserGroupInformation: PrivilegedAction as:saung (auth:SIMPLE) from:org.apache.hadoop.mapreduce.Job.getCounters(Job.java:758)
18/02/27 14:21:51 INFO mapreduce.Job: Counters: 23
File System Counters
    FILE: Number of bytes read=24806
    FILE: Number of bytes written=1741540
    FILE: Number of read operations=0
    FILE: Number of large read operations=0
    FILE: Number of write operations=0
    HDFS: Number of bytes read=107677710
    HDFS: Number of bytes written=0
    HDFS: Number of read operations=70
    HDFS: Number of large read operations=0
    HDFS: Number of write operations=7
Map-Reduce Framework
    Map input records=1381528
    Map output records=1381527
    Map output bytes=26249013
    Map output materialized bytes=135
    Input split bytes=588
    Combine input records=1161148
    Combine output records=5
    Spilled Records=5
    Failed Shuffles=0
    Merged Map outputs=0
    GC time elapsed (ms)=2000
    Total committed heap usage (bytes)=3535798272
File Input Format Counters 
    Bytes Read=20743175
Exception in thread "main" java.io.IOException: Job failed!
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:873)
    at mapredpack.SkipData.main(SkipData.java:100)
18/02/27 14:21:51 DEBUG ipc.Client: stopping client from cache: org.apache.hadoop.ipc.Client@2cd2a21f
18/02/27 14:21:51 DEBUG ipc.Client: removing client from cache: org.apache.hadoop.ipc.Client@2cd2a21f
18/02/27 14:21:51 DEBUG ipc.Client: stopping actual client because no more references remain: org.apache.hadoop.ipc.Client@2cd2a21f
18/02/27 14:21:51 DEBUG ipc.Client: Stopping client
18/02/27 14:21:51 DEBUG ipc.Client: IPC Client (1882349076) connection to localhost/127.0.0.1:9000 from saung: closed
18/02/27 14:21:51 DEBUG ipc.Client: IPC Client (1882349076) connection to localhost/127.0.0.1:9000 from saung: stopped, remaining connections 0
18/02/27 14:21:51 INFO mapred.LocalJobRunner: hdfs://localhost:9000/taskfailure/skipdata.txt:12582912+4194304 > map

  • Please provide the full stack trace. (Commented Feb 27, 2018 at 7:31)
  • I have provided the requested data. What should I do to prevent the job from failing? (Commented Feb 27, 2018 at 8:34)

1 Answer

It looks to me as if this is expected behavior.

  • You have a Mapper that is designed to throw an exception when it encounters a record (line) "skiptext".

  • Your input contains some lines of that form.

  • The exception is being thrown ...

  • .... and the job is failing.
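
To make that chain concrete, here is a minimal plain-Java sketch with no Hadoop dependency. The names RecordLoopSketch, mapOne and runTask are hypothetical; runTask is only a simplified stand-in for the framework's per-record loop (the MapRunner.run frame in your stack trace), not Hadoop's actual code. It shows that a single uncaught exception ends the whole loop, so the remaining records are never processed.

```java
import java.util.Arrays;
import java.util.List;

public class RecordLoopSketch {
    // Stand-in for MapClass.map: emits a count of 1 per record,
    // or throws when it sees the marker line "skiptext".
    static long mapOne(String line) {
        if (line.equals("skiptext")) {
            throw new RuntimeException("Found skiptext");
        }
        return 1L;
    }

    // Simplified stand-in for the per-record loop of one map task.
    // Nothing catches the exception here, so it ends the loop.
    static long runTask(List<String> records) {
        long total = 0;
        for (String record : records) {
            total += mapOne(record);
        }
        return total;
    }

    public static void main(String[] args) {
        List<String> split = Arrays.asList("A valid record", "skiptext", "A valid record");
        try {
            System.out.println("total = " + runTask(split));
        } catch (RuntimeException e) {
            // Reached because the second record throws.
            System.out.println("task attempt failed: " + e.getMessage());
        }
    }
}
```

Unless something catches the exception or skips the record, every attempt over the same input fails in the same place.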

I suspect that you are trying to implement bad record skipping as described in the MapReduce Tutorial. However, it doesn't look like you are doing it correctly.

I can see that you have imported org.apache.hadoop.mapred.SkipBadRecords ... but simply importing a class in Java doesn't actually do anything.

According to the Tutorial:

By default this feature is disabled.

For enabling it, refer to SkipBadRecords.setMapperMaxSkipRecords(Configuration, long) and SkipBadRecords.setReducerMaxSkipGroups(Configuration, long).
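
As a rough sketch of what enabling it could look like with the old org.apache.hadoop.mapred API used in the question: SkipModeSketch and enableSkipping are hypothetical names, the values 4 and 5 are copied from the question's mapred-site.xml, and the value 2 is an assumption. I have not run this against the question's cluster, so check each call against the javadocs before relying on it.

```java
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.SkipBadRecords;

public class SkipModeSketch {
    // Hypothetical helper: call it on the JobConf built in main(),
    // before JobClient.runJob(conf).
    static void enableSkipping(JobConf conf) {
        // Allow up to 4 attempts per map task
        // (same value as mapred.map.max.attempts in the question).
        conf.setMaxMapAttempts(4);

        // Switch a task into skip mode after 2 failed attempts (assumed value).
        SkipBadRecords.setAttemptsToStartSkipping(conf, 2);

        // Accept losing up to 5 records around each bad record.
        // The default is 0, which leaves skipping disabled for mappers.
        SkipBadRecords.setMapperMaxSkipRecords(conf, 5L);
    }
}
```

One caveat, which is my own reading of the log rather than something the tutorial states: the output mentions LocalJobRunner, and as far as I know the local runner does not retry failed task attempts, so these settings may only take effect once the job actually runs on the cluster.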

2 Comments

Thanks. I tried adding these two lines to main(), but I still have the problem. Could you give me another suggestion?
You should read the tutorial and the javadocs for those methods so that you understand what you are doing. (And reread what I wrote. For a start, I did not suggest that you add two lines, and nor do the lines I quoted from the tutorial.)
